ERIC - Search Results

Publication Date

In 2025	1
Since 2024	1
Since 2021 (last 5 years)	4
Since 2016 (last 10 years)	8
Since 2006 (last 20 years)	11

Descriptor

Item Response Theory	19
Scores	19
Statistical Distributions	19
Foreign Countries	5
Computation	4
Mathematical Models	4
Probability	4
Test Construction	4
Accuracy	3
Bayesian Statistics	3
Equations (Mathematics)	3
Error of Measurement	3
Estimation (Mathematics)	3
Mathematics Tests	3
Maximum Likelihood Statistics	3
Models	3
Scoring	3
Simulation	3
Test Items	3
Ability	2
Achievement Tests	2
Adaptive Testing	2
Classification	2
Goodness of Fit	2
Grades (Scholastic)	2

Source

Educational and Psychological…	3
ETS Research Report Series	2
Psychometrika	2
Annenberg Institute for…	1
Applied Psychological…	1
Grantee Submission	1
International Educational…	1
Journal of Educational…	1
National Center for Education…	1
ProQuest LLC	1
Research in Mathematics…	1
Teaching Statistics: An…	1

Publication Type

Journal Articles	11
Reports - Research	8
Reports - Evaluative	7
Reports - Descriptive	3
Dissertations/Theses -…	1
Numerical/Quantitative Data	1
Speeches/Meeting Papers	1

Education Level

Secondary Education	2
Early Childhood Education	1
Elementary Education	1
Grade 8	1
Higher Education	1
Kindergarten	1
Postsecondary Education	1
Primary Education	1

Location

China	1
United Kingdom (England)	1
United States	1

Assessments and Surveys

ACT Assessment	1
Law School Admission Test	1
Program for International…	1

Showing 1 to 15 of 19 results

The Sensitivity of Value-Added Estimates to Test Scoring Decisions. EdWorkingPaper No. 25-1226

Gilbert, Joshua B.; Soland, James G.; Domingue, Benjamin W. – Annenberg Institute for School Reform at Brown University, 2025

Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may affect VA estimates is less studied. We examine the…

Descriptors: Value Added Models, Tests, Testing, Scoring

Investigating Constructed-Response Scoring over Time: The Effects of Study Design on Trend Rescore Statistics. Research Report. ETS RR-22-15

Peer reviewed

Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022

When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…

Descriptors: Item Response Theory, Test Construction, Scoring, Testing

Poisson Diagnostic Classification Models: A Framework and an Exploratory Example

Peer reviewed

Liu, Ren; Liu, Haiyan; Shi, Dexin; Jiang, Zhehan – Educational and Psychological Measurement, 2022

Assessments with a large number of small, similar, or often repetitive tasks are being used in educational, neurocognitive, and psychological contexts. For example, respondents are asked to recognize numbers or letters from a large pool, and the number of correct answers is a count variable. In 1960, Georg Rasch developed the Rasch…

Descriptors: Classification, Models, Statistical Distributions, Scores
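
The abstract refers to Rasch's 1960 Poisson model for count data. As a minimal sketch, assuming the usual multiplicative form in which the expected count equals person ability times item easiness, the code below computes count probabilities and the ability estimate when item easiness values are known. Function and parameter names are hypothetical. This covers only the continuous-ability Rasch model. The Poisson diagnostic classification models proposed in the article use discrete attribute profiles and are not implemented here.

```python
import math


def poisson_pmf(x: int, rate: float) -> float:
    """P(X = x) for a Poisson-distributed count with the given rate."""
    return math.exp(-rate) * rate ** x / math.factorial(x)


def rasch_poisson_pmf(x: int, ability: float, easiness: float) -> float:
    """Assumed multiplicative form: expected count = ability * easiness."""
    return poisson_pmf(x, ability * easiness)


def estimate_ability(counts: list[int], easiness: list[float]) -> float:
    """Maximum likelihood ability estimate when item easiness values are known.

    The log-likelihood sum(x_i * log(ability * e_i) - ability * e_i) + const
    has derivative sum(x_i) / ability - sum(e_i), which is zero at
    ability = sum(x_i) / sum(e_i).
    """
    return sum(counts) / sum(easiness)


# Example: three counting tasks with hypothetical easiness values.
theta_hat = estimate_ability([12, 8, 20], [1.0, 0.5, 2.0])
print(round(theta_hat, 4))  # 11.4286
```

Because the likelihood factors this way, the total count is a sufficient statistic for ability, which is what makes the closed-form estimate possible.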

Grades Are Not Normal: Improving Exam Score Models Using the Logit-Normal Distribution

Peer reviewed

Arthurs, Noah; Stenhaug, Ben; Karayev, Sergey; Piech, Chris – International Educational Data Mining Society, 2019

Understanding exam score distributions has implications for item response theory (IRT), grade curving, and downstream modeling tasks such as peer grading. Historically, grades have been assumed to be normally distributed, and to this day the normal is the ubiquitous choice for modeling exam scores. While this is a good assumption for tests…

Descriptors: Grades (Scholastic), Scores, Statistical Distributions, Models
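
The article proposes modeling exam scores with a logit-normal rather than a normal distribution. A proportion x is logit-normal when logit(x) is normally distributed, so the maximum likelihood fit is the mean and standard deviation of the logits. This is a minimal sketch with two assumptions of ours: scores are expressed as proportions of the maximum, and scores of exactly 0 or 1 are clipped to [eps, 1 - eps] because the logit is undefined at those values. Function names are hypothetical.

```python
import math
import statistics


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def fit_logit_normal(scores: list[float], eps: float = 1e-3) -> tuple[float, float]:
    """ML fit of a logit-normal to proportion-correct scores.

    The Jacobian term 1 / (x * (1 - x)) does not involve the parameters,
    so the MLE is the mean and population SD of the logits.
    Scores are clipped to [eps, 1 - eps] (an assumption of this sketch).
    """
    z = [logit(min(max(s, eps), 1.0 - eps)) for s in scores]
    return statistics.fmean(z), statistics.pstdev(z)


def logit_normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of X = 1 / (1 + exp(-Y)) with Y ~ Normal(mu, sigma**2), for 0 < x < 1."""
    z = (logit(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi) * x * (1.0 - x))


# Example: fit to hypothetical proportion-correct scores, one of them a perfect score.
mu, sigma = fit_logit_normal([0.55, 0.70, 0.80, 0.90, 0.95, 1.0])
```

Unlike a normal fit, the fitted density is confined to (0, 1) and can be strongly skewed near the top of the scale.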

Summed Score Likelihood Based Indices for Testing Latent Variable Distribution Fit in Item Response Theory

Peer reviewed

Li, Zhen; Cai, Li – Grantee Submission, 2017

In standard item response theory (IRT) applications, the latent variable is typically assumed to be normally distributed. If the normality assumption is violated, the item parameter estimates can become biased. Summed score likelihood based statistics may be useful for testing latent variable distribution fit. We develop Satorra-Bentler type…

Descriptors: Scores, Goodness of Fit, Statistical Distributions, Item Response Theory

Technical Report and User Guide for the 2016 Program for International Student Assessment (PISA) Young Adult Follow-Up Study. NCES 2021-020

Peer reviewed

Kastberg, David; Murray, Gordon; Ferraro, David; Arieira, Carlos; Roey, Shep; Mamedova, Saida; Liao, Yuqi – National Center for Education Statistics, 2021

The Program for International Student Assessment Young Adult Follow-up Study (PISA YAFS) is a follow-up study with students who participated in PISA 2012 in the United States. The study is designed to measure how performance on PISA 2012 relates to subsequent measures of outcomes and skills of young adults on an online assessment, Education and…

Descriptors: Foreign Countries, Achievement Tests, Secondary School Students, Young Adults

Descriptive Statistics for Modern Test Score Distributions: Skewness, Kurtosis, Discreteness, and Ceiling Effects

Peer reviewed

Ho, Andrew D.; Yu, Carol C. – Educational and Psychological Measurement, 2015

Many statistical analyses benefit from the assumption that unconditional or conditional distributions are continuous and normal. More than 50 years ago in this journal, Lord and Cook chronicled departures from normality in educational tests, and Micceri similarly showed that the normality assumption is rarely met in educational and psychological…

Descriptors: Statistics, Scores, Statistical Distributions, Tests
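
The four features named in the title can be summarized directly from a vector of integer test scores. The sketch below makes its own choices for each one. It uses moment-based skewness and excess kurtosis. It counts distinct observed score points as a crude discreteness indicator, and it reports the proportion of examinees at the maximum score as a ceiling indicator. Ho and Yu's own definitions and recommended statistics may differ, and the function name is hypothetical.

```python
from collections import Counter


def describe_scores(scores: list[int], max_score: int) -> dict[str, float]:
    """Moment-based shape statistics plus simple discreteness and ceiling indicators."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    if m2 == 0:
        raise ValueError("all scores are identical; shape statistics are undefined")
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    counts = Counter(scores)
    return {
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,   # 0 for a normal distribution
        "distinct_values": len(counts),          # crude discreteness indicator
        "ceiling_proportion": counts[max_score] / n,
    }


# A short, easy test: most examinees at the maximum score of 6.
summary = describe_scores([3, 5, 5, 6, 6, 6, 6, 6], max_score=6)
```

For this example the skewness is negative and 5 of 8 examinees sit at the ceiling. A normal model would fit such data poorly.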

Generalization of the Lord-Wingersky Algorithm to Computing the Distribution of Summed Test Scores Based on Real-Number Item Scores

Peer reviewed

Kim, Seonghoon – Journal of Educational Measurement, 2013

With known item response theory (IRT) item parameters, Lord and Wingersky provided a recursive algorithm for computing the conditional frequency distribution of number-correct test scores, given proficiency. This article presents a generalized algorithm for computing the conditional distribution of summed test scores involving real-number item…

Descriptors: Item Response Theory, Scores, Computation, Mathematics
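
The Lord-Wingersky recursion adds one item at a time and updates the distribution of the running total score. The sketch below carries that idea over to real-number item scores. It keys the distribution on the possibly fractional score sums rather than on integer positions. It works at a single proficiency value, so the category probabilities are assumed to be already evaluated at that value. It illustrates the recursion only and does not transcribe the algorithm published in the article. Argument names are hypothetical.

```python
from collections import defaultdict


def summed_score_distribution(item_probs: list[list[float]],
                              item_scores: list[list[float]]) -> dict[float, float]:
    """Distribution of the summed score at one fixed proficiency value.

    item_probs[i][k]  : P(item i is in category k | theta); each row sums to 1
    item_scores[i][k] : score (possibly non-integer) given to category k of item i
    """
    dist = {0.0: 1.0}                       # before any item, the sum is 0 for sure
    for probs, scores in zip(item_probs, item_scores):
        new = defaultdict(float)
        for total, p_total in dist.items():
            for p, s in zip(probs, scores):
                # Rounding merges sums that differ only by floating-point error.
                new[round(total + s, 9)] += p_total * p
        dist = dict(new)
    return dist


# With 0/1 scores this reduces to the ordinary Lord-Wingersky recursion.
print(summed_score_distribution([[0.4, 0.6], [0.7, 0.3]], [[0, 1], [0, 1]]))
# roughly {0.0: 0.28, 1.0: 0.54, 2.0: 0.18}
```

The number of distinct keys can grow quickly when item scores share no common unit. This is the practical cost of allowing real-number scores.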

Effectiveness of Item Response Theory (IRT) Proficiency Estimation Methods under Adaptive Multistage Testing. Research Report. ETS RR-15-11

Peer reviewed

Kim, Sooyeon; Moses, Tim; Yoo, Hanwook Henry – ETS Research Report Series, 2015

The purpose of this inquiry was to investigate the effectiveness of item response theory (IRT) proficiency estimators in terms of estimation bias and error under multistage testing (MST). We chose a 2-stage MST design in which 1 adaptation to the examinees' ability levels takes place. It includes 4 modules (1 at Stage 1, 3 at Stage 2) and 3 paths…

Descriptors: Item Response Theory, Computation, Statistical Bias, Error of Measurement
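
To make the idea of competing proficiency estimators concrete, here is a minimal sketch of two common ones under a two-parameter logistic model: expected a posteriori (EAP) with a standard normal prior, and a maximum likelihood estimate found by grid search. The grid, the prior, and the function names are assumptions for illustration. The report's actual estimators and its two-stage multistage design are not reproduced here.

```python
import math

# Quadrature/search grid from -4 to 4 in steps of 0.05 (exactly symmetric about 0).
GRID = [k / 20 for k in range(-80, 81)]


def log_likelihood(theta: float, responses: list[int],
                   items: list[tuple[float, float]]) -> float:
    """2PL log-likelihood; items are (a, b) pairs, responses are 0/1."""
    ll = 0.0
    for u, (a, b) in zip(responses, items):
        x = a * (theta - b)
        # log P = -log(1 + exp(-x)) and log(1 - P) = -log(1 + exp(x)), computed stably
        ll += -math.log1p(math.exp(-x)) if u == 1 else -math.log1p(math.exp(x))
    return ll


def eap(responses: list[int], items: list[tuple[float, float]]) -> float:
    """Posterior mean of theta under a N(0, 1) prior, by grid quadrature."""
    w = [math.exp(log_likelihood(t, responses, items) - 0.5 * t * t) for t in GRID]
    return sum(t * wt for t, wt in zip(GRID, w)) / sum(w)


def grid_mle(responses: list[int], items: list[tuple[float, float]]) -> float:
    """Grid point maximizing the likelihood (runs to the grid edge for all-0/all-1 patterns)."""
    return max(GRID, key=lambda t: log_likelihood(t, responses, items))


items = [(1.2, -1.0), (1.0, 0.0), (0.8, 1.0)]
print(eap([1, 1, 1], items), grid_mle([1, 1, 1], items))
```

For a perfect pattern the MLE runs off to the edge of the grid, while EAP is pulled toward the prior mean. This trade-off between bias and error is the kind that comparisons of estimators examine.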

Some Implications of Choice of Tiering Model in GCSE Mathematics for Inferences about What Students Know and Can Do

Peer reviewed

Bramley, Tom – Research in Mathematics Education, 2017

This study compared models of assessment structure for achieving differentiation across the range of examinee attainment in the General Certificate of Secondary Education (GCSE) examination taken by 16-year-olds in England. The focus was on the "adjacent levels" model, where papers are targeted at three specific non-overlapping ranges of…

Descriptors: Foreign Countries, Mathematics Education, Student Certification, Student Evaluation

Differential Item Functioning for Accommodated Students with Disabilities: Effect of Differences in Proficiency Distributions

Quesen, Sarah – ProQuest LLC, 2016

When studying differential item functioning (DIF) with students with disabilities (SWD), focal groups typically suffer from small sample sizes, whereas the reference group population is usually large. This makes it possible for a researcher to select a sample from the reference population to be similar to the focal group on the ability scale. Doing…

Descriptors: Test Items, Academic Accommodations (Disabilities), Testing Accommodations, Disabilities

A Lower Bound for the Most Deviant Z Score

Peer reviewed

Hayes, Kevin – Teaching Statistics: An International Journal for Teachers, 2004

This article demonstrates that the lower bound for the most deviant Z score and the upper bound for the sample standard deviation are attained simultaneously.

Descriptors: Statistical Analysis, Scores, Item Response Theory, Probability
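
The following is a short numerical check of one elementary bound of this kind. Assume z scores are computed with the sample standard deviation, using n - 1 in the denominator. Then the squared z scores always sum to n - 1, so the largest of them is at least (n - 1)/n. Equality requires every |z| to be equal, which happens when half the observations sit at each of two values. For n even, that same configuration maximizes the standard deviation of data confined to a fixed range. Whether this is exactly the bound and setting treated in Hayes's article is our assumption, and the function names are hypothetical.

```python
import math
import statistics


def max_abs_z(data: list[float]) -> float:
    """Largest |z| when z uses the sample SD (n - 1 denominator)."""
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    return max(abs(x - mean) / s for x in data)


def max_abs_z_lower_bound(n: int) -> float:
    """The z_i**2 sum to n - 1, so the largest z_i**2 is at least (n - 1) / n."""
    return math.sqrt((n - 1) / n)


# Half the values at each of two points: every |z| equals the bound,
# sqrt(3)/2 (about 0.866) for n = 4, up to floating-point rounding.
print(max_abs_z([0, 0, 1, 1]), max_abs_z_lower_bound(4))
```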

Checking the Appropriateness of Item Response Theory Models by Predicting the Distribution of Observed Scores: The Program EO-Fit.

Peer reviewed

Ferrando, Pere J.; Lorenzo-Seva, Urbano – Educational and Psychological Measurement, 2001

Describes a Windows program for checking the suitability of unidimensional logistic item response models for binary and ordered polytomous responses with respect to a given set of data. The program is based on predicting the observed test score distributions from the item characteristic curves. (SLD)

Descriptors: Computer Software, Item Response Theory, Mathematical Models, Prediction
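
The core computation behind this kind of check is the model-implied distribution of number-correct scores. The Lord-Wingersky recursion gives the score distribution at each ability value, and averaging over an assumed ability distribution gives the marginal distribution to compare with observed frequencies. The sketch below makes three assumptions: binary items under a two-parameter logistic model, a standard normal ability distribution, and an equally spaced quadrature grid. Function names are hypothetical. EO-Fit itself also handles ordered polytomous items, and its estimation and fit statistics are not reproduced here.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lord_wingersky(probs: list[float]) -> list[float]:
    """P(number-correct = x | theta) for x = 0..n from per-item P(correct)."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for x, q in enumerate(dist):
            new[x] += q * (1.0 - p)      # item answered incorrectly
            new[x + 1] += q * p          # item answered correctly
        dist = new
    return dist


def predicted_score_distribution(items: list[tuple[float, float]],
                                 n_points: int = 81) -> list[float]:
    """Marginal number-correct distribution, averaging over a N(0, 1) ability
    with equally spaced quadrature points on [-4, 4]."""
    grid = [-4.0 + 8.0 * k / (n_points - 1) for k in range(n_points)]
    weights = [math.exp(-0.5 * t * t) for t in grid]
    total = sum(weights)
    result = [0.0] * (len(items) + 1)
    for t, w in zip(grid, weights):
        cond = lord_wingersky([icc_2pl(t, a, b) for a, b in items])
        for x, q in enumerate(cond):
            result[x] += (w / total) * q
    return result


# Hypothetical three-item test; multiply by the sample size to get expected counts.
expected = predicted_score_distribution([(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)])
```

Large gaps between these expected proportions and the observed score frequencies suggest that the item response model, the assumed ability distribution, or both, fit poorly.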

Observed-Score Equating as a Test Assembly Problem.

Peer reviewed

van der Linden, Wim J.; Luecht, Richard M. – Psychometrika, 1998

Derives a set of linear conditions on item response functions that guarantees identical observed-score distributions on two test forms. The conditions can be added as constraints to a linear programming model for test assembly. An example illustrates the use of the model for an item pool from the Law School Admission Test (LSAT). (SLD)

Descriptors: Equated Scores, Item Banks, Item Response Theory, Linear Programming
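
Our understanding of the result is this: the conditional number-correct distributions on two forms are identical exactly when the sums of powers of the item response probabilities, sum_i P_i(theta)^r for r = 1, ..., n, agree. Those sums are linear in the item-selection variables, so they can enter a test assembly model as constraints. Treat this as our reading, not a quotation. The sketch only evaluates the sums and the resulting distributions for two hypothetical two-item forms under a two-parameter logistic model. It does not build the linear programming model.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def power_sums(items: list[tuple[float, float]], theta: float, r_max: int) -> list[float]:
    """sum_i P_i(theta)**r for r = 1..r_max; items are (a, b) pairs."""
    probs = [p_2pl(theta, a, b) for a, b in items]
    return [sum(p ** r for p in probs) for r in range(1, r_max + 1)]


def conditional_score_dist(items: list[tuple[float, float]], theta: float) -> list[float]:
    """Lord-Wingersky recursion for the number-correct distribution at theta."""
    dist = [1.0]
    for a, b in items:
        p = p_2pl(theta, a, b)
        new = [0.0] * (len(dist) + 1)
        for x, q in enumerate(dist):
            new[x] += q * (1.0 - p)
            new[x + 1] += q * p
        dist = new
    return dist


form_a = [(1.0, 0.0), (1.0, 0.0)]
form_b = [(1.0, -1.0), (1.0, 1.0)]
# At theta = 0 both forms have first power sum 1 (the same expected score),
# but the second power sums differ (0.5 vs about 0.607) ...
print(power_sums(form_a, 0.0, 2), power_sums(form_b, 0.0, 2))
# ... and so do the distributions: [0.25, 0.5, 0.25] vs about [0.197, 0.607, 0.197].
print(conditional_score_dist(form_a, 0.0), conditional_score_dist(form_b, 0.0))
```

Matching only the first power sum matches the expected score. The higher-order sums are what pin down the rest of the score distribution.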

Simulating the Null Distribution of Person-Fit Statistics for Conventional and Adaptive Tests. Research Report 98-02.

Meijer, Rob R.; van Krimpen-Stoop, Edith M. L. A. – 1998

Several person-fit statistics have been proposed to detect item score patterns that do not fit an item response theory model. To classify response patterns as not fitting a model, a distribution of a person-fit statistic is needed. The null distributions of several fit statistics have been investigated using conventionally administered tests, but…

Descriptors: Ability, Adaptive Testing, Foreign Countries, Item Response Theory
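
To make the idea of a simulated null distribution concrete, the sketch below uses the standardized log-likelihood person-fit statistic lz (Drasgow, Levine, and Williams, 1985). We picked it for illustration because it is common; the abstract does not say which statistics the report studies. Response patterns are generated from the model at a known ability, so the simulated values approximate the null distribution of lz, and its lower percentile gives an empirical critical value. The report's actual concern, ability estimated from the same responses and adaptive administration, is not modeled here. The test design and function names are hypothetical.

```python
import math
import random


def lz(responses: list[int], probs: list[float]) -> float:
    """Standardized log-likelihood person-fit statistic for 0/1 responses."""
    l0 = sum(math.log(p) if u == 1 else math.log(1.0 - p) for u, p in zip(responses, probs))
    expected = sum(p * math.log(p) + (1.0 - p) * math.log(1.0 - p) for p in probs)
    variance = sum(p * (1.0 - p) * math.log(p / (1.0 - p)) ** 2 for p in probs)
    return (l0 - expected) / math.sqrt(variance)


def simulate_null(probs: list[float], n_rep: int = 2000, seed: int = 1) -> list[float]:
    """Sorted lz values for patterns simulated from the model-implied probabilities."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_rep):
        pattern = [1 if rng.random() < p else 0 for p in probs]
        values.append(lz(pattern, probs))
    return sorted(values)


# Hypothetical 40-item test: Rasch items with difficulties from -2 to 2, ability 0.
probs = [1.0 / (1.0 + math.exp(b)) for b in (-2.0 + 4.0 * k / 39 for k in range(40))]
null = simulate_null(probs)
critical_value = null[int(0.05 * len(null))]  # empirical 5th percentile
```

Patterns with lz below the critical value would be flagged as misfitting. Using the simulated percentile, rather than the standard normal one, avoids assuming that lz is exactly normal.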

Author

Luecht, Richard M.	2
van der Linden, Wim J.	2
Arieira, Carlos	1
Arthurs, Noah	1
Baker, Frank B.	1
Benjamin W. Domingue	1
Bramley, Tom	1
Cai, Li	1
Donoghue, John R.	1
Ferrando, Pere J.	1
Ferraro, David	1
Harwell, Michael R.	1
Hayes, Kevin	1
Hess, Melinda R.	1
Ho, Andrew D.	1
James G. Soland	1
Jansen, Margo G. H.	1
Jiang, Zhehan	1
Joshua B. Gilbert	1
Junker, Brian W.	1
Karayev, Sergey	1
Kastberg, David	1
Kim, Seonghoon	1
Kim, Sooyeon	1
Li, Zhen	1