ERIC - Search Results

Publication Date

In 2025	0
Since 2024	2
Since 2021 (last 5 years)	3
Since 2016 (last 10 years)	10
Since 2006 (last 20 years)	19

Descriptor

Error of Measurement	23
Item Response Theory	23
Sampling	23
Sample Size	9
Test Items	8
Simulation	7
Comparative Analysis	5
Computation	5
Difficulty Level	5
Scores	5
Data Analysis	4
Item Analysis	4
Monte Carlo Methods	4
Statistical Analysis	4
Statistical Inference	4
Ability	3
Equated Scores	3
Estimation (Mathematics)	3
Evaluation Methods	3
Foreign Countries	3
Measurement	3
Psychometrics	3
Test Bias	3
Test Validity	3
Accuracy	2
More ▼

Source

Educational and Psychological…	4
Applied Measurement in…	3
ETS Research Report Series	2
Grantee Submission	2
Journal of Educational and…	2
Applied Psychological…	1
Council of Chief State School…	1
International Journal of…	1
National Center for Education…	1
Online Submission	1
ProQuest LLC	1
Research Papers in Education	1
More ▼

Publication Type

Reports - Research	16
Journal Articles	15
Reports - Evaluative	4
Speeches/Meeting Papers	3
Reports - Descriptive	2
Dissertations/Theses -…	1
Guides - Non-Classroom	1
Numerical/Quantitative Data	1

Education Level

Elementary Secondary Education	2
Elementary Education	1
Grade 8	1
Junior High Schools	1
Kindergarten	1
Middle Schools	1
Secondary Education	1

Audience

Location

Canada	1
Germany	1
United Kingdom (England)	1

Laws, Policies, & Programs

Assessments and Surveys

National Assessment of…	2
Early Childhood Longitudinal…	1
Student Teacher Relationship…	1
Trends in International…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 23 results Save | Export

A Note on Standard Errors for Multidimensional Two-Parameter Logistic Models Using Gaussian Variational Estimation

Peer reviewed

Direct link

Jiaying Xiao; Chun Wang; Gongjun Xu – Grantee Submission, 2024

Accurate item parameters and standard errors (SEs) are crucial for many multidimensional item response theory (MIRT) applications. A recent study proposed the Gaussian Variational Expectation Maximization (GVEM) algorithm to improve computational efficiency and estimation accuracy (Cho et al., 2021). However, the SE estimation procedure has yet to…

Descriptors: Error of Measurement, Models, Evaluation Methods, Item Analysis

Multi-Group Regularized Gaussian Variational Estimation: Fast Detection of DIF

Peer reviewed

Direct link

Weicong Lyu; Chun Wang; Gongjun Xu – Grantee Submission, 2024

Data harmonization is an emerging approach to strategically combining data from multiple independent studies, enabling addressing new research questions that are not answerable by a single contributing study. A fundamental psychometric challenge for data harmonization is to create commensurate measures for the constructs of interest across…

Descriptors: Data Analysis, Test Items, Psychometrics, Item Response Theory

A Fast and Simple Algorithm for Bayesian Adaptive Testing

Peer reviewed

Direct link

van der Linden, Wim J.; Ren, Hao – Journal of Educational and Behavioral Statistics, 2020

The Bayesian way of accounting for the effects of error in the ability and item parameters in adaptive testing is through the joint posterior distribution of all parameters. An optimized Markov chain Monte Carlo algorithm for adaptive testing is presented, which samples this distribution in real time to score the examinee's ability and optimally…

Descriptors: Bayesian Statistics, Adaptive Testing, Error of Measurement, Markov Processes

Robustness of Weighted Differential Item Functioning (DIF) Analysis: The Case of Mantel-Haenszel DIF Statistics. Research Report. ETS RR-21-12

Peer reviewed
PDF on ERIC

Download full text

Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021

Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…

Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis

Impact of Item Parameter Drift on Rasch Scale Stability in Small Samples over Multiple Administrations

Peer reviewed

Direct link

Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020

Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…

Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling

Error Variance in Common Population Linking Bridge Studies. Research Report. ETS RR-19-42

Peer reviewed
PDF on ERIC

Download full text

Jewsbury, Paul A. – ETS Research Report Series, 2019

When an assessment undergoes changes to the administration or instrument, bridge studies are typically used to try to ensure comparability of scores before and after the change. Among the most common and powerful is the common population linking design, with the use of a linear transformation to link scores to the metric of the original…

Descriptors: Evaluation Research, Scores, Error Patterns, Error of Measurement

Impact of Missing Data on Rasch Model Estimations

Download full text

Soysal, Sümeyra; Arikan, Çigdem Akin; Inal, Hatice – Online Submission, 2016

This study aims to investigate the effect of methods to deal with missing data on item difficulty estimations under different test length conditions and sampling sizes. In this line, a data set including 10, 20 and 40 items with 100 and 5000 sampling size was prepared. Deletion process was applied at the rates of 5%, 10% and 20% under conditions…

Descriptors: Research Problems, Data Analysis, Item Response Theory, Test Items

TIMSS 2015: Illustrating Advancements in Large-Scale International Assessments

Peer reviewed

Direct link

Martin, Michael O.; Mullis, Ina V. S. – Journal of Educational and Behavioral Statistics, 2019

International large-scale assessments of student achievement such as International Association for the Evaluation of Educational Achievement's Trends in International Mathematics and Science Study (TIMSS) and Progress in International Reading Literacy Study and Organization for Economic Cooperation and Development's Program for International…

Descriptors: Achievement Tests, International Assessment, Mathematics Tests, Science Achievement

It Might Not Make a Big DIF: Improved Differential Test Functioning Statistics That Account for Sampling Variability

Peer reviewed

Direct link

Chalmers, R. Philip; Counsell, Alyssa; Flora, David B. – Educational and Psychological Measurement, 2016

Differential test functioning, or DTF, occurs when one or more items in a test demonstrate differential item functioning (DIF) and the aggregate of these effects are witnessed at the test level. In many applications, DTF can be more important than DIF when the overall effects of DIF at the test level can be quantified. However, optimal statistical…

Descriptors: Test Bias, Sampling, Test Items, Statistical Analysis

Bootstrap Standard Errors for Maximum Likelihood Ability Estimates When Item Parameters Are Unknown

Peer reviewed

Direct link

Patton, Jeffrey M.; Cheng, Ying; Yuan, Ke-Hai; Diao, Qi – Educational and Psychological Measurement, 2014

When item parameter estimates are used to estimate the ability parameter in item response models, the standard error (SE) of the ability estimate must be corrected to reflect the error carried over from item calibration. For maximum likelihood (ML) ability estimates, a corrected asymptotic SE is available, but it requires a long test and the…

Descriptors: Sampling, Statistical Inference, Maximum Likelihood Statistics, Computation

Selection of Common Items as an Unrecognized Source of Variability in Test Equating: A Bootstrap Approximation Assuming Random Sampling of Common Items

Peer reviewed

Direct link

Michaelides, Michalis P.; Haertel, Edward H. – Applied Measurement in Education, 2014

The standard error of equating quantifies the variability in the estimation of an equating function. Because common items for deriving equated scores are treated as fixed, the only source of variability typically considered arises from the estimation of common-item parameters from responses of samples of examinees. Use of alternative, equally…

Descriptors: Equated Scores, Test Items, Sampling, Statistical Inference

An Investigation of Measurement Invariance of the Key Stage 2 National Curriculum Science Sampling Test in England

Peer reviewed

Direct link

He, Qingping; Anwyll, Steve; Glanville, Matthew; Opposs, Dennis – Research Papers in Education, 2014

Since 2010, the whole national cohort Key Stage 2 (KS2) National Curriculum test in science in England has been replaced with a sampling test taken by pupils at the age of 11 from a nationally representative sample of schools annually. The study reported in this paper compares the performance of different subgroups of the samples (classified by…

Descriptors: National Curriculum, Sampling, Foreign Countries, Factor Analysis

Impact of Design Effects in Large-Scale District and State Assessments

Peer reviewed

Direct link

Phillips, Gary W. – Applied Measurement in Education, 2015

This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…

Descriptors: State Programs, Sampling, Research Design, Error of Measurement

Observed-Score Equating with a Heterogeneous Target Population

Peer reviewed

Direct link

Duong, Minh Q.; von Davier, Alina A. – International Journal of Testing, 2012

Test equating is a statistical procedure for adjusting for test form differences in difficulty in a standardized assessment. Equating results are supposed to hold for a specified target population (Kolen & Brennan, 2004; von Davier, Holland, & Thayer, 2004) and to be (relatively) independent of the subpopulations from the target population (see…

Descriptors: Ability Grouping, Difficulty Level, Psychometrics, Statistical Analysis

Assessing Goodness of Fit in Item Response Theory with Nonparametric Models: A Comparison of Posterior Probabilities and Kernel-Smoothing Approaches

Peer reviewed

Direct link

Sueiro, Manuel J.; Abad, Francisco J. – Educational and Psychological Measurement, 2011

The distance between nonparametric and parametric item characteristic curves has been proposed as an index of goodness of fit in item response theory in the form of a root integrated squared error index. This article proposes to use the posterior distribution of the latent trait as the nonparametric model and compares the performance of an index…

Descriptors: Goodness of Fit, Item Response Theory, Nonparametric Statistics, Probability

Previous Page | Next Page »

Pages: 1 | 2

Chun Wang	2
Gongjun Xu	2
Abad, Francisco J.	1
Anwyll, Steve	1
Arikan, Çigdem Akin	1
Berger, Martjin P. F.	1
Bergstrom, Betty A.	1
Blais, Jean-Guy	1
Blaker, Lisa	1
Chalmers, R. Philip	1
Cheng, Ying	1
Choi, Sae Il	1
Counsell, Alyssa	1
Diao, Qi	1
Doorey, Nancy A.	1
Dorans, Neil J.	1
Duong, Minh Q.	1
Flora, David B.	1
Glanville, Matthew	1
Guo, Hongwen	1
Haertel, Edward H.	1
Hambleton, Ronald K.	1
He, Qingping	1
Inal, Hatice	1
More ▼