Publication Date
| Date Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 2 |
| Since 2022 (last 5 years) | 4 |
| Since 2017 (last 10 years) | 19 |
| Since 2007 (last 20 years) | 42 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Accuracy | 44 |
| Statistical Analysis | 44 |
| Test Items | 44 |
| Item Response Theory | 21 |
| Models | 11 |
| Classification | 10 |
| Comparative Analysis | 10 |
| Computation | 10 |
| Difficulty Level | 9 |
| Sample Size | 9 |
| Equated Scores | 8 |
Author
| Author | Count |
| --- | --- |
| Benton, Tom | 2 |
| Chang, Hua-Hua | 2 |
| Abad, Francisco José | 1 |
| Abayeva, Nella F. | 1 |
| Alahmadi, Sarah | 1 |
| Albano, Anthony D. | 1 |
| Ashwell, Tim | 1 |
| Attali, Yigal | 1 |
| Babcock, Ben | 1 |
| Barry, Carol L. | 1 |
| Bashkov, Bozhidar M. | 1 |
Publication Type
| Publication Type | Count |
| --- | --- |
| Journal Articles | 39 |
| Reports - Research | 39 |
| Dissertations/Theses -… | 4 |
| Reports - Evaluative | 1 |
| Tests/Questionnaires | 1 |
Location
| Location | Count |
| --- | --- |
| California | 1 |
| Germany | 1 |
| Japan | 1 |
| Japan (Tokyo) | 1 |
| Kyrgyzstan | 1 |
| Mississippi | 1 |
| Philippines | 1 |
| Tennessee | 1 |
| Turkey | 1 |
| United Kingdom (Wales) | 1 |
Assessments and Surveys
| Assessment | Count |
| --- | --- |
| Graduate Record Examinations | 2 |
| Test of English as a Foreign… | 2 |
| Program for International… | 1 |
| SAT (College Admission Test) | 1 |
Haeju Lee; Kyung Yong Kim – Journal of Educational Measurement, 2025
When no prior information about differential item functioning (DIF) exists for items in a test, either the rank-based or the iterative purification procedure may be preferred. Rank-based purification selects anchor items based on a preliminary DIF test. For that preliminary test, likelihood ratio test (LRT) based approaches (e.g.,…
Descriptors: Test Items, Equated Scores, Test Bias, Accuracy
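The specific LRT variants are elided in the snippet above, but the general form of the likelihood ratio DIF test is standard: a compact model constraining an item's parameters to be equal across groups is compared with an augmented model that frees them,

$$ G^2 = -2\left[\ln L_{\text{compact}} - \ln L_{\text{augmented}}\right], $$

which is referred to a chi-square distribution with degrees of freedom equal to the number of parameters freed in the augmented model.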
Tom Benton – Practical Assessment, Research & Evaluation, 2025
This paper proposes an extension of linear equating that may be useful in one of two fairly common assessment scenarios. One is where different students have taken different combinations of test forms. This might occur, for example, where students have some free choice over the exam papers they take within a particular qualification. In this…
Descriptors: Equated Scores, Test Format, Test Items, Computation
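The extension itself is not reproduced in the snippet, but the linear equating it builds on maps a score $x$ on form X to the form Y score with the same standardized deviate:

$$ l_Y(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X), $$

where the means and standard deviations come from the two forms' score distributions.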
Wu, Tong; Kim, Stella Y.; Westine, Carl – Educational and Psychological Measurement, 2023
For large-scale assessments, data are often collected with missing responses. Despite the wide use of item response theory (IRT) in many testing programs, the existing literature offers little insight into the effectiveness of various approaches to handling missing responses in the context of scale linking. Scale linking is commonly used…
Descriptors: Data Analysis, Responses, Statistical Analysis, Measurement
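The linking methods compared in the study are not listed in the snippet above. As one common point of reference, mean-sigma linking rescales a new form's IRT metric from anchor-item difficulty estimates; a minimal Python sketch, where the function name and example values are illustrative rather than taken from the paper:

```python
import numpy as np

def mean_sigma_link(b_base, b_new):
    """Mean-sigma IRT scale linking from common (anchor) items.

    b_base, b_new: difficulty estimates of the same anchor items on the
    base scale and on the new form's scale. Returns (A, B) such that
    theta_base = A * theta_new + B puts the new form on the base scale.
    """
    b_base, b_new = np.asarray(b_base), np.asarray(b_new)
    A = b_base.std(ddof=1) / b_new.std(ddof=1)   # match spreads
    B = b_base.mean() - A * b_new.mean()         # match centers
    return A, B

# Example: four anchor items whose difficulty estimates differ slightly
# across the two calibrations.
A, B = mean_sigma_link([-1.0, -0.2, 0.5, 1.3], [-0.8, 0.0, 0.7, 1.6])
```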
Alahmadi, Sarah; Jones, Andrew T.; Barry, Carol L.; Ibáñez, Beatriz – Applied Measurement in Education, 2023
Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large…
Descriptors: Equated Scores, Item Response Theory, Sample Size, Test Items
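The drift detection methods compared are not named in the snippet. One well-established screen is the robust-z statistic; a minimal sketch below, where the 0.74·IQR scaling and the 2.7 cutoff are the usual conventions rather than values taken from the paper:

```python
import numpy as np

def robust_z_flags(b_old, b_new, cut=2.7):
    """Flag drifting anchor items with the robust-z statistic.

    b_old, b_new: anchor-item difficulty estimates on a common scale
    from two administrations. An item is flagged when its drift sits
    more than `cut` robust SDs from the median drift.
    """
    d = np.asarray(b_new) - np.asarray(b_old)          # per-item drift
    iqr = np.subtract(*np.percentile(d, [75, 25]))     # robust spread
    z = (d - np.median(d)) / (0.74 * iqr)              # IQR -> ~sigma
    return np.abs(z) > cut

print(robust_z_flags([-1.0, 0.0, 0.6, 1.2], [-0.9, 0.1, 0.5, 2.0]))
```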
Benton, Tom – Research Matters, 2020
This article reviews the evidence on the extent to which experts' perceptions of item difficulties, captured using comparative judgement, can predict empirical item difficulties. This evidence is drawn from existing published studies on this topic and also from statistical analysis of data held by Cambridge Assessment. Having reviewed the…
Descriptors: Test Items, Difficulty Level, Expertise, Comparative Analysis
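Comparative judgement data of this kind are typically scaled with a Bradley-Terry model (an assumption here; the snippet does not name the model), under which the probability that an expert judges item $i$ harder than item $j$ is

$$ P(i \succ j) = \frac{e^{v_i}}{e^{v_i} + e^{v_j}}, $$

and the fitted perceived difficulties $v$ are then compared against empirical item difficulties.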
Nájera, Pablo; Sorrel, Miguel A.; Abad, Francisco José – Educational and Psychological Measurement, 2019
Cognitive diagnosis models (CDMs) are latent class multidimensional statistical models that help classify people accurately by using a set of discrete latent variables, commonly referred to as attributes. These models require a Q-matrix that indicates the attributes involved in each item. A potential problem is that the Q-matrix construction…
Descriptors: Matrices, Statistical Analysis, Models, Classification
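To make the Q-matrix's role concrete: in the DINA model, one widely used CDM, the entries $q_{jk}$ gate which attributes an item demands,

$$ \eta_{ij} = \prod_{k} \alpha_{ik}^{\,q_{jk}}, \qquad P(X_{ij}=1 \mid \boldsymbol{\alpha}_i) = (1 - s_j)^{\,\eta_{ij}}\, g_j^{\,1-\eta_{ij}}, $$

where $\alpha_{ik}$ indicates examinee $i$'s mastery of attribute $k$, and $s_j$ and $g_j$ are the item's slip and guessing parameters. A misspecified $q_{jk}$ therefore feeds directly into misclassification.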
Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020
In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describe a method…
Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level
Chou, Winston; Imai, Kosuke; Rosenfeld, Bryn – Sociological Methods & Research, 2020
Scholars increasingly rely on indirect questioning techniques to reduce social desirability bias and item nonresponse for sensitive survey questions. The major drawback of these approaches, however, is their inefficiency relative to direct questioning. We show how to improve the statistical analysis of the list experiment, randomized response…
Descriptors: Surveys, Test Items, Questioning Techniques, Statistical Analysis
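The inefficiency referenced above stems from the standard list experiment estimator: with $J$ control items plus the sensitive item in the treatment list, prevalence is estimated as a simple difference in mean item counts,

$$ \hat{\tau} = \bar{Y}_{\text{treatment}} - \bar{Y}_{\text{control}}, $$

which is design-unbiased but has high variance relative to direct questioning.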
Sünbül, Seçil Ömür – International Journal of Evaluation and Research in Education, 2018
This study aimed to investigate the impact of different missing data handling methods on DINA model parameter estimation and classification accuracy. Simulated data were generated by manipulating the number of items and sample size. In the generated data, two different missing data mechanisms…
Descriptors: Data, Test Items, Sample Size, Statistical Analysis
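The two mechanisms are cut off in the snippet above; in simulation designs like this the usual pair is MCAR and MAR, though that is an assumption here rather than something the abstract confirms. A minimal sketch of the difference when imposing missingness on a simulated 0/1 response matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 20)).astype(float)   # 0/1 responses

# MCAR: every response is equally likely to go missing.
X_mcar = np.where(rng.random(X.shape) < 0.10, np.nan, X)

# MAR: missingness depends on something observed (here, the score on
# the first half of the test), not on the missing values themselves.
half = X[:, :10].sum(axis=1, keepdims=True)
p_miss = 0.20 / (1 + np.exp(-(half - half.mean()) / half.std()))
X_mar = np.where(rng.random(X.shape) < p_miss, np.nan, X)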
Qiu, Yuxi; Huggins-Manley, Anne Corinne – Educational and Psychological Measurement, 2019
This study aimed to assess the accuracy of the empirical item characteristic curve (EICC) preequating method in the presence of test speededness. The simulation design considered the proportion of speededness, the speededness point, the speededness rate, the proportion of missing responses on speeded items, sample size, and test length. After crossing…
Descriptors: Accuracy, Equated Scores, Test Items, Nonparametric Statistics
Bashkov, Bozhidar M.; Clauser, Jerome C. – Practical Assessment, Research & Evaluation, 2019
Successful testing programs rely on high-quality test items to produce reliable scores and defensible exams. However, determining what statistical screening criteria are most appropriate to support these goals can be daunting. This study describes and demonstrates cost-benefit analysis as an empirical approach to determining appropriate screening…
Descriptors: Test Items, Test Reliability, Evaluation Criteria, Accuracy
Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua – Applied Measurement in Education, 2017
Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. Three scaling procedures are considered: (a) concurrent…
Descriptors: Item Response Theory, Accuracy, Educational Assessment, Test Items
Kalkan, Ömür Kaya; Kara, Yusuf; Kelecioglu, Hülya – International Journal of Assessment Tools in Education, 2018
Missing data are a common problem in datasets obtained by administering educational and psychological tests. It is widely known that the existence of missing observations can lead to serious problems such as biased parameter estimates and inflated standard errors. Most missing data imputation methods focus on…
Descriptors: Item Response Theory, Statistical Analysis, Data, Test Items
Lee, Wooyeol; Cho, Sun-Joo – Applied Measurement in Education, 2017
Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…
Descriptors: Item Response Theory, Test Items, Bias, Computation
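Bias and RMSE here carry their standard Monte Carlo definitions: for a parameter with generating value $\xi$ and estimate $\hat{\xi}_r$ in replication $r$ of $R$,

$$ \text{Bias} = \frac{1}{R}\sum_{r=1}^{R}\left(\hat{\xi}_r - \xi\right), \qquad \text{RMSE} = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(\hat{\xi}_r - \xi\right)^{2}}. $$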
Lee, HyeSun; Geisinger, Kurt F. – Educational and Psychological Measurement, 2016
The current study investigated the impact of matching criterion purification on the accuracy of differential item functioning (DIF) detection in large-scale assessments. The three matching approaches for DIF analyses (block-level matching, pooled booklet matching, and equated pooled booklet matching) were employed with the Mantel-Haenszel…
Descriptors: Test Bias, Measurement, Accuracy, Statistical Analysis
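The Mantel-Haenszel procedure pools the $2 \times 2$ correct/incorrect by reference/focal tables across strata $k$ of the matching score. With cell counts $A_k, B_k, C_k, D_k$ and stratum totals $T_k$,

$$ \hat{\alpha}_{MH} = \frac{\sum_k A_k D_k / T_k}{\sum_k B_k C_k / T_k}, \qquad \Delta_{MH} = -2.35 \ln \hat{\alpha}_{MH}, $$

so the choice and purification of the matching criterion determine the strata and, with them, the DIF estimate.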

