ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	4
Since 2016 (last 10 years)	25
Since 2006 (last 20 years)	32

Descriptor

Statistical Analysis	41
Test Items	23
Item Response Theory	19
Comparative Analysis	12
Models	8
Mathematics Tests	7
Scores	7
Correlation	6
Foreign Countries	6
Reading Tests	6
Sample Size	6
Test Bias	6
Achievement Tests	5
College Entrance Examinations	5
Computation	5
Difficulty Level	5
Equated Scores	5
Test Construction	5
Accuracy	4
Computer Assisted Testing	4
Elementary School Students	4
Gender Differences	4
Grade 8	4
High School Students	4
High Stakes Tests	4
More ▼

Source

Applied Measurement in…

Publication Type

Journal Articles	41
Reports - Research	36
Reports - Evaluative	5
Tests/Questionnaires	1

Education Level

Elementary Education	5
High Schools	5
Secondary Education	5
Grade 8	4
Higher Education	3
Junior High Schools	3
Middle Schools	3
Elementary Secondary Education	2
Grade 10	2
Postsecondary Education	2
Early Childhood Education	1
Grade 1	1
Grade 12	1
Grade 2	1
Grade 3	1
Grade 5	1
Grade 7	1
Intermediate Grades	1
Primary Education	1
More ▼

Audience

Researchers

Location

Virginia	2
Australia	1
California	1
Germany	1
Indiana	1
Iran	1
Israel	1
Kansas	1
Massachusetts	1
Michigan	1
Minnesota	1
Norway	1
Ohio	1
Oregon	1
Slovenia	1
Spain	1
Sweden	1
Tennessee	1
United States	1
Vermont	1
More ▼

Laws, Policies, & Programs

Assessments and Surveys

Graduate Record Examinations	3
National Assessment of…	3
Trends in International…	3
Program for International…	2
California Achievement Tests	1
College Board Achievement…	1
Early Childhood Longitudinal…	1
SAT (College Admission Test)	1
Stanford Achievement Tests	1

What Works Clearinghouse Rating

Showing 1 to 15 of 41 results Save | Export

Comparing Drift Detection Methods for Accurate Rasch Equating in Different Sample Sizes

Peer reviewed

Direct link

Alahmadi, Sarah; Jones, Andrew T.; Barry, Carol L.; Ibáñez, Beatriz – Applied Measurement in Education, 2023

Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large…

Descriptors: Equated Scores, Item Response Theory, Sample Size, Test Items

Detecting Differential Item Functioning Using Cognitive Diagnosis Models: Applications of the Wald Test and Likelihood Ratio Test in a University Entrance Examination

Peer reviewed

Direct link

Mehrazmay, Roghayeh; Ghonsooly, Behzad; de la Torre, Jimmy – Applied Measurement in Education, 2021

The present study aims to examine gender differential item functioning (DIF) in the reading comprehension section of a high stakes test using cognitive diagnosis models. Based on the multiple-group generalized deterministic, noisy "and" gate (MG G-DINA) model, the Wald test and likelihood ratio test are used to detect DIF. The flagged…

Descriptors: Test Bias, College Entrance Examinations, Gender Differences, Reading Tests

Characterizing the Latent Classes in a Mixture IRT Model Using DIF

Peer reviewed

Direct link

Karadavut, Tugba – Applied Measurement in Education, 2021

Mixture IRT models address the heterogeneity in a population by extracting latent classes and allowing item parameters to vary between latent classes. Once the latent classes are extracted, they need to be further examined to be characterized. Some approaches have been adopted in the literature for this purpose. These approaches examine either the…

Descriptors: Item Response Theory, Models, Test Items, Maximum Likelihood Statistics

Applying a Multiple Comparison Control to IRT Item-Fit Testing

Peer reviewed

Direct link

Sauder, Derek; DeMars, Christine – Applied Measurement in Education, 2020

We used simulation techniques to assess the item-level and familywise Type I error control and power of an IRT item-fit statistic, the "S-X"[superscript 2]. Previous research indicated that the "S-X"[superscript 2] has good Type I error control and decent power, but no previous research examined familywise Type I error control.…

Descriptors: Item Response Theory, Test Items, Sample Size, Test Length

Subscore Equating and Profile Reporting

Peer reviewed

Direct link

Lim, Euijin; Lee, Won-Chan – Applied Measurement in Education, 2020

The purpose of this study is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subtests. Assuming the random groups design and number-correct scoring, this paper analyzed real data and simulated data with four study factors including test dimensionality, subtest length, form difference in…

Descriptors: Equated Scores, Test Length, Test Format, Difficulty Level

Application of IRT Fixed Parameter Calibration to Multiple-Group Test Data

Peer reviewed

Direct link

Kim, Seonghoon; Kolen, Michael J. – Applied Measurement in Education, 2019

In applications of item response theory (IRT), fixed parameter calibration (FPC) has been used to estimate the item parameters of a new test form on the existing ability scale of an item pool. The present paper presents an application of FPC to multiple examinee groups test data that are linked to the item pool via anchor items, and investigates…

Descriptors: Item Response Theory, Item Banks, Test Items, Computation

Differential Item Functioning for Accommodated Students with Disabilities: Effect of Differences in Proficiency Distributions

Peer reviewed

Direct link

Quesen, Sarah; Lane, Suzanne – Applied Measurement in Education, 2019

This study examined the effect of similar vs. dissimilar proficiency distributions on uniform DIF detection on a statewide eighth grade mathematics assessment. Results from the similar- and dissimilar-ability reference groups with an SWD focal group were compared for four models: logistic regression, hierarchical generalized linear model (HGLM),…

Descriptors: Test Items, Mathematics Tests, Grade 8, Item Response Theory

Appraising the Scoring Performance of Automated Essay Scoring Systems--Some Additional Considerations: Which Essays? Which Human Raters? Which Scores?

Peer reviewed

Direct link

Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018

The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…

Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators

Validating Human and Automated Scoring of Essays against "True" Scores

Peer reviewed

Direct link

Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018

In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…

Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing

Statistically Comparing the Performance of Multiple Automated Raters across Multiple Items

Peer reviewed

Direct link

Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017

Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…

Descriptors: Automation, Scoring, Comparative Analysis, Test Items

Detecting Local Dependence: A Threshold-Autoregressive Item Response Theory (TAR-IRT) Approach for Polytomous Items

Peer reviewed

Direct link

Tang, Xiaodan; Karabatsos, George; Chen, Haiqin – Applied Measurement in Education, 2020

In applications of item response theory (IRT) models, it is known that empirical violations of the local independence (LI) assumption can significantly bias parameter estimates. To address this issue, we propose a threshold-autoregressive item response theory (TAR-IRT) model that additionally accounts for order dependence among the item responses…

Descriptors: Item Response Theory, Test Items, Models, Computation

Response Time Differences between Computers and Tablets

Peer reviewed

Direct link

Kong, Xiaojing; Davis, Laurie Laughlin; McBride, Yuanyuan; Morrison, Kristin – Applied Measurement in Education, 2018

Item response time data were used in investigating the differences in student test-taking behavior between two device conditions: computer and tablet. Analyses were conducted to address the questions of whether or not the device condition had a differential impact on rapid guessing and solution behaviors (with response time effort used as an…

Descriptors: Educational Technology, Technology Uses in Education, Computers, Handheld Devices

Bayesian Estimation and Testing of a Linear Logistic Test Model for Learning during the Test

Peer reviewed

Direct link

Lozano, José H.; Revuelta, Javier – Applied Measurement in Education, 2021

The present study proposes a Bayesian approach for estimating and testing the operation-specific learning model, a variant of the linear logistic test model that allows for the measurement of the learning that occurs during a test as a result of the repeated use of the operations involved in the items. The advantages of using a Bayesian framework…

Descriptors: Bayesian Statistics, Computation, Learning, Testing

Focusing on Interactions between Content and Cognition: A New Perspective on Gender Differences in Mathematical Sub-Competencies

Peer reviewed

Direct link

George, Ann Cathrice; Robitzsch, Alexander – Applied Measurement in Education, 2018

This article presents a new perspective on measuring gender differences in the large-scale assessment study Trends in International Science Study (TIMSS). The suggested empirical model is directly based on the theoretical competence model of the domain mathematics and thus includes the interaction between content and cognitive sub-competencies.…

Descriptors: Achievement Tests, Elementary Secondary Education, Mathematics Achievement, Mathematics Tests

Sensitivity of Cross-State Assessment Item Difficulty to Differences in State Curricular Content Standards

Peer reviewed

Direct link

Traynor, Anne – Applied Measurement in Education, 2017

It has long been argued that U.S. states' differential performance on nationwide assessments may reflect differences in students' opportunity to learn the tested content that is primarily due to variation in curricular content standards, rather than in instructional quality or educational investment. To quantify the effect of differences in…

Descriptors: Test Items, Difficulty Level, State Standards, Academic Standards

Previous Page | Next Page »

Pages: 1 | 2 | 3

Davis, Laurie Laughlin	2
Kong, Xiaojing	2
McBride, Yuanyuan	2
Oliveri, Maria Elena	2
Ainley, John	1
Alahmadi, Sarah	1
Attali, Yigal	1
Baldonado, Angela Argo	1
Barry, Carol L.	1
Ben-Simon, Anat	1
Benítez, Isabel	1
Bovaird, James A.	1
Boyer, Michelle	1
Bridgeman, Brent	1
Briggs, Derek C.	1
Buzick, Heather	1
Chang, Hua-Hua	1
Chen, Haiqin	1
Cho, Sun-Joo	1
Cohen, Allan	1
Cohen, Yoav	1
Davey, Beth	1
DeMars, Christine	1
Dogan, Enis	1
More ▼