Publication Date
  In 2025: 0
  Since 2024: 1
  Since 2021 (last 5 years): 4
  Since 2016 (last 10 years): 25
  Since 2006 (last 20 years): 58
Descriptor
  Comparative Analysis: 83
  Test Items: 31
  Item Response Theory: 26
  Mathematics Tests: 16
  Computer Assisted Testing: 14
  Scores: 14
  Test Bias: 14
  Statistical Analysis: 12
  Foreign Countries: 11
  Simulation: 10
  Achievement Tests: 9
Source
  Applied Measurement in Education: 83
Author
  Davis, Laurie Laughlin: 3
  Ercikan, Kadriye: 3
  Lee, Won-Chan: 3
  Linn, Robert L.: 3
  Oliveri, Maria Elena: 3
  Attali, Yigal: 2
  Bridgeman, Brent: 2
  Finch, Holmes: 2
  Hambleton, Ronald K.: 2
  Kong, Xiaojing: 2
  McBride, Yuanyuan: 2
Publication Type
  Journal Articles: 83
  Reports - Research: 62
  Reports - Evaluative: 22
  Information Analyses: 3
  Speeches/Meeting Papers: 2
  Reports - Descriptive: 1
Education Level
  Elementary Secondary Education: 8
  Secondary Education: 8
  Grade 5: 6
  Grade 8: 6
  Elementary Education: 5
  Higher Education: 5
  Grade 4: 4
  Grade 6: 3
  Grade 7: 3
  High Schools: 3
  Middle Schools: 3
Audience
  Researchers: 1
Location
  Texas: 4
  North Carolina: 3
  Canada: 2
  Florida: 2
  New York: 2
  Virginia: 2
  Australia: 1
  California: 1
  Colorado: 1
  Georgia: 1
  Maryland: 1
Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024
To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…
Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement
Dahlke, Jeffrey A.; Sackett, Paul R.; Kuncel, Nathan R. – Applied Measurement in Education, 2023
We examine longitudinal data from 120,384 students who took a version of the PSAT/SAT in the 9th, 10th, 11th, and 12th grades. We investigate score changes over time and show that socioeconomic status (SES) is related to the degree of score improvement. We note that the 9th and 10th grade PSAT are low-stakes tests, while the operational SAT is a…
Descriptors: Scores, College Entrance Examinations, Socioeconomic Status, Test Preparation
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
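The rapid-guessing construct in the Abulela and Rios entry is commonly operationalized through response times: a response faster than some item-level threshold is flagged as a rapid guess, and the proportion of unflagged responses is an examinee's response-time effort. A minimal sketch of that idea, assuming a flat illustrative 3-second cutoff (the actual study's flagging procedure is not specified in the abstract):

```python
# Sketch: flagging rapid guesses with a response-time threshold and
# summarizing effort as the share of non-rapid responses. The 3-second
# cutoff is illustrative only, not a value taken from the study above.

def response_time_effort(response_times, threshold=3.0):
    """Proportion of responses at or above the threshold (in seconds).

    Values near 1.0 suggest solution behavior throughout the test;
    lower values suggest rapid guessing on a larger share of items.
    """
    if not response_times:
        raise ValueError("need at least one response time")
    non_rapid = sum(1 for t in response_times if t >= threshold)
    return non_rapid / len(response_times)

# Example: 8 of 10 responses meet the cutoff.
times = [12.4, 1.1, 8.0, 15.2, 2.3, 9.9, 30.0, 7.5, 11.0, 6.6]
print(response_time_effort(times))  # -> 0.8
```

In practice, thresholds are usually set per item (e.g., from the response-time distribution) rather than globally as here.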
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023
This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…
Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation
Kong, Xiaojing; Davis, Laurie Laughlin; McBride, Yuanyuan; Morrison, Kristin – Applied Measurement in Education, 2018
Item response time data were used in investigating the differences in student test-taking behavior between two device conditions: computer and tablet. Analyses were conducted to address the questions of whether or not the device condition had a differential impact on rapid guessing and solution behaviors (with response time effort used as an…
Descriptors: Educational Technology, Technology Uses in Education, Computers, Handheld Devices
Kim, Kyung Yong; Lee, Won-Chan – Applied Measurement in Education, 2017
This article provides a detailed description of three factors (specification of the ability distribution, numerical integration, and frame of reference for the item parameter estimates) that might affect the item parameter estimation of the three-parameter logistic model, and compares five item calibration methods, which are combinations of the…
Descriptors: Test Items, Item Response Theory, Comparative Analysis, Methods
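For reference, the three-parameter logistic model named in the Kim and Lee abstract gives the probability that an examinee with ability \(\theta\) answers item \(j\) correctly as:

```latex
P_j(\theta) \;=\; c_j + (1 - c_j)\,\frac{1}{1 + e^{-a_j(\theta - b_j)}}
```

where \(a_j\) is the item discrimination, \(b_j\) the difficulty, and \(c_j\) the pseudo-chance (lower-asymptote) parameter; item calibration is the estimation of \((a_j, b_j, c_j)\) from response data.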
Dadey, Nathan; Lyons, Susan; DePascale, Charles – Applied Measurement in Education, 2018
Evidence of comparability is generally needed whenever there are variations in the conditions of an assessment administration, including variations introduced by the administration of an assessment on multiple digital devices (e.g., tablet, laptop, desktop). This article is meant to provide a comprehensive examination of issues relevant to the…
Descriptors: Evaluation Methods, Computer Assisted Testing, Educational Technology, Technology Uses in Education
Suh, Youngsuk; Talley, Anna E. – Applied Measurement in Education, 2015
This study compared and illustrated four differential distractor functioning (DDF) detection methods for analyzing multiple-choice items. The log-linear approach, two item response theory-model-based approaches with likelihood ratio tests, and the odds ratio approach were compared to examine the congruence among the four DDF detection methods.…
Descriptors: Test Bias, Multiple Choice Tests, Test Items, Methods
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
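Comparative judgment as described in the Steedle and Ferrara entry turns many pairwise "which essay is better?" decisions into a score per essay. One standard way to do that aggregation is a Bradley-Terry model; the sketch below uses a simple minorization-maximization iteration and is illustrative only, not the exact procedure from the study:

```python
# Sketch: Bradley-Terry aggregation of pairwise judgments (one common
# comparative-judgment scoring approach; hypothetical data and code).

def bradley_terry(wins, n_items, iters=200):
    """wins[(i, j)] = number of times item i was judged better than item j.
    Returns one strength estimate per item (higher = judged better overall)."""
    p = [1.0] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            # Total wins for item i across all pairings.
            w_i = sum(w for (a, b), w in wins.items() if a == i)
            denom = 0.0
            for j in range(n_items):
                if j == i:
                    continue
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (p[i] + p[j])
            new_p.append(w_i / denom if denom else p[i])
        total = sum(new_p)
        p = [x * n_items / total for x in new_p]  # normalize each pass
    return p

# Three essays, ten judgments per pair: essay 0 usually beats 1 and 2,
# essay 1 usually beats 2.
judgments = {(0, 1): 9, (1, 0): 1, (1, 2): 8, (2, 1): 2, (0, 2): 10}
strengths = bradley_terry(judgments, 3)
print(sorted(range(3), key=lambda i: -strengths[i]))  # -> [0, 1, 2]
```

The resulting strengths put the essays on a common scale even though no judge ever assigned an absolute score, which is the property the abstract's "aggregating decisions about relative quality" refers to.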
Oliveri, Maria Elena; Lawless, Rene; Robin, Frederic; Bridgeman, Brent – Applied Measurement in Education, 2018
We analyzed a pool of items from an admissions test for differential item functioning (DIF) for groups based on age, socioeconomic status, citizenship, or English language status using Mantel-Haenszel and item response theory. DIF items were systematically examined to identify possible sources of DIF by item type, content, and wording. DIF was…
Descriptors: Test Bias, Comparative Analysis, Item Banks, Item Response Theory
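As background on the first method named in the Oliveri et al. entry: the Mantel-Haenszel procedure compares the odds of a correct response for the reference and focal groups within each matched ability level \(k\). The common odds ratio and its ETS delta-scale re-expression are:

```latex
\hat{\alpha}_{MH} \;=\; \frac{\sum_k A_k D_k / T_k}{\sum_k B_k C_k / T_k},
\qquad
\Delta_{MH} \;=\; -2.35 \,\ln \hat{\alpha}_{MH}
```

where, at level \(k\), \(A_k\) and \(B_k\) are the reference group's correct and incorrect counts, \(C_k\) and \(D_k\) the focal group's, and \(T_k\) the total; \(\hat{\alpha}_{MH} = 1\) (so \(\Delta_{MH} = 0\)) indicates no DIF.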
Finch, Holmes; French, Brian F. – Applied Measurement in Education, 2019
The usefulness of item response theory (IRT) models depends, in large part, on the accuracy of item and person parameter estimates. For the standard 3 parameter logistic model, for example, these parameters include the item parameters of difficulty, discrimination, and pseudo-chance, as well as the person ability parameter. Several factors impact…
Descriptors: Item Response Theory, Accuracy, Test Items, Difficulty Level
Ing, Marsha – Applied Measurement in Education, 2016
Drawing inferences about the extent to which student performance reflects instructional opportunities relies on the premise that the measure of student performance is reflective of instructional opportunities. An instructional sensitivity framework suggests that some assessments are more sensitive to detecting differences in instructional…
Descriptors: Mathematics Tests, Mathematics Achievement, Performance, Educational Opportunities
Sachse, Karoline A.; Haag, Nicole – Applied Measurement in Education, 2017
Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment's (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…
Descriptors: Error of Measurement, Test Bias, International Assessment, Computation
Dogan, Enis; Ogut, Burhan; Kim, Young Yee – Applied Measurement in Education, 2015
The relationship between reading skills in earlier grades and achieving "Proficiency" on the National Assessment of Educational Progress (NAEP) grade 8 reading assessment was examined by establishing a statistical link between NAEP and the Early Childhood Longitudinal Study (ECLS) grade 8 reading assessments using data from a common…
Descriptors: Reading Skills, National Competency Tests, Reading Tests, Grade 8