Publication Date
  In 2025: 0
  Since 2024: 1
  Since 2021 (last 5 years): 4
  Since 2016 (last 10 years): 25
  Since 2006 (last 20 years): 58
Descriptor
  Comparative Analysis: 83
  Test Items: 31
  Item Response Theory: 26
  Mathematics Tests: 16
  Computer Assisted Testing: 14
  Scores: 14
  Test Bias: 14
  Statistical Analysis: 12
  Foreign Countries: 11
  Simulation: 10
  Achievement Tests: 9
Source
  Applied Measurement in Education: 83
Author
  Davis, Laurie Laughlin: 3
  Ercikan, Kadriye: 3
  Lee, Won-Chan: 3
  Linn, Robert L.: 3
  Oliveri, Maria Elena: 3
  Attali, Yigal: 2
  Bridgeman, Brent: 2
  Finch, Holmes: 2
  Hambleton, Ronald K.: 2
  Kong, Xiaojing: 2
  McBride, Yuanyuan: 2
Publication Type
  Journal Articles: 83
  Reports - Research: 62
  Reports - Evaluative: 22
  Information Analyses: 3
  Speeches/Meeting Papers: 2
  Reports - Descriptive: 1
Education Level
  Elementary Secondary Education: 8
  Secondary Education: 8
  Grade 5: 6
  Grade 8: 6
  Elementary Education: 5
  Higher Education: 5
  Grade 4: 4
  Grade 6: 3
  Grade 7: 3
  High Schools: 3
  Middle Schools: 3
Audience
  Researchers: 1
Location
  Texas: 4
  North Carolina: 3
  Canada: 2
  Florida: 2
  New York: 2
  Virginia: 2
  Australia: 1
  California: 1
  Colorado: 1
  Georgia: 1
  Maryland: 1
Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024
To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…
Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement
Dahlke, Jeffrey A.; Sackett, Paul R.; Kuncel, Nathan R. – Applied Measurement in Education, 2023
We examine longitudinal data from 120,384 students who took a version of the PSAT/SAT in the 9th, 10th, 11th, and 12th grades. We investigate score changes over time and show that socioeconomic status (SES) is related to the degree of score improvement. We note that the 9th and 10th grade PSAT are low-stakes tests, while the operational SAT is a…
Descriptors: Scores, College Entrance Examinations, Socioeconomic Status, Test Preparation
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
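The rapid-guessing construct in the Abulela and Rios entry is commonly operationalized through response times: a response faster than some item-level threshold is flagged as a rapid guess, and the proportion of unflagged responses is an examinee's response-time effort. A minimal sketch of that idea, assuming a flat illustrative 3-second cutoff (the actual study's flagging procedure is not specified in the abstract):

```python
# Sketch: flagging rapid guesses with a response-time threshold and
# summarizing effort as the share of non-rapid responses. The 3-second
# cutoff is illustrative only, not a value taken from the study above.

def response_time_effort(response_times, threshold=3.0):
    """Proportion of responses at or above the threshold (in seconds).

    Values near 1.0 suggest solution behavior throughout the test;
    lower values suggest rapid guessing on a larger share of items.
    """
    if not response_times:
        raise ValueError("need at least one response time")
    non_rapid = sum(1 for t in response_times if t >= threshold)
    return non_rapid / len(response_times)

# Example: 8 of 10 responses meet the cutoff.
times = [12.4, 1.1, 8.0, 15.2, 2.3, 9.9, 30.0, 7.5, 11.0, 6.6]
print(response_time_effort(times))  # -> 0.8
```

In practice, thresholds are usually set per item (e.g., from the response-time distribution) rather than globally as here.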
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023
This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…
Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation
Kong, Xiaojing; Davis, Laurie Laughlin; McBride, Yuanyuan; Morrison, Kristin – Applied Measurement in Education, 2018
Item response time data were used in investigating the differences in student test-taking behavior between two device conditions: computer and tablet. Analyses were conducted to address the questions of whether or not the device condition had a differential impact on rapid guessing and solution behaviors (with response time effort used as an…
Descriptors: Educational Technology, Technology Uses in Education, Computers, Handheld Devices
Kim, Kyung Yong; Lee, Won-Chan – Applied Measurement in Education, 2017
This article provides a detailed description of three factors (specification of the ability distribution, numerical integration, and frame of reference for the item parameter estimates) that might affect the item parameter estimation of the three-parameter logistic model, and compares five item calibration methods, which are combinations of the…
Descriptors: Test Items, Item Response Theory, Comparative Analysis, Methods
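For reference, the three-parameter logistic model named in the Kim and Lee abstract gives the probability that an examinee with ability \(\theta\) answers item \(j\) correctly as:

```latex
P_j(\theta) \;=\; c_j + (1 - c_j)\,\frac{1}{1 + e^{-a_j(\theta - b_j)}}
```

where \(a_j\) is the item discrimination, \(b_j\) the difficulty, and \(c_j\) the pseudo-chance (lower-asymptote) parameter; item calibration is the estimation of \((a_j, b_j, c_j)\) from response data.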
Dadey, Nathan; Lyons, Susan; DePascale, Charles – Applied Measurement in Education, 2018
Evidence of comparability is generally needed whenever there are variations in the conditions of an assessment administration, including variations introduced by the administration of an assessment on multiple digital devices (e.g., tablet, laptop, desktop). This article is meant to provide a comprehensive examination of issues relevant to the…
Descriptors: Evaluation Methods, Computer Assisted Testing, Educational Technology, Technology Uses in Education
Suh, Youngsuk; Talley, Anna E. – Applied Measurement in Education, 2015
This study compared and illustrated four differential distractor functioning (DDF) detection methods for analyzing multiple-choice items. The log-linear approach, two item response theory-model-based approaches with likelihood ratio tests, and the odds ratio approach were compared to examine the congruence among the four DDF detection methods.…
Descriptors: Test Bias, Multiple Choice Tests, Test Items, Methods
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
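Comparative judgment as described in the Steedle and Ferrara entry turns many pairwise "which essay is better?" decisions into a score per essay. One standard way to do that aggregation is a Bradley-Terry model; the sketch below uses a simple minorization-maximization iteration and is illustrative only, not the exact procedure from the study:

```python
# Sketch: Bradley-Terry aggregation of pairwise judgments (one common
# comparative-judgment scoring approach; hypothetical data and code).

def bradley_terry(wins, n_items, iters=200):
    """wins[(i, j)] = number of times item i was judged better than item j.
    Returns one strength estimate per item (higher = judged better overall)."""
    p = [1.0] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            # Total wins for item i across all pairings.
            w_i = sum(w for (a, b), w in wins.items() if a == i)
            denom = 0.0
            for j in range(n_items):
                if j == i:
                    continue
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (p[i] + p[j])
            new_p.append(w_i / denom if denom else p[i])
        total = sum(new_p)
        p = [x * n_items / total for x in new_p]  # normalize each pass
    return p

# Three essays, ten judgments per pair: essay 0 usually beats 1 and 2,
# essay 1 usually beats 2.
judgments = {(0, 1): 9, (1, 0): 1, (1, 2): 8, (2, 1): 2, (0, 2): 10}
strengths = bradley_terry(judgments, 3)
print(sorted(range(3), key=lambda i: -strengths[i]))  # -> [0, 1, 2]
```

The resulting strengths put the essays on a common scale even though no judge ever assigned an absolute score, which is the property the abstract's "aggregating decisions about relative quality" refers to.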
Oliveri, Maria Elena; Lawless, Rene; Robin, Frederic; Bridgeman, Brent – Applied Measurement in Education, 2018
We analyzed a pool of items from an admissions test for differential item functioning (DIF) for groups based on age, socioeconomic status, citizenship, or English language status using Mantel-Haenszel and item response theory. DIF items were systematically examined to identify possible sources of DIF by item type, content, and wording. DIF was…
Descriptors: Test Bias, Comparative Analysis, Item Banks, Item Response Theory
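As background on the first method named in the Oliveri et al. entry: the Mantel-Haenszel procedure compares the odds of a correct response for the reference and focal groups within each matched ability level \(k\). The common odds ratio and its ETS delta-scale re-expression are:

```latex
\hat{\alpha}_{MH} \;=\; \frac{\sum_k A_k D_k / T_k}{\sum_k B_k C_k / T_k},
\qquad
\Delta_{MH} \;=\; -2.35 \,\ln \hat{\alpha}_{MH}
```

where, at level \(k\), \(A_k\) and \(B_k\) are the reference group's correct and incorrect counts, \(C_k\) and \(D_k\) the focal group's, and \(T_k\) the total; \(\hat{\alpha}_{MH} = 1\) (so \(\Delta_{MH} = 0\)) indicates no DIF.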
Finch, Holmes; French, Brian F. – Applied Measurement in Education, 2019
The usefulness of item response theory (IRT) models depends, in large part, on the accuracy of item and person parameter estimates. For the standard 3 parameter logistic model, for example, these parameters include the item parameters of difficulty, discrimination, and pseudo-chance, as well as the person ability parameter. Several factors impact…
Descriptors: Item Response Theory, Accuracy, Test Items, Difficulty Level
Ing, Marsha – Applied Measurement in Education, 2016
Drawing inferences about the extent to which student performance reflects instructional opportunities relies on the premise that the measure of student performance is reflective of instructional opportunities. An instructional sensitivity framework suggests that some assessments are more sensitive to detecting differences in instructional…
Descriptors: Mathematics Tests, Mathematics Achievement, Performance, Educational Opportunities
Sachse, Karoline A.; Haag, Nicole – Applied Measurement in Education, 2017
Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment's (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…
Descriptors: Error of Measurement, Test Bias, International Assessment, Computation
Dogan, Enis; Ogut, Burhan; Kim, Young Yee – Applied Measurement in Education, 2015
The relationship between reading skills in earlier grades and achieving "Proficiency" on the National Assessment of Educational Progress (NAEP) grade 8 reading assessment was examined by establishing a statistical link between NAEP and the Early Childhood Longitudinal Study (ECLS) grade 8 reading assessments using data from a common…
Descriptors: Reading Skills, National Competency Tests, Reading Tests, Grade 8