Publication Date
  In 2025: 0
  Since 2024: 0
  Since 2021 (last 5 years): 1
  Since 2016 (last 10 years): 4
  Since 2006 (last 20 years): 6
Descriptor
  Statistical Analysis: 7
  Test Validity: 4
  Correlation: 3
  Validity: 3
  Essays: 2
  Evaluators: 2
  Foreign Countries: 2
  High Stakes Tests: 2
  Interrater Reliability: 2
  Item Response Theory: 2
  Models: 2
Source
  Applied Measurement in Education: 7
Author
  Allen, Jeff: 1
  Ben-Simon, Anat: 1
  Cohen, Allan: 1
  Cohen, Yoav: 1
  Downing, Steven M.: 1
  Eklöf, Hanna: 1
  Ferrara, Steve: 1
  Grønmo, Liv Sissel: 1
  Haladyna, Thomas M.: 1
  Karadavut, Tugba: 1
  Levi, Effi: 1
Publication Type
  Journal Articles: 7
  Reports - Research: 5
  Reports - Evaluative: 2
Education Level
  Grade 12: 1
  Grade 7: 1
  High Schools: 1
  Higher Education: 1
Assessments and Surveys
  National Assessment of Educational Progress: 1
  Program for International Student Assessment: 1
  Trends in International Mathematics and Science Study: 1
Karadavut, Tugba – Applied Measurement in Education, 2021
Mixture IRT models address heterogeneity in a population by extracting latent classes and allowing item parameters to vary across those classes. Once extracted, the latent classes must be examined further so that they can be characterized. Several approaches have been adopted in the literature for this purpose. These approaches examine either the…
Descriptors: Item Response Theory, Models, Test Items, Maximum Likelihood Statistics
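For background (a generic formulation, not necessarily the exact specification used in the article): in a mixture 2PL IRT model, each respondent belongs to one of G latent classes, and an item's parameters depend on that class. For respondent i in class g, the probability of a correct response to item j is

P(X_{ij} = 1 \mid \theta_i, g) = \frac{\exp\{a_{jg}(\theta_i - b_{jg})\}}{1 + \exp\{a_{jg}(\theta_i - b_{jg})\}},

where the discrimination a_{jg} and difficulty b_{jg} vary across classes, and the class proportions \pi_1, \dots, \pi_G are estimated along with the item parameters. Characterizing the extracted classes then amounts to inspecting how a_{jg}, b_{jg}, or class membership itself differs between groups.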
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written in response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to each essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
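The design this abstract describes (many raters per essay as a true-score proxy, cross-validated machine scores) is easy to simulate. Below is a minimal Python sketch with invented data and a generic ridge-regression scorer standing in for the AES system; none of the variable names or modeling choices come from the article.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_essays, n_raters, n_features = 250, 15, 20

# Invented stand-ins for essay features and human ratings.
X = rng.normal(size=(n_essays, n_features))
quality = X @ rng.normal(size=n_features)            # latent essay quality
ratings = quality[:, None] + rng.normal(scale=2.0, size=(n_essays, n_raters))

true_score = ratings.mean(axis=1)    # mean of all raters ~ true-score proxy

# Cross-validated machine scores: every essay is scored by a model
# that did not see it during training.
machine = np.empty(n_essays)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    machine[test] = Ridge(alpha=1.0).fit(X[train], true_score[train]).predict(X[test])

# Agreement with the true-score proxy: the machine vs. a single human rater
# (the single rater is compared against the mean of the other 14 raters).
print("machine vs. mean rating:     ", np.corrcoef(machine, true_score)[0, 1])
print("one rater vs. mean of others:", np.corrcoef(ratings[:, 0],
                                                   ratings[:, 1:].mean(axis=1))[0, 1])

Averaging 14 or 15 independent raters shrinks the rating-error variance roughly by a factor of the number of raters, which is why the mean rating can serve as a true-score proxy against which both the machine and individual humans are judged.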
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
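For readers unfamiliar with how pairwise decisions become scores: one standard scaling model for comparative judgment is Bradley-Terry (an assumption here; the abstract does not name the model the authors used). A minimal Python sketch using the classic minorization-maximization updates:

import numpy as np

def bradley_terry(wins: np.ndarray, iters: int = 200) -> np.ndarray:
    """wins[i, j] = number of times essay i was judged better than essay j."""
    n = wins.shape[0]
    strength = np.ones(n)
    for _ in range(iters):
        for i in range(n):
            total_wins = wins[i].sum()
            denom = sum((wins[i, j] + wins[j, i]) / (strength[i] + strength[j])
                        for j in range(n) if j != i)
            if denom > 0:
                strength[i] = total_wins / denom
        strength /= strength.sum()                   # fix the scale
    return np.log(strength)                          # log-strengths as scores

# Toy example: essay 2 beats essay 1, which beats essay 0, in most judgments.
wins = np.array([[0, 1, 0],
                 [4, 0, 1],
                 [5, 4, 0]], dtype=float)
print(bradley_terry(wins))   # scores increase from essay 0 to essay 2

Essays are then ranked or scored by their fitted log-strengths; no rubric anchors are needed because every judgment is relative, which is what removes certain scorer biases.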
Eklöf, Hanna; Pavešic, Barbara Japelj; Grønmo, Liv Sissel – Applied Measurement in Education, 2014
The purpose of the study was to measure students' reported test-taking effort and the relationship between reported effort and performance on the Trends in International Mathematics and Science Study (TIMSS) Advanced mathematics test. This was done in three countries participating in TIMSS Advanced 2008 (Sweden, Norway, and Slovenia), and the…
Descriptors: Mathematics Tests, Cross Cultural Studies, Foreign Countries, Correlation
Allen, Jeff; Robbins, Steven B.; Sawyer, Richard – Applied Measurement in Education, 2010
Research on the validity of psychosocial factors (PSFs) and other noncognitive predictors of college outcomes has largely ignored the practical benefits implied by the validity. We summarize evidence of the validity of PSF measures as predictors of college outcomes and then explain how this validity directly translates into improved identification…
Descriptors: Institutional Research, Academic Persistence, Validity, At Risk Students
Downing, Steven M.; Haladyna, Thomas M. – Applied Measurement in Education, 1997
An ideal process is outlined for test item development and the study of item responses to ensure that tests are sound. Qualitative and quantitative methods are used to assess the item-level validity evidence for high-stakes examinations. A checklist for assessment is provided. (SLD)
Descriptors: High Stakes Tests, Item Response Theory, Qualitative Research, Quality Control