Publication Date
  In 2025: 0
  Since 2024: 0
  Since 2021 (last 5 years): 1
  Since 2016 (last 10 years): 4
  Since 2006 (last 20 years): 6
Descriptor
  Statistical Analysis: 7
  Test Validity: 4
  Correlation: 3
  Validity: 3
  Essays: 2
  Evaluators: 2
  Foreign Countries: 2
  High Stakes Tests: 2
  Interrater Reliability: 2
  Item Response Theory: 2
  Models: 2
Source
  Applied Measurement in Education: 7
Author
  Allen, Jeff: 1
  Ben-Simon, Anat: 1
  Cohen, Allan: 1
  Cohen, Yoav: 1
  Downing, Steven M.: 1
  Eklöf, Hanna: 1
  Ferrara, Steve: 1
  Grønmo, Liv Sissel: 1
  Haladyna, Thomas M.: 1
  Karadavut, Tugba: 1
  Levi, Effi: 1
Publication Type
  Journal Articles: 7
  Reports - Research: 5
  Reports - Evaluative: 2
Education Level
  Grade 12: 1
  Grade 7: 1
  High Schools: 1
  Higher Education: 1
Assessments and Surveys
  National Assessment of Educational Progress: 1
  Program for International Student Assessment: 1
  Trends in International Mathematics and Science Study: 1
Karadavut, Tugba – Applied Measurement in Education, 2021
Mixture IRT models address heterogeneity in a population by extracting latent classes and allowing item parameters to vary across those classes. Once extracted, the latent classes must be examined further so that they can be characterized. Several approaches have been adopted in the literature for this purpose. These approaches examine either the…
Descriptors: Item Response Theory, Models, Test Items, Maximum Likelihood Statistics
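For background (a generic formulation, not necessarily the exact specification used in the article): in a mixture 2PL IRT model, each respondent belongs to one of G latent classes, and an item's parameters depend on that class. For respondent i in class g, the probability of a correct response to item j is

P(X_{ij} = 1 \mid \theta_i, g) = \frac{\exp\{a_{jg}(\theta_i - b_{jg})\}}{1 + \exp\{a_{jg}(\theta_i - b_{jg})\}},

where the discrimination a_{jg} and difficulty b_{jg} vary across classes, and the class proportions \pi_1, \dots, \pi_G are estimated along with the item parameters. Characterizing the extracted classes then amounts to inspecting how a_{jg}, b_{jg}, or class membership itself differs between groups.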
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written in response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to each essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
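The design this abstract describes (many raters per essay as a true-score proxy, cross-validated machine scores) is easy to simulate. Below is a minimal Python sketch with invented data and a generic ridge-regression scorer standing in for the AES system; none of the variable names or modeling choices come from the article.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_essays, n_raters, n_features = 250, 15, 20

# Invented stand-ins for essay features and human ratings.
X = rng.normal(size=(n_essays, n_features))
quality = X @ rng.normal(size=n_features)            # latent essay quality
ratings = quality[:, None] + rng.normal(scale=2.0, size=(n_essays, n_raters))

true_score = ratings.mean(axis=1)    # mean of all raters ~ true-score proxy

# Cross-validated machine scores: every essay is scored by a model
# that did not see it during training.
machine = np.empty(n_essays)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    machine[test] = Ridge(alpha=1.0).fit(X[train], true_score[train]).predict(X[test])

# Agreement with the true-score proxy: the machine vs. a single human rater
# (the single rater is compared against the mean of the other 14 raters).
print("machine vs. mean rating:     ", np.corrcoef(machine, true_score)[0, 1])
print("one rater vs. mean of others:", np.corrcoef(ratings[:, 0],
                                                   ratings[:, 1:].mean(axis=1))[0, 1])

Averaging 14 or 15 independent raters shrinks the rating-error variance roughly by a factor of the number of raters, which is why the mean rating can serve as a true-score proxy against which both the machine and individual humans are judged.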
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
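For readers unfamiliar with how pairwise decisions become scores: one standard scaling model for comparative judgment is Bradley-Terry (an assumption here; the abstract does not name the model the authors used). A minimal Python sketch using the classic minorization-maximization updates:

import numpy as np

def bradley_terry(wins: np.ndarray, iters: int = 200) -> np.ndarray:
    """wins[i, j] = number of times essay i was judged better than essay j."""
    n = wins.shape[0]
    strength = np.ones(n)
    for _ in range(iters):
        for i in range(n):
            total_wins = wins[i].sum()
            denom = sum((wins[i, j] + wins[j, i]) / (strength[i] + strength[j])
                        for j in range(n) if j != i)
            if denom > 0:
                strength[i] = total_wins / denom
        strength /= strength.sum()                   # fix the scale
    return np.log(strength)                          # log-strengths as scores

# Toy example: essay 2 beats essay 1, which beats essay 0, in most judgments.
wins = np.array([[0, 1, 0],
                 [4, 0, 1],
                 [5, 4, 0]], dtype=float)
print(bradley_terry(wins))   # scores increase from essay 0 to essay 2

Essays are then ranked or scored by their fitted log-strengths; no rubric anchors are needed because every judgment is relative, which is what removes certain scorer biases.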
Eklöf, Hanna; Pavešic, Barbara Japelj; Grønmo, Liv Sissel – Applied Measurement in Education, 2014
The purpose of the study was to measure students' reported test-taking effort and the relationship between reported effort and performance on the Trends in International Mathematics and Science Study (TIMSS) Advanced mathematics test. This was done in three countries participating in TIMSS Advanced 2008 (Sweden, Norway, and Slovenia), and the…
Descriptors: Mathematics Tests, Cross Cultural Studies, Foreign Countries, Correlation
Allen, Jeff; Robbins, Steven B.; Sawyer, Richard – Applied Measurement in Education, 2010
Research on the validity of psychosocial factors (PSFs) and other noncognitive predictors of college outcomes has largely ignored the practical benefits implied by the validity. We summarize evidence of the validity of PSF measures as predictors of college outcomes and then explain how this validity directly translates into improved identification…
Descriptors: Institutional Research, Academic Persistence, Validity, At Risk Students
Downing, Steven M.; Haladyna, Thomas M. – Applied Measurement in Education, 1997
An ideal process is outlined for test item development and the study of item responses to ensure that tests are sound. Qualitative and quantitative methods are used to assess the item-level validity evidence for high-stakes examinations. A checklist for assessment is provided. (SLD)
Descriptors: High Stakes Tests, Item Response Theory, Qualitative Research, Quality Control