Publication Date
In 2025 | 1 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 7 |
Since 2016 (last 10 years) | 27 |
Since 2006 (last 20 years) | 52 |
Descriptor
Scoring | 31 |
Standard Setting (Scoring) | 16 |
Cutting Scores | 13 |
Evaluation Methods | 12 |
Test Items | 10 |
Validity | 9 |
Psychometrics | 8 |
Scoring Rubrics | 8 |
Computer Assisted Testing | 7 |
Scores | 7 |
Automation | 6 |
Source
Educational Measurement: Issues and Practice | 52 |
Author
Clauser, Brian E. | 4 |
Margolis, Melissa J. | 4 |
Wyse, Adam E. | 4 |
Mee, Janet | 3 |
Allalouf, Avi | 2 |
Babcock, Ben | 2 |
Baldwin, Peter | 2 |
Burkhardt, Amy | 2 |
Sireci, Stephen G. | 2 |
Winward, Marcia | 2 |
Anderson, Dan | 1 |
Publication Type
Journal Articles | 52 |
Reports - Research | 28 |
Reports - Evaluative | 12 |
Reports - Descriptive | 11 |
Information Analyses | 2 |
Opinion Papers | 1 |
Education Level
Elementary Secondary Education | 3 |
Elementary Education | 2 |
Grade 4 | 2 |
Higher Education | 2 |
Secondary Education | 2 |
Grade 3 | 1 |
Grade 5 | 1 |
High Schools | 1 |
Postsecondary Education | 1 |
Audience
Teachers | 1 |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
ACT Assessment | 2 |
SAT (College Admission Test) | 2 |
Graduate Record Examinations | 1 |
Preliminary Scholastic… | 1 |
Burkhardt, Amy; Lottridge, Susan; Woolf, Sherri – Educational Measurement: Issues and Practice, 2021
For some students, standardized tests serve as a conduit to disclose sensitive issues of harm or distress that may otherwise go unreported. By detecting this writing, known as "crisis papers," testing programs have a unique opportunity to assist in mitigating the risk of harm to these students. The use of machine learning to…
Descriptors: Scoring Rubrics, Identification, At Risk Students, Standardized Tests
Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025
Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…
Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods
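The selection step that distinguishes the two double-scoring procedures can be sketched minimally. The targeting rule below is an assumption for illustration only (flagging responses whose first scores fall closest to a classification cut, where misclassification risk is greatest); it is not necessarily the rule Xu and Wind evaluate, and names such as cut_score and double_score_rate are hypothetical.

import numpy as np

rng = np.random.default_rng(42)
first_scores = rng.normal(60, 15, 500)  # illustrative first-rater total scores
cut_score = 65.0                        # hypothetical classification cut
double_score_rate = 0.2                 # proportion of responses rescored

# Random double-scoring: rescore a fixed proportion chosen at random.
n_flag = int(double_score_rate * len(first_scores))
random_flags = rng.choice(len(first_scores), size=n_flag, replace=False)

# Targeted double-scoring (assumed rule): rescore the responses whose first
# scores fall closest to the cut, where a second rating matters most for
# classification accuracy.
targeted_flags = np.argsort(np.abs(first_scores - cut_score))[:n_flag]

print(len(random_flags), len(targeted_flags))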
Zesch, Torsten; Horbach, Andrea; Zehner, Fabian – Educational Measurement: Issues and Practice, 2023
In this article, we systematize the factors influencing performance and feasibility of automatic content scoring methods for short text responses. We argue that performance (i.e., how well an automatic system agrees with human judgments) mainly depends on the linguistic variance seen in the responses and that this variance is indirectly influenced…
Descriptors: Influences, Academic Achievement, Feasibility Studies, Automation
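An automatic scoring system's agreement with human judgments is commonly summarized with quadratically weighted kappa; the truncated abstract does not say which index the authors use, so the sketch below is offered only as the conventional computation, with invented scores.

import numpy as np

def quadratic_weighted_kappa(human, machine, n_categories):
    # Agreement between human and machine scores on integer categories
    # 0..n_categories-1, penalizing larger disagreements more heavily.
    human, machine = np.asarray(human), np.asarray(machine)
    observed = np.zeros((n_categories, n_categories))
    for h, m in zip(human, machine):
        observed[h, m] += 1
    observed /= observed.sum()
    expected = np.outer(np.bincount(human, minlength=n_categories),
                        np.bincount(machine, minlength=n_categories)).astype(float)
    expected /= expected.sum()
    i, j = np.meshgrid(np.arange(n_categories), np.arange(n_categories), indexing="ij")
    weights = (i - j) ** 2 / (n_categories - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Illustrative scores from one human rater and one automatic system.
print(round(quadratic_weighted_kappa([0, 1, 2, 2, 3], [0, 1, 2, 3, 3], 4), 3))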
Firoozi, Tahereh; Mohammadi, Hamid; Gierl, Mark J. – Educational Measurement: Issues and Practice, 2023
Research on Automated Essay Scoring has become increasingly important because it serves as a method for evaluating students' written responses at scale. Scalable methods for scoring written responses are needed as students migrate to online learning environments, resulting in the need to evaluate large numbers of written-response assessments. The…
Descriptors: Active Learning, Automation, Scoring, Essays
Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022
Setting cut scores on multistage tests (MSTs) is difficult, particularly when the test spans several grade levels, and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…
Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis
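One common way to map Angoff ratings onto an IRT scale is through the test characteristic curve: sum the panelists' item-level probability ratings and find the ability at which the expected raw score equals that sum. The sketch below illustrates that general idea only; it is not necessarily one of the three mapping methods the authors evaluate, and the item parameters are invented.

import numpy as np
from scipy.optimize import brentq

def tcc(theta, a, b):
    # Test characteristic curve: expected raw score at ability theta
    # under a 2PL model with discriminations a and difficulties b.
    return np.sum(1.0 / (1.0 + np.exp(-a * (theta - b))))

def angoff_cut_theta(item_ratings, a, b):
    # Find the theta whose expected score equals the summed Angoff ratings.
    target = np.sum(item_ratings)
    return brentq(lambda t: tcc(t, a, b) - target, -6.0, 6.0)

# Illustrative ratings and parameters (not from the article)
ratings = np.array([0.55, 0.60, 0.70, 0.45, 0.65])
a = np.array([1.1, 0.9, 1.3, 0.8, 1.0])
b = np.array([-0.5, 0.0, 0.3, 0.8, -0.2])
print(round(angoff_cut_theta(ratings, a, b), 3))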
Baldwin, Peter – Educational Measurement: Issues and Practice, 2021
In the Bookmark standard-setting procedure, panelists are instructed to consider what examinees know rather than what they might attain by guessing; however, because examinees sometimes do guess, the procedure includes a correction for guessing. Like other corrections for guessing, the Bookmark's correction assumes that examinees either know the…
Descriptors: Guessing (Tests), Student Evaluation, Evaluation Methods, Standard Setting (Scoring)
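In the Bookmark procedure, items are ordered by the ability at which a response probability criterion (often RP67) is reached. A common form of the correction for guessing under a 3PL model raises the target probability from RP to c + (1 - c) * RP, so the criterion refers to knowledge beyond guessing. The sketch below shows that standard correction under assumed item parameters; it is not presented as the specific correction Baldwin analyzes.

import numpy as np
from scipy.optimize import brentq

def p_3pl(theta, a, b, c):
    # 3PL item response function with lower asymptote (guessing parameter) c.
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def rp_location(a, b, c, rp=0.67, correct_for_guessing=True):
    # Theta at which the item reaches the response probability criterion.
    # With the correction, the target becomes c + (1 - c) * rp.
    target = c + (1.0 - c) * rp if correct_for_guessing else rp
    return brentq(lambda t: p_3pl(t, a, b, c) - target, -8.0, 8.0)

# Illustrative item parameters (not from the article)
print(round(rp_location(a=1.2, b=0.4, c=0.2), 3))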
Xiong, Jiawei; Li, Feiming – Educational Measurement: Issues and Practice, 2023
Multidimensional scoring evaluates each constructed-response answer from more than one rating dimension and/or trait such as lexicon, organization, and supporting ideas instead of only one holistic score, to help students distinguish between various dimensions of writing quality. In this work, we present a bilevel learning model for combining two…
Descriptors: Scoring, Models, Task Analysis, Learning Processes
Baldwin, Peter; Margolis, Melissa J.; Clauser, Brian E.; Mee, Janet; Winward, Marcia – Educational Measurement: Issues and Practice, 2020
Evidence of the internal consistency of standard-setting judgments is a critical part of the validity argument for tests used to make classification decisions. The bookmark standard-setting procedure is a popular approach to establishing performance standards, but there is relatively little research that reflects on the internal consistency of the…
Descriptors: Standard Setting (Scoring), Probability, Cutting Scores, Evaluation Methods
Wyse, Adam E. – Educational Measurement: Issues and Practice, 2020
One commonly used compromise standard-setting method is the Beuk (1984) method. A key assumption of the Beuk method is that the emphasis given to the pass rate and the percent correct ratings should be proportional to the extent that the panelists agree on their ratings. However, whether the slope of the Beuk line reflects the emphasis that panelists…
Descriptors: Standard Setting (Scoring), Cutting Scores, Weighted Scores, Evaluation Methods
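In the Beuk (1984) method, each panelist supplies both a minimum percent-correct standard and an expected pass rate; the compromise cut score lies where a line through the mean ratings, with slope given by the ratio of the two standard deviations, intersects the empirical curve relating cut score to pass rate. The sketch below reflects that textbook description with invented data; it is not the author's analysis.

import numpy as np

def beuk_cut(knowledge_ratings, pass_rate_ratings, scores):
    # knowledge_ratings: panelists' minimum percent-correct standards (0-100)
    # pass_rate_ratings: panelists' expected pass rates (0-100)
    # scores: examinee percent-correct scores
    x_bar, y_bar = np.mean(knowledge_ratings), np.mean(pass_rate_ratings)
    s_x, s_y = np.std(knowledge_ratings, ddof=1), np.std(pass_rate_ratings, ddof=1)
    # The line is steeper when panelists agree more about the percent-correct
    # standard than about the pass rate (small s_x relative to s_y), so the
    # compromise stays closer to that standard.
    slope = -(s_y / s_x)
    grid = np.linspace(scores.min(), scores.max(), 1001)
    pass_rate = np.array([100.0 * np.mean(scores >= g) for g in grid])
    line = y_bar + slope * (grid - x_bar)
    # Compromise cut: grid point where the line meets the pass-rate curve.
    return grid[np.argmin(np.abs(pass_rate - line))]

rng = np.random.default_rng(0)
scores = np.clip(rng.normal(70, 12, 2000), 0, 100)
print(round(beuk_cut(np.array([65, 70, 72, 68]), np.array([80, 75, 85, 78]), scores), 1))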
Skaggs, Gary; Hein, Serge F.; Wilkins, Jesse L. M. – Educational Measurement: Issues and Practice, 2020
In test-centered standard-setting methods, borderline performance can be represented by many different profiles of strengths and weaknesses. As a result, asking panelists to estimate item or test performance for a hypothetical group of borderline examinees, or a typical borderline examinee, may be an extremely difficult task and one that can…
Descriptors: Standard Setting (Scoring), Cutting Scores, Testing Problems, Profiles
Leventhal, Brian C.; Grabovsky, Irina – Educational Measurement: Issues and Practice, 2020
Standard setting is arguably one of the most subjective techniques in test development and psychometrics. The decisions made when scores are compared to standards, however, are arguably the most consequential outcomes of testing. Providing licensure to practice in a profession has high-stakes consequences for the public. Denying graduation or forcing…
Descriptors: Standard Setting (Scoring), Weighted Scores, Test Construction, Psychometrics
Wyse, Adam E.; Babcock, Ben – Educational Measurement: Issues and Practice, 2020
A common belief is that the Bookmark method is a cognitively simpler standard-setting method than the modified Angoff method. However, limited research has investigated panelists' ability to perform the Bookmark method well, and whether some of the challenges panelists face with the Angoff method may also be present in the Bookmark…
Descriptors: Standard Setting (Scoring), Evaluation Methods, Testing Problems, Test Items
Sireci, Stephen G. – Educational Measurement: Issues and Practice, 2020
Educational tests are standardized so that all examinees are tested on the same material, under the same testing conditions, and with the same scoring protocols. This uniformity is designed to provide a level "playing field" for all examinees so that the test is "the same" for everyone. Thus, standardization is designed to…
Descriptors: Standards, Educational Assessment, Culture Fair Tests, Scoring
Lottridge, Sue; Burkhardt, Amy; Boyer, Michelle – Educational Measurement: Issues and Practice, 2020
In this digital ITEMS module, Dr. Sue Lottridge, Amy Burkhardt, and Dr. Michelle Boyer provide an overview of automated scoring. Automated scoring is the use of computer algorithms to score unconstrained open-ended test items by mimicking human scoring. The use of automated scoring is increasing in educational assessment programs because it allows…
Descriptors: Computer Assisted Testing, Scoring, Automation, Educational Assessment
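Because the module describes automated scoring as using computer algorithms to mimic human scoring, a minimal supervised sketch may help fix ideas: train a model on human-scored responses, then predict scores for new ones. The feature set and model below (TF-IDF plus ridge regression in scikit-learn) are illustrative assumptions, not the approach taught in the module; operational engines use richer features and extensive validation.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set: responses with human-assigned scores.
responses = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "Plants make food from sunlight, water, and carbon dioxide.",
    "Plants eat dirt to grow.",
    "I do not know.",
]
human_scores = [2, 2, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
model.fit(responses, human_scores)

# Predict a score for a new response; round/clip to the rubric scale in practice.
print(model.predict(["Sunlight is turned into chemical energy by plants."]))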
Sims, Maureen E.; Cox, Troy L.; Eckstein, Grant T.; Hartshorn, K. James; Wilcox, Matthew P.; Hart, Judson M. – Educational Measurement: Issues and Practice, 2020
The purpose of this study is to explore the reliability of a potentially more practical approach to direct writing assessment in the context of ESL writing. Traditional rubric rating (RR) is a common yet resource-intensive evaluation practice when performed reliably. This study compared the traditional rubric model of ESL writing assessment and…
Descriptors: Scoring Rubrics, Item Response Theory, Second Language Learning, English (Second Language)