ERIC - Search Results

Showing 1 to 15 of 71 results

A Rubric for the Detection of Students in Crisis

Peer reviewed

Burkhardt, Amy; Lottridge, Susan; Woolf, Sherri – Educational Measurement: Issues and Practice, 2021

For some students, standardized tests serve as a conduit to disclose sensitive issues of harm or distress that may otherwise go unreported. By detecting this writing, known as "crisis papers," testing programs have a unique opportunity to assist in mitigating the risk of harm to these students. The use of machine learning to…

Descriptors: Scoring Rubrics, Identification, At Risk Students, Standardized Tests

Examining the Psychometric Impact of Targeted and Random Double-Scoring in Mixed-Format Assessments

Peer reviewed

Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025

Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…

Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods

To Score or Not to Score: Factors Influencing Performance and Feasibility of Automatic Content Scoring of Text Responses

Peer reviewed

Zesch, Torsten; Horbach, Andrea; Zehner, Fabian – Educational Measurement: Issues and Practice, 2023

In this article, we systematize the factors influencing performance and feasibility of automatic content scoring methods for short text responses. We argue that performance (i.e., how well an automatic system agrees with human judgments) mainly depends on the linguistic variance seen in the responses and that this variance is indirectly influenced…

Descriptors: Influences, Academic Achievement, Feasibility Studies, Automation

Using Active Learning Methods to Strategically Select Essays for Automated Scoring

Peer reviewed

Firoozi, Tahereh; Mohammadi, Hamid; Gierl, Mark J. – Educational Measurement: Issues and Practice, 2023

Research on Automated Essay Scoring has become increasingly important because it serves as a method for evaluating students' written responses at scale. Scalable methods for scoring written responses are needed as students migrate to online learning environments resulting in the need to evaluate large numbers of written-response assessments. The…

Descriptors: Active Learning, Automation, Scoring, Essays

Bilevel Topic Model-Based Multitask Learning for Constructed-Responses Multidimensional Automated Scoring and Interpretation

Peer reviewed

Xiong, Jiawei; Li, Feiming – Educational Measurement: Issues and Practice, 2023

Multidimensional scoring evaluates each constructed-response answer from more than one rating dimension and/or trait such as lexicon, organization, and supporting ideas instead of only one holistic score, to help students distinguish between various dimensions of writing quality. In this work, we present a bilevel learning model for combining two…

Descriptors: Scoring, Models, Task Analysis, Learning Processes

Standardization and "UNDERSTAND"ardization in Educational Assessment

Peer reviewed

Sireci, Stephen G. – Educational Measurement: Issues and Practice, 2020

Educational tests are standardized so that all examinees are tested on the same material, under the same testing conditions, and with the same scoring protocols. This uniformity is designed to provide a level "playing field" for all examinees so that the test is "the same" for everyone. Thus, standardization is designed to…

Descriptors: Standards, Educational Assessment, Culture Fair Tests, Scoring

Digital Module 18: Automated Scoring

Peer reviewed

Lottridge, Sue; Burkhardt, Amy; Boyer, Michelle – Educational Measurement: Issues and Practice, 2020

In this digital ITEMS module, Dr. Sue Lottridge, Amy Burkhardt, and Dr. Michelle Boyer provide an overview of automated scoring. Automated scoring is the use of computer algorithms to score unconstrained open-ended test items by mimicking human scoring. The use of automated scoring is increasing in educational assessment programs because it allows…

Descriptors: Computer Assisted Testing, Scoring, Automation, Educational Assessment

Rater Certification Tests: A Psychometric Approach

Peer reviewed

Attali, Yigal – Educational Measurement: Issues and Practice, 2019

Rater training is an important part of developing and conducting large-scale constructed-response assessments. As part of this process, candidate raters have to pass a certification test to confirm that they are able to score consistently and accurately before they begin scoring operationally. Moreover, many assessment programs require raters to…

Descriptors: Evaluators, Certification, High Stakes Tests, Scoring

Assessment for Learning with Diverse Learners in a Digital World

Peer reviewed

DiCerbo, Kristen – Educational Measurement: Issues and Practice, 2020

We have the ability to capture data from students' interactions with digital environments as they engage in learning activity. This provides the potential for a reimagining of assessment to one in which assessment becomes part of our natural education activity and can be used to support learning. These new data allow us to more closely examine the…

Descriptors: Student Diversity, Information Technology, Learning Activities, Learning Processes

The Value of Choice: An Experiment Using Multiple-Choice Tests

Peer reviewed

Aray, Henry; Pedauga, Luis – Educational Measurement: Issues and Practice, 2019

This article presents a novel experimental methodology in which groups of students were offered the option to choose between two equivalent scoring rules to assess a multiple-choice test. The effect of choosing the scoring rule on marks is tested. Two major contributions arise from this research. First, it contributes to the literature on the…

Descriptors: Multiple Choice Tests, Scoring, Student Attitudes, Decision Making

Quality Control for Scoring Tests Administered in Continuous Mode: An NCME Instructional Module

Peer reviewed

Allalouf, Avi; Gutentag, Tony; Baumer, Michal – Educational Measurement: Issues and Practice, 2017

Quality control (QC) in testing is paramount. QC procedures for tests can be divided into two types. The first type, one that has been well researched, is QC for tests administered to large population groups on few administration dates using a small set of test forms (e.g., large-scale assessment). The second type is QC for tests, usually…

Descriptors: Quality Control, Scoring, Computer Assisted Testing, Error Patterns

Automated Scoring of Constructed-Response Science Items: Prospects and Obstacles

Peer reviewed

Liu, Ou Lydia; Brew, Chris; Blackmore, John; Gerard, Libby; Madhok, Jacquie; Linn, Marcia C. – Educational Measurement: Issues and Practice, 2014

Content-based automated scoring has been applied in a variety of science domains. However, many prior applications involved simplified scoring rubrics without considering rubrics representing multiple levels of understanding. This study tested a concept-based scoring tool for content-based scoring, c-rater™, for four science items with rubrics…

Descriptors: Science Tests, Test Items, Scoring, Automation

How Should Colleges Treat Multiple Admissions Test Scores?

Peer reviewed

Mattern, Krista; Radunzel, Justine; Bertling, Maria; Ho, Andrew D. – Educational Measurement: Issues and Practice, 2018

The percentage of students retaking college admissions tests is rising. Researchers and college admissions offices currently use a variety of methods for summarizing these multiple scores. Testing organizations such as ACT and the College Board, interested in validity evidence like correlations with first-year grade point average (FYGPA), often…

Descriptors: College Admission, Scores, Correlation, College Entrance Examinations

Rapid-Guessing Behavior: Its Identification, Interpretation, and Implications

Peer reviewed

Wise, Steven L. – Educational Measurement: Issues and Practice, 2017

The rise of computer-based testing has brought with it the capability to measure more aspects of a test event than simply the answers selected or constructed by the test taker. One behavior that has drawn much research interest is the time test takers spend responding to individual multiple-choice items. In particular, very short response…

Descriptors: Guessing (Tests), Multiple Choice Tests, Test Items, Reaction Time

Disaggregated Effects of Device on Score Comparability

Peer reviewed

Davis, Laurie; Morrison, Kristin; Kong, Xiaojing; McBride, Yuanyuan – Educational Measurement: Issues and Practice, 2017

The use of tablets for large-scale testing programs has transitioned from concept to reality for many state testing programs. This study extended previous research on score comparability between tablets and computers with high school students to compare score distributions across devices for reading, math, and science and to evaluate device…

Descriptors: Computer Assisted Testing, Handheld Devices, Telecommunications, Scoring
