ERIC - Search Results

Publication Date

In 2026	0
Since 2025	4
Since 2022 (last 5 years)	6
Since 2017 (last 10 years)	19
Since 2007 (last 20 years)	55

Descriptor

Scores	103
Testing	103
Test Reliability	71
Test Validity	43
Test Construction	27
Reliability	25
Scoring	23
Test Interpretation	19
Achievement Tests	18
Comparative Analysis	17
Standardized Tests	16
Academic Achievement	15
Item Response Theory	15
Language Tests	15
Statistical Analysis	14
Measurement	13
Test Bias	13
Interrater Reliability	12
Item Analysis	12
Validity	12
Error of Measurement	11
Foreign Countries	11
Language Proficiency	11
Psychometrics	11
Second Language Learning	11

Publication Type

Journal Articles	56
Reports - Research	47
Reports - Evaluative	21
Reports - Descriptive	11
Numerical/Quantitative Data	9
Guides - Non-Classroom	7
Opinion Papers	7
Tests/Questionnaires	7
Speeches/Meeting Papers	5
Information Analyses	3
Dissertations/Theses -…	2
Guides - Classroom - Teacher	1
Guides - General	1
Reference Materials -…	1
Reference Materials - General	1

Audience

Practitioners	3
Researchers	2
Teachers	2

Location

United Kingdom (England)	2
United States	2
Australia	1
Canada	1
China	1
China (Beijing)	1
Florida	1
Hungary (Budapest)	1
Iran	1
New Mexico	1
Norway	1
South Carolina	1
Turkey	1
United Kingdom (London)	1

Laws, Policies, & Programs

Elementary and Secondary…	3
Every Student Succeeds Act…	1
No Child Left Behind Act 2001	1

What Works Clearinghouse Rating

Showing 1 to 15 of 103 results

The Sensitivity of Value-Added Estimates to Test Scoring Decisions. EdWorkingPaper No. 25-1226

Joshua B. Gilbert; James G. Soland; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025

Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may affect VA estimates is less studied. We examine the…

Descriptors: Value Added Models, Tests, Testing, Scoring

Inter-Rater Reliability in Comprehensive Examination Scoring: The Case for Consistent and Collaborative Rater Training and Calibration

Saenz, David Arron – Online Submission, 2023

There is a vast body of literature documenting the positive impacts that rater training and calibration sessions have on inter-rater reliability, as research indicates that several factors, including frequency and timing, play crucial roles in ensuring inter-rater reliability. Additionally, increasing amounts of research indicate possible links in…

Descriptors: Interrater Reliability, Scoring, Training, Scoring Rubrics
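
Inter-rater reliability is usually reported as an agreement index. As one illustration, Cohen's kappa for two raters corrects observed agreement p_o for chance agreement p_e: κ = (p_o − p_e) / (1 − p_e). The choice of kappa, the function name `cohens_kappa`, and the sample rubric scores are assumptions for illustration, not the method or data of the paper listed above.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same set of responses."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of responses")
    n = len(rater_a)
    # Observed agreement: share of responses that received identical scores.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_chance = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    if p_chance == 1:
        raise ValueError("both raters used one identical category; kappa is undefined")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical example: two raters score six responses on a 1-3 rubric.
print(round(cohens_kappa([1, 2, 3, 2, 1, 3], [1, 2, 3, 3, 1, 2]), 3))  # 0.5
```

Here the raters agree on 4 of 6 responses (p_o ≈ 0.667), but with uniform marginals chance agreement is 1/3, so kappa is 0.5 rather than 0.667.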

Investigating Constructed-Response Scoring over Time: The Effects of Study Design on Trend Rescore Statistics. Research Report. ETS RR-22-15

Peer reviewed

Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022

When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…

Descriptors: Item Response Theory, Test Construction, Scoring, Testing

Can the Oral Proficiency Interview -- Computer (ACTFL OPIc) Be Used Instead of the Oral Proficiency Interview (ACTFL OPI)? An Aligned Rank Transform (ART) Analysis

Peer reviewed

Troy L. Cox; Gregory L. Thompson; Steven S. Stokes – Foreign Language Annals, 2025

This study investigated the differences between the ACTFL Oral Proficiency Interview (OPI) and the ACTFL Oral Proficiency Interview - Computer (OPIc) among Spanish learners at a U.S. university. Participants (N = 154) were randomly assigned to take both tests in a counterbalanced order to mitigate test order effects. Data were analyzed using an…

Descriptors: Oral Language, Language Proficiency, Interviews, Computer Uses in Education

Initial Evidence Supporting Interpretations of Scores from the Enhanced ACT Test. ACT Research. Research Report. R2425

Jeff Allen; Ty Cruce – ACT Education Corp., 2025

This report summarizes some of the evidence supporting interpretations of scores from the enhanced ACT, focusing on reliability, concurrent validity, predictive validity, and score comparability. The authors argue that the evidence presented in this report supports the interpretation of scores from the enhanced ACT as measures of high school…

Descriptors: College Entrance Examinations, Testing, Change, Scores

Using Multilabel Neural Network to Score High-Dimensional Assessments for Different Use Foci: An Example with College Major Preference Assessment

Peer reviewed

Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025

Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…

Descriptors: Tests, Testing, Scores, Test Construction

Reliability. Improving Literacy Brief: Understanding Screening

Petscher, Y.; Pentimonti, J.; Stanley, C. – National Center on Improving Literacy, 2019

Reliability is the consistency of a set of scores that are designed to measure the same thing. Reliability is a statistical property of scores that must be demonstrated rather than assumed.

Descriptors: Scores, Measurement, Test Reliability, Error Patterns
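
One common way to demonstrate reliability rather than assume it is an internal-consistency estimate such as Cronbach's alpha, α = k/(k − 1) · (1 − Σ item variances / total-score variance), for k items. The brief's abstract does not name a specific index, so the choice of alpha, the helper `cronbach_alpha`, and the sample data are assumptions for illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items score matrix given as a list of rows."""
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two examinees and two items")
    k = len(scores[0])  # number of items
    # Sample variance of each item (column) across examinees.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the examinees' total scores.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical example: four examinees answering three items scored 1-5.
data = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 5, 5]]
print(round(cronbach_alpha(data), 3))  # 0.957
```

Using sample variances for both the items and the total is a deliberate choice: the n − 1 denominators cancel in the ratio, so population variances would give the same alpha.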

Scoring Stability in a Large-Scale Assessment Program: A Longitudinal Analysis of Leniency/Severity Effects

Peer reviewed

Palermo, Corey; Bunch, Michael B.; Ridge, Kirk – Journal of Educational Measurement, 2019

Although much attention has been given to rater effects in rater-mediated assessment contexts, little research has examined the overall stability of leniency and severity effects over time. This study examined longitudinal scoring data collected during three consecutive administrations of a large-scale, multi-state summative assessment program.…

Descriptors: Scoring, Interrater Reliability, Measurement, Summative Evaluation

Making Sense of Elementary School Reading Scores. Literacy Leadership Brief

Fitzgerald, Jill; Shanahan, Timothy E. – International Literacy Association, 2020

Reading scores exist for a continuum of purposes, from informal assessment to formal standardized tests. This brief aims to answer the question: What matters most for elementary-grade teachers when thinking about reading scores, and what could policymakers do to help teachers? Three positions worth pursuing in this regard are shared: (1) every…

Descriptors: Reading Achievement, Scores, Elementary School Students, Elementary School Teachers

Simulation of LD Identification Accuracy Using a Pattern of Processing Strengths and Weaknesses Method with Multiple Measures

Peer reviewed

Miciak, Jeremy; Taylor, W. Pat; Stuebing, Karla K.; Fletcher, Jack M. – Journal of Psychoeducational Assessment, 2018

We investigated the classification accuracy of learning disability (LD) identification methods premised on the identification of an intraindividual pattern of processing strengths and weaknesses (PSW), using multiple indicators for all latent constructs. Known LD status was derived from latent scores; values at the observed level identified…

Descriptors: Accuracy, Learning Disabilities, Classification, Identification

Test Review: TestDaF

Peer reviewed

Norris, John; Drackert, Anastasia – Language Testing, 2018

The Test of German as a Foreign Language (TestDaF) plays a critical role as a standardized test of German language proficiency. Developed and administered by the Society for Academic Study Preparation and Test Development (g.a.s.t.), TestDaF was launched in 2001 and has experienced persistent annual growth, with more than 44,000 test takers in…

Descriptors: German, Second Language Learning, Language Tests, Language Proficiency

ACT Reporting Category Interpretation Guide: Version 1.0. ACT Working Paper 2016 (05)

Powers, Sonya; Li, Dongmei; Suh, Hongwook; Harris, Deborah J. – ACT, Inc., 2016

ACT reporting categories and ACT Readiness Ranges are new features added to the ACT score reports starting in fall 2016. For each reporting category, the number correct score, the maximum points possible, the percent correct, and the ACT Readiness Range, along with an indicator of whether the reporting category score falls within the Readiness…

Descriptors: Scores, Classification, College Entrance Examinations, Error of Measurement

Investigating Score Dependability in English/Chinese Interpreter Certification Performance Testing: A Generalizability Theory Approach

Peer reviewed

Han, Chao – Language Assessment Quarterly, 2016

As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…

Descriptors: Foreign Countries, Scores, English, Chinese
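
Generalizability theory splits observed-score variance into components. A minimal sketch for the simplest case, a one-facet design with persons fully crossed with raters (p × r), estimates the components from two-way ANOVA mean squares and reports a relative coefficient G = σ²_p / (σ²_p + σ²_pr,e / n_r) and an absolute dependability coefficient Φ = σ²_p / (σ²_p + (σ²_r + σ²_pr,e) / n_r). The single-facet design, the function `g_study_p_by_r`, and the sample scores are assumptions; the study's actual ICPT design may involve more facets.

```python
from statistics import mean


def g_study_p_by_r(scores):
    """Variance components and coefficients for a one-facet crossed p x r design.

    `scores` is a list of rows: one row per person, one column per rater,
    with every rater scoring every person (no missing cells).
    """
    n_p, n_r = len(scores), len(scores[0])
    if n_p < 2 or n_r < 2:
        raise ValueError("need at least two persons and two raters")

    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    rater_means = [mean(row[j] for row in scores) for j in range(n_r)]

    # Sums of squares for a two-way layout with one observation per cell.
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pr = ss_total - ss_p - ss_r  # interaction confounded with residual error

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; negative estimates are truncated at 0.
    var_pr = max(ms_pr, 0.0)
    var_p = max((ms_p - ms_pr) / n_r, 0.0)
    var_r = max((ms_r - ms_pr) / n_p, 0.0)

    # Relative error uses only person-by-rater variance; absolute error adds rater variance.
    rel_error = var_pr / n_r
    abs_error = (var_r + var_pr) / n_r
    if var_p + rel_error == 0:
        raise ValueError("no score variance; coefficients are undefined")
    return {
        "var_p": var_p,
        "var_r": var_r,
        "var_pr": var_pr,
        "G": var_p / (var_p + rel_error),
        "Phi": var_p / (var_p + abs_error),
    }


# Hypothetical example: four examinees, each scored by the same three raters.
result = g_study_p_by_r([[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 2]])
print(round(result["G"], 3), round(result["Phi"], 3))  # 0.973 0.923
```

Φ is lower than G because rater severity differences (σ²_r) count as error when scores are interpreted against an absolute standard, as in certification decisions, but not when only the ranking of candidates matters.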

Test Review: Wagner, R. K., Torgesen, J. K., Rashotte, C. A., & Pearson, N. A., "Comprehensive Test of Phonological Processing-2nd Ed. (CTOPP-2)." Austin, Texas: Pro-Ed

Peer reviewed

Dickens, Rachel H.; Meisinger, Elizabeth B.; Tarar, Jessica M. – Canadian Journal of School Psychology, 2015

The Comprehensive Test of Phonological Processing-Second Edition (CTOPP-2; Wagner, Torgesen, Rashotte, & Pearson, 2013) is a norm-referenced test that measures phonological processing skills related to reading for individuals aged 4 to 24. According to its authors, the CTOPP-2 may be used to identify individuals who are markedly below their…

Descriptors: Norm Referenced Tests, Phonology, Test Format, Testing

Repeated Measurement of the Components of Attention with Young Children Using the Attention Network Test: Stability, Isolability, Robustness, and Reliability

Peer reviewed

Ishigami, Yoko; Klein, Raymond M. – Journal of Cognition and Development, 2015

The current study examined the robustness, stability, reliability, and isolability of the attention network scores (alerting, orienting, and executive control) when young children experienced repeated administrations of the child version of the Attention Network Test (ANT; Rueda et al., 2004). Ten test sessions of the ANT were administered to 12…

Descriptors: Measurement, Attention, Scores, Executive Function

Source

Partnership for Assessment of…	4
Journal of Psychoeducational…	3
Language Assessment Quarterly	3
Language Testing	3
Assessment and Accountability…	2
Educational Measurement:…	2
International Journal of…	2
Journal of Educational…	2
New Meridian Corporation	2
ProQuest LLC	2
Psychometrika	2
ACT Education Corp.	1
ACT, Inc.	1
American Psychologist	1
Annenberg Institute for…	1
Asia Pacific Education Review	1
Assessing Writing	1
Assessment for Effective…	1
British Journal of…	1
Canadian Journal of School…	1
Clinical Linguistics &…	1
Diagnostique	1
Discourse: Studies in the…	1
ESL Magazine	1
ETS Research Report Series	1

Author

Gallas, Edwin J.	3
Goldschmidt, Pete	2
Heritage, Margaret	2
Herman, Joan L.	2
Kapes, Jerome T.	2
Allen, Thomas E.	1
Amery D. Wu	1
Anderson, Paul S.	1
Bachman, Lyle F.	1
Bailey, Janelle M.	1
Baker, Beverly A.	1
Bardo, John W.	1
Benjamin W. Domingue	1
Benners, G. Anthony	1
Bennett, Randy Elliot	1
Bergquist, Constance	1
Berkay, Paul	1
Botting, Nicola	1
Bunch, Michael B.	1
Byrd, E. Keith	1
Chance, Beth	1
Chen, Hui-Mei	1
Chu, Yiting	1
Clevinger, Amanda	1

Education Level

Higher Education	11
High Schools	8
Postsecondary Education	8
Secondary Education	8
Elementary Education	7
Grade 7	6
Middle Schools	6
Grade 4	5
Grade 5	5
Grade 6	5
Intermediate Grades	5
Junior High Schools	5
Early Childhood Education	4
Grade 3	4
Grade 8	4
Grade 9	4
Primary Education	4
Elementary Secondary Education	3
Grade 10	3
Grade 11	2
Adult Basic Education	1
Adult Education	1
Grade 12	1
High School Equivalency…	1

Assessments and Surveys

ACT Assessment	2
ACTFL Oral Proficiency…	2
National Assessment of…	2
Stanford Achievement Tests	2
Test of English as a Foreign…	2
California Achievement Tests	1
Clinical Evaluation of…	1
Comprehensive Tests of Basic…	1
Florida Comprehensive…	1
General Aptitude Test Battery	1
General Educational…	1
International English…	1
Measures of Academic Progress	1
Metropolitan Achievement Tests	1
Peabody Picture Vocabulary…	1
Raven Progressive Matrices	1
SAT (College Admission Test)	1
Self Directed Search	1
Strengths and Difficulties…	1
Test of Adult Basic Education	1
Vineland Adaptive Behavior…	1
Wechsler Adult Intelligence…	1
Wechsler Memory Scale	1
Wide Range Achievement Test	1
Woodcock Johnson Tests of…	1