ERIC - Search Results

Showing 1 to 15 of 67 results

Clarifying the Terminology of Validity and the Investigative Stages of Validation

Peer reviewed

Russell, Michael – Educational Measurement: Issues and Practice, 2022

Despite agreement about the central importance of validity for educational and psychological testing, consensus regarding the definition of validity remains elusive. Differences in the definition of validity are examined, revealing that a potential cause of disagreement stems from differences in word use and meanings given to key terms commonly…

Descriptors: Test Validity, Psychological Testing, Educational Testing, Vocabulary

An Examination of Classification Accuracy in the Continuous Testing Framework

Peer reviewed

Coggeshall, Whitney Smiley – Educational Measurement: Issues and Practice, 2021

The continuous testing framework, where both successful and unsuccessful examinees have to demonstrate continued proficiency at frequent prespecified intervals, is a framework that is used in noncognitive assessment and is gaining in popularity in cognitive assessment. Despite the rigorous advantages of this framework, this paper demonstrates that…

Descriptors: Classification, Accuracy, Testing, Failure

Digital Module 30: Validity and Educational Testing--Purposes and Uses of Educational Tests

Peer reviewed

Lewis, Jennifer; Sireci, Stephen G. – Educational Measurement: Issues and Practice, 2022

This module is designed for educators, educational researchers, and psychometricians who would like to develop an understanding of the basic concepts of validity theory, test validation, and documenting a "validity argument." It also describes how an in-depth understanding of the purposes and uses of educational tests sets the foundation…

Descriptors: Test Validity, Tests, Testing Problems, Faculty Development

Applying a Mixture Rasch Model-Based Approach to Standard Setting

Peer reviewed

Peabody, Michael R.; Muckle, Timothy J.; Meng, Yu – Educational Measurement: Issues and Practice, 2023

The subjective aspect of standard-setting is often criticized, yet data-driven standard-setting methods are rarely applied. Therefore, we applied a mixture Rasch model approach to setting performance standards across several testing programs of various sizes and compared the results to existing passing standards derived from traditional…

Descriptors: Item Response Theory, Standard Setting, Testing, Sampling

Instruction-Tuned Large-Language Models for Quality Control in Automatic Item Generation: A Feasibility Study

Peer reviewed

Gorgun, Guher; Bulut, Okan – Educational Measurement: Issues and Practice, 2025

Automatic item generation may supply many items instantly and efficiently to assessment and learning environments. Yet, the evaluation of item quality remains a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of using large-language models, specifically Llama 3-8B, for…

Descriptors: Artificial Intelligence, Quality Control, Technology Uses in Education, Automation

Deficiency, Contamination, and the Signal Processing Metaphor

Peer reviewed

Newton, Paul E. – Educational Measurement: Issues and Practice, 2020

Educational assessment involves eliciting, transmitting, and receiving information concerning the level of proficiency of a learner in a specified domain. With that in mind, it is perhaps surprising that the literature seems to make very little use of the signal processing metaphor. The present article begins by making a general case for greater…

Descriptors: Educational Assessment, Student Evaluation, Evaluative Thinking, Test Validity

The Impact of the COVID-19 Pandemic on American Board of Surgery's Oral Certifying Exams

Peer reviewed

Barry, Carol L.; Jones, Andrew T.; Ibáñez, Beatriz; Grambau, Marni; Buyske, Jo – Educational Measurement: Issues and Practice, 2022

In response to the COVID-19 pandemic, the American Board of Surgery (ABS) shifted from in-person to remote administrations of the oral certifying exam (CE). Although the overall exam architecture remains the same, there are a number of differences in administration and staffing costs, exam content, security concerns, and the tools used to give the…

Descriptors: COVID-19, Pandemics, Computer Assisted Testing, Verbal Tests

An Evaluative Framework for Reviewing Fairness Standards and Practices in Educational Tests

Peer reviewed

Jonson, Jessica L.; Trantham, Pamela; Usher-Tate, Betty Jean – Educational Measurement: Issues and Practice, 2019

One of the substantive changes in the 2014 Standards for Educational and Psychological Testing was the elevation of fairness in testing as a foundational element of practice in addition to validity and reliability. Previous research indicates that testing practices often do not align with professional standards and guidelines. Therefore, to raise…

Descriptors: Culture Fair Tests, Test Validity, Test Reliability, Intelligence Tests

Reframing Research and Assessment Practices: Advancing an Antiracist and Anti-Ableist Research Agenda

Peer reviewed

Johnson, Angela; Barker, Elizabeth; Viveros Cespedes, Marcos – Educational Measurement: Issues and Practice, 2024

Educators and researchers strive to build policies and practices on data and evidence, especially on academic achievement scores. When assessment scores are inaccurate for specific student populations or when scores are inappropriately used, even data-driven decisions will be misinformed. To maximize the impact of the research-practice-policy…

Descriptors: Equal Education, Inclusion, Evaluation Methods, Error of Measurement

Multistage Adaptive Testing Design in International Large-Scale Assessments

Peer reviewed

Yamamoto, Kentaro; Shin, Hyo Jeong; Khorramdel, Lale – Educational Measurement: Issues and Practice, 2018

A multistage adaptive testing (MST) design was implemented for the Programme for the International Assessment of Adult Competencies (PIAAC) starting in 2012 for about 40 countries and has been implemented for the 2018 cycle of the Programme for International Student Assessment (PISA) for more than 80 countries. Using examples from PISA and PIAAC,…

Descriptors: International Assessment, Foreign Countries, Achievement Tests, Test Validity

Examining Effectiveness and Validity of Accommodations for English Language Learners in Mathematics: An Evidence-Based Computer Accommodation Decision System

Peer reviewed

Abedi, Jamal; Zhang, Yu; Rowe, Susan E.; Lee, Hansol – Educational Measurement: Issues and Practice, 2020

Research indicates that the performance-gap between English Language Learners (ELLs) and their non-ELL peers is partly due to ELLs' difficulty in understanding assessment language. Accommodations have been shown to narrow this performance-gap, but many accommodations studies have not used a randomized design and are based on relatively small…

Descriptors: English Language Learners, Achievement Gap, Mathematics Tests, Standards

Digital Module 18: Automated Scoring

Peer reviewed

Lottridge, Sue; Burkhardt, Amy; Boyer, Michelle – Educational Measurement: Issues and Practice, 2020

In this digital ITEMS module, Dr. Sue Lottridge, Amy Burkhardt, and Dr. Michelle Boyer provide an overview of automated scoring. Automated scoring is the use of computer algorithms to score unconstrained open-ended test items by mimicking human scoring. The use of automated scoring is increasing in educational assessment programs because it allows…

Descriptors: Computer Assisted Testing, Scoring, Automation, Educational Assessment

The Effect of Drag-and-Drop Item Features on Test-Taker Performance and Response Strategies

Peer reviewed

Arslan, Burcu; Jiang, Yang; Keehner, Madeleine; Gong, Tao; Katz, Irvin R.; Yan, Fred – Educational Measurement: Issues and Practice, 2020

Computer-based educational assessments often include items that involve drag-and-drop responses. There are different ways that drag-and-drop items can be laid out and different choices that test developers can make when designing these items. Currently, these decisions are based on experts' professional judgments and design constraints, rather…

Descriptors: Test Items, Computer Assisted Testing, Test Format, Decision Making

The Relationship between Item Developer Alignment of Items to Range Achievement-Level Descriptors and Item Difficulty: Implications for Validating Intended Score Interpretations

Peer reviewed

Schneider, M. Christina; Agrimson, Jared; Veazey, Mary – Educational Measurement: Issues and Practice, 2022

This paper presents results of a score interpretation study for a computer adaptive mathematics assessment. The study purpose was to test the efficacy of item developers' alignment of items to Range Achievement-Level Descriptors (RALDs; Egan et al.) against the empirical achievement-level alignment of items to investigate the use of RALDs as the…

Descriptors: Computer Assisted Testing, Mathematics Tests, Scores, Grade 3

Digital Module 09: Sociocognitive Assessment for Diverse Populations

Peer reviewed

Mislevy, Robert J.; Oliveri, Maria Elena – Educational Measurement: Issues and Practice, 2019

In this digital ITEMS module, Dr. Robert [Bob] Mislevy and Dr. Maria Elena Oliveri introduce and illustrate a sociocognitive perspective on educational measurement, which focuses on a variety of design and implementation considerations for creating fair and valid assessments for learners from diverse populations with diverse sociocultural…

Descriptors: Educational Testing, Reliability, Test Validity, Test Reliability
