ERIC - Search Results

Showing 1 to 15 of 54 results

Defining Test-Score Interpretation, Use, and Claims: Delphi Study for the Validity Argument

Peer reviewed

Folger, Timothy D.; Bostic, Jonathan; Krupa, Erin E. – Educational Measurement: Issues and Practice, 2023

Validity is a fundamental consideration of test development and test evaluation. The purpose of this study is to define and reify three key aspects of validity and validation, namely test-score interpretation, test-score use, and the claims supporting interpretation and use. This study employed a Delphi methodology to explore how experts in…

Descriptors: Test Interpretation, Scores, Test Use, Test Validity

Applying a Mixture Rasch Model-Based Approach to Standard Setting

Peer reviewed

Peabody, Michael R.; Muckle, Timothy J.; Meng, Yu – Educational Measurement: Issues and Practice, 2023

The subjective aspect of standard-setting is often criticized, yet data-driven standard-setting methods are rarely applied. Therefore, we applied a mixture Rasch model approach to setting performance standards across several testing programs of various sizes and compared the results to existing passing standards derived from traditional…

Descriptors: Item Response Theory, Standard Setting, Testing, Sampling

Psychometric Evaluation of the Preschool Early Numeracy Skills Test--Brief Version within the Item Response Theory Framework

Peer reviewed

Tsigilis, Nikolaos; Krousorati, Katerina; Gregoriadis, Athanasios; Grammatikopoulos, Vasilis – Educational Measurement: Issues and Practice, 2023

The Preschool Early Numeracy Skills Test--Brief Version (PENS-B) is a measure of early numeracy skills, developed and mainly used in the United States. The purpose of this study was to examine the factorial validity and measurement invariance across gender of PENS-B in the Greek educational context. PENS-B was administered to 906 preschool…

Descriptors: Psychometrics, Preschool Education, Numeracy, Item Response Theory

Setting and Validating Multiple Standards on a Multistage-Adaptive Test

Peer reviewed

Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022

Setting cut scores on multistage-adaptive tests (MSTs) is difficult, particularly when the test spans several grade levels, and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…

Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis

Demystifying Adequate Growth Percentiles

Peer reviewed

Katherine E. Castellano; Daniel F. McCaffrey; Joseph A. Martineau – Educational Measurement: Issues and Practice, 2025

Growth-to-standard models evaluate student growth against the growth needed to reach a future standard or target of interest, such as proficiency. A common growth-to-standard model involves comparing the popular Student Growth Percentile (SGP) to Adequate Growth Percentiles (AGPs). AGPs follow from an involved process based on fitting a series of…

Descriptors: Student Evaluation, Growth Models, Student Educational Objectives, Educational Indicators

Instruction-Tuned Large-Language Models for Quality Control in Automatic Item Generation: A Feasibility Study

Peer reviewed

Guher Gorgun; Okan Bulut – Educational Measurement: Issues and Practice, 2025

Automatic item generation may supply many items instantly and efficiently to assessment and learning environments. Yet the evaluation of item quality remains a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of using large-language models, specifically Llama 3-8B, for…

Descriptors: Artificial Intelligence, Quality Control, Technology Uses in Education, Automation

Validation as Evaluating Desired and Undesired Effects: Insights from Cross-Classified Mixed Effects Model

Peer reviewed

Ji, Xuejun Ryan; Wu, Amery D. – Educational Measurement: Issues and Practice, 2023

The Cross-Classified Mixed Effects Model (CCMEM) has been demonstrated to be a flexible framework for evaluating reliability by measurement specialists. Reliability can be estimated based on the variance components of the test scores. Built upon their accomplishment, this study extends the CCMEM to be used for evaluating validity evidence.…

Descriptors: Measurement, Validity, Reliability, Models

Measuring Comparison Effects: A Critical View on the Internal/External Frame of Reference Model

Peer reviewed

Wolff, Fabian – Educational Measurement: Issues and Practice, 2021

The internal/external frame of reference (I/E) model describes the formation of students' math and verbal self-concepts by the joint effects of social comparisons (where students compare their subject-specific achievements with those of their classmates) and dimensional comparisons (where students compare their math and verbal achievements with…

Descriptors: Self Concept, Concept Formation, Mathematics Achievement, Verbal Ability

The Effects of Inattentive Responding on Construct Validity Evidence When Measuring Social-Emotional Learning Competencies

Peer reviewed

Steedle, Jeffrey T.; Hong, Maxwell; Cheng, Ying – Educational Measurement: Issues and Practice, 2019

Self-report inventories are commonly administered to measure social-emotional learning competencies related to college and career readiness. Inattentive responding can negatively impact the validity of interpreting individual results and the accuracy of construct validity evidence. This study applied nine methods of detecting insufficient effort…

Descriptors: Construct Validity, Social Development, Emotional Development, College Readiness

Changing Educational Assessments in the Post-COVID-19 Era: From Assessment of Learning (AoL) to Assessment as Learning (AaL)

Peer reviewed

Yang, Li-Ping; Xin, Tao – Educational Measurement: Issues and Practice, 2022

The upgrade of educational information technology triggered by COVID-19 has shaped a new educational order and new educational forms. As a result, traditional educational measurement is now facing a systematic transformation, that is, from the Assessment of Learning (AoL) to Assessment for Learning (AfL), and finally to Assessment as Learning (AaL).…

Descriptors: Educational Assessment, Information Technology, Educational Technology, COVID-19

Growth across Grades and Common Item Grade Alignment in Vertical Scaling Using the Rasch Model

Peer reviewed

Sanford R. Student; Derek C. Briggs; Laurie Davis – Educational Measurement: Issues and Practice, 2025

Vertical scales are frequently developed using common item nonequivalent group linking. In this design, one can use upper-grade, lower-grade, or mixed-grade common items to estimate the linking constants that underlie the absolute measurement of growth. Using the Rasch model and a dataset from Curriculum Associates' i-Ready Diagnostic in math in…

Descriptors: Elementary School Mathematics, Elementary School Students, Middle School Mathematics, Middle School Students

Disrupted Data: Using Longitudinal Assessment Systems to Monitor Test Score Quality

Peer reviewed

An, Lily Shiao; Ho, Andrew Dean; Davis, Laurie Laughlin – Educational Measurement: Issues and Practice, 2022

Technical documentation for educational tests focuses primarily on properties of individual scores at single points in time. Reliability, standard errors of measurement, item parameter estimates, fit statistics, and linking constants are standard technical features that external stakeholders use to evaluate items and individual scale scores.…

Descriptors: Documentation, Scores, Evaluation Methods, Longitudinal Studies

Using the "Joint Standards" to Design Postsecondary Assessments with Evidence of Validity and Reliability: An Approach to CAEP Accreditation

Peer reviewed

Wilkerson, Judy R. – Educational Measurement: Issues and Practice, 2020

Validity and reliability are a major focus in teacher education accreditation by the Council for Accreditation of Educator Preparation (CAEP). CAEP requires the use of "accepted research standards," but many faculty and administrators are unsure how to meet this requirement. The Standards for Educational and Psychological Testing…

Descriptors: Test Construction, Test Validity, Test Reliability, Teacher Education Programs

The Effect of Drag-and-Drop Item Features on Test-Taker Performance and Response Strategies

Peer reviewed

Arslan, Burcu; Jiang, Yang; Keehner, Madeleine; Gong, Tao; Katz, Irvin R.; Yan, Fred – Educational Measurement: Issues and Practice, 2020

Computer-based educational assessments often include items that involve drag-and-drop responses. There are different ways that drag-and-drop items can be laid out and different choices that test developers can make when designing these items. Currently, these decisions are based on experts' professional judgments and design constraints, rather…

Descriptors: Test Items, Computer Assisted Testing, Test Format, Decision Making

The Relationship between Item Developer Alignment of Items to Range Achievement-Level Descriptors and Item Difficulty: Implications for Validating Intended Score Interpretations

Peer reviewed

Schneider, M. Christina; Agrimson, Jared; Veazey, Mary – Educational Measurement: Issues and Practice, 2022

This paper presents results of a score interpretation study for a computer adaptive mathematics assessment. The study purpose was to test the efficacy of item developers' alignment of items to Range Achievement-Level Descriptors (RALDs; Egan et al.) against the empirical achievement-level alignment of items to investigate the use of RALDs as the…

Descriptors: Computer Assisted Testing, Mathematics Tests, Scores, Grade 3
