ERIC - Search Results

Publication Date

In 2025	3
Since 2024	6
Since 2021 (last 5 years)	6
Since 2016 (last 10 years)	10
Since 2006 (last 20 years)	14

Descriptor

Error of Measurement	18
Evaluation Methods	18
Test Validity	18
Test Reliability	12
Student Evaluation	6
Test Bias	6
Evaluation Criteria	5
Comparative Testing	3
Foreign Countries	3
Item Analysis	3
Item Response Theory	3
Models	3
Sample Size	3
Test Construction	3
Testing Problems	3
Academic Standards	2
Alternative Assessment	2
Criterion Referenced Tests	2
Equated Scores	2
Hierarchical Linear Modeling	2
Measurement Techniques	2
National Competency Tests	2
Performance Based Assessment	2
Personnel Evaluation	2
Research Design	2
More ▼

Source

Educational Measurement:…	3
Journal of Educational…	3
Advances in Physiology…	1
Applied Measurement in…	1
ETS Research Institute	1
ETS Research Report Series	1
Field Methods	1
Grantee Submission	1
Social Indicators Research	1
Stanford Center for Education…	1
Teachers College Record	1
More ▼

Publication Type

Journal Articles	12
Reports - Research	10
Reports - Evaluative	5
Speeches/Meeting Papers	2
Information Analyses	1
Reports - Descriptive	1

Education Level

Higher Education	1
Junior High Schools	1
Middle Schools	1
Postsecondary Education	1
Secondary Education	1

Audience

Location

Australia	1
Ethiopia	1

Laws, Policies, & Programs

Assessments and Surveys

National Assessment of…

What Works Clearinghouse Rating

Showing 1 to 15 of 18 results Save | Export

IRT Observed-Score Equating for Rater-Mediated Assessments Using a Hierarchical Rater Model

Peer reviewed

Direct link

Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025

While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…

Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity

Poverty and Wealth without a Ladder? An Appraisal of the Stages of Progress Method among Agro-Pastoralists in Ethiopia's Lower Omo Valley

Peer reviewed

Direct link

Edward G. J. Stevenson; Jil Molenaar; David-Paul Pertaub; Dessalegn Tekle – Field Methods, 2025

Is it possible to measure wealth and poverty across settings while being faithful to local understandings? The stages of progress method (SoP) attempts to do this by building ladders of wealth in locally relevant terms and using these in comparisons across groups. This approach is potentially useful among pastoralist populations where monetary…

Descriptors: Foreign Countries, Poverty, Social Mobility, Evaluation Methods

Evidence-Based Evaluation of Student and Marker Performances in Assessment and Examination

Peer reviewed

Direct link

Ole J. Kemi – Advances in Physiology Education, 2025

Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating students but also the wider learning and teaching efficiency of an academic institution.…

Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards

Reframing Research and Assessment Practices: Advancing an Antiracist and Anti-Ableist Research Agenda

Peer reviewed

Direct link

Angela Johnson; Elizabeth Barker; Marcos Viveros Cespedes – Educational Measurement: Issues and Practice, 2024

Educators and researchers strive to build policies and practices on data and evidence, especially on academic achievement scores. When assessment scores are inaccurate for specific student populations or when scores are inappropriately used, even data-driven decisions will be misinformed. To maximize the impact of the research-practice-policy…

Descriptors: Equal Education, Inclusion, Evaluation Methods, Error of Measurement

Signal-to-Noise Ratio in Estimating and Testing the Mediation Effect: Structural Equation Modeling versus Path Analysis with Weighted Composites

Peer reviewed

Direct link

Ke-Hai Yuan; Zhiyong Zhang; Lijuan Wang – Grantee Submission, 2024

Mediation analysis plays an important role in understanding causal processes in social and behavioral sciences. While path analysis with composite scores was criticized to yield biased parameter estimates when variables contain measurement errors, recent literature has pointed out that the population values of parameters of latent-variable models…

Descriptors: Structural Equation Models, Path Analysis, Weighted Scores, Comparative Testing

Charting the Future of Assessments. Full Report

Download full text

Patrick C. Kyllonen; Amit Sevak; Teresa Ober; Ikkyu Choi; Jesse Sparks; Daniel Fishtein – ETS Research Institute, 2024

Assessment refers to a broad array of approaches for measuring or evaluating a person's (or group of persons') skills, behaviors, dispositions, or other attributes. Assessments range from standardized tests used in admissions, employee selection, licensure examinations, and domestic and international largescale assessments of cognitive and…

Descriptors: Performance Based Assessment, Evaluation Criteria, Evaluation Methods, Test Bias

Validation Methods for Aggregate-Level Test Scale Linking: A Case Study Mapping School District Test Score Distributions to a Common Scale. CEPA Working Paper No. 16-09

Download full text

Reardon, Sean F.; Ho, Andrew D.; Kalogrides, Demetra – Stanford Center for Education Policy Analysis, 2019

Linking score scales across different tests is considered speculative and fraught, even at the aggregate level (Feuer et al., 1999; Thissen, 2007). We introduce and illustrate validation methods for aggregate linkages, using the challenge of linking U.S. school district average test scores across states as a motivating example. We show that…

Descriptors: Test Validity, Evaluation Methods, School Districts, Scores

Maintaining Equivalent Cut Scores for Small Sample Test Forms

Peer reviewed

Direct link

Dwyer, Andrew C. – Journal of Educational Measurement, 2016

This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common-item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common-item equating methodology to standard setting ratings to account for…

Descriptors: Cutting Scores, Equivalency Tests, Test Format, Academic Standards

The Accuracy of Aggregate Student Growth Percentiles as Indicators of Educator Performance

Peer reviewed

Direct link

Castellano, Katherine E.; McCaffrey, Daniel F. – Educational Measurement: Issues and Practice, 2017

Mean or median student growth percentiles (MGPs) are a popular measure of educator performance, but they lack rigorous evaluation. This study investigates the error in MGP due to test score measurement error (ME). Using analytic derivations, we find that errors in the commonly used MGP are correlated with average prior latent achievement: Teachers…

Descriptors: Teacher Evaluation, Teacher Effectiveness, Value Added Models, Achievement Gains

Exploring the Utility of Sequential Analysis in Studying Informal Formative Assessment Practices

Peer reviewed

Direct link

Furtak, Erin Marie; Ruiz-Primo, Maria Araceli; Bakeman, Roger – Educational Measurement: Issues and Practice, 2017

Formative assessment is a classroom practice that has received much attention in recent years for its established potential at increasing student learning. A frequent analytic approach for determining the quality of formative assessment practices is to develop a coding scheme and determine frequencies with which the codes are observed; however,…

Descriptors: Sequential Approach, Formative Evaluation, Alternative Assessment, Incidence

Impact of Design Effects in Large-Scale District and State Assessments

Peer reviewed

Direct link

Phillips, Gary W. – Applied Measurement in Education, 2015

This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…

Descriptors: State Programs, Sampling, Research Design, Error of Measurement

A Review of ETS Differential Item Functioning Assessment Procedures: Flagging Rules, Minimum Sample Size Requirements, and Criterion Refinement. Research Report. ETS RR-12-08

Peer reviewed
PDF on ERIC

Download full text

Zwick, Rebecca – ETS Research Report Series, 2012

Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. The goal of this project was to review the status of ETS DIF analysis procedures, focusing on three aspects: (a) the nature and stringency of the statistical rules used to flag items, (b) the minimum sample size…

Descriptors: Test Bias, Sample Size, Bayesian Statistics, Evaluation Methods

Reliability, Validity and True Values in Surveys

Peer reviewed

Direct link

Wikman, Anders – Social Indicators Research, 2006

In general, it is assumed that distinct true values will be found behind what is actually measured in surveys. By acquiring sufficient knowledge of measurement error, its extent and nature, we are supposed to be able to obtain adequate knowledge of underlying properties. It could be maintained, however, that this idea of a stable and…

Descriptors: Surveys, Test Validity, Responses, Error of Measurement

An Evaluation Model for Mastery Testing

Peer reviewed

Emrick, John A. – Journal of Educational Measurement, 1971

Descriptors: Criterion Referenced Tests, Error of Measurement, Evaluation Methods, Item Analysis

Psychometric Issues in the ELL Assessment and Special Education Eligibility

Peer reviewed

Direct link

Abedi, Jamal – Teachers College Record, 2006

Assessments in English that are constructed for native English speakers may not provide valid inferences about the achievement of English language learners (ELLs). The linguistic complexity of the test items that are not related to the content of the assessment may increase the measurement error, thus reducing the reliability of the assessment.…

Descriptors: Second Language Learning, Test Items, Psychometrics, Inferences

Previous Page | Next Page »

Pages: 1 | 2

Abedi, Jamal	1
Amit Sevak	1
Angela Johnson	1
Bakeman, Roger	1
Bateman, Andrea	1
Carl Westine	1
Cason, Gerald J.	1
Castellano, Katherine E.	1
Cheung, K. C.	1
Daniel Fishtein	1
David-Paul Pertaub	1
Dessalegn Tekle	1
Dwyer, Andrew C.	1
Edward G. J. Stevenson	1
Elizabeth Barker	1
Emrick, John A.	1
Furtak, Erin Marie	1
Gillis, Shelley	1
Ho, Andrew D.	1
Ikkyu Choi	1
Jesse Sparks	1
Jil Molenaar	1
Kalogrides, Demetra	1
Ke-Hai Yuan	1
More ▼