Publication Date
  In 2025: 0
  Since 2024: 1
  Since 2021 (last 5 years): 2
  Since 2016 (last 10 years): 3
  Since 2006 (last 20 years): 22
Descriptor
  Educational Testing: 40
  Error of Measurement: 40
  Scores: 13
  Correlation: 9
  Item Response Theory: 8
  Measurement Techniques: 8
  Models: 8
  Statistical Analysis: 8
  Achievement Tests: 7
  Test Items: 7
  Test Reliability: 7
Education Level
  Elementary Secondary Education: 7
  Elementary Education: 3
  Grade 4: 3
  Grade 3: 2
  Grade 5: 2
  Grade 8: 2
  Secondary Education: 2
  Grade 6: 1
  Grade 7: 1
  Higher Education: 1
  Intermediate Grades: 1
Audience
  Researchers: 1
Location
  California: 3
  New York: 3
  Arizona: 1
  Germany: 1
  Illinois: 1
  Ireland: 1
  Missouri: 1
  New Jersey: 1
  North Carolina: 1
  Tennessee: 1
  Texas: 1
Laws, Policies, & Programs
  No Child Left Behind Act 2001: 1
Assessments and Surveys
  National Assessment of…: 2
  ACT Assessment: 1
  Iowa Tests of Basic Skills: 1
  Measures of Academic Progress: 1
  Sequential Tests of…: 1
  Stanford Achievement Tests: 1
Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024
We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…
Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners
Hong, Seong Eun; Monroe, Scott; Falk, Carl F. – Journal of Educational Measurement, 2020
In educational and psychological measurement, a person-fit statistic (PFS) is designed to identify aberrant response patterns. For parametric PFSs, valid inference depends on several assumptions, one of which is that the item response theory (IRT) model is correctly specified. Previous studies have used empirical data sets to explore the effects…
Descriptors: Educational Testing, Psychological Testing, Goodness of Fit, Error of Measurement
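As background for parametric person-fit statistics (PFSs) like those examined here, a common example is the standardized log-likelihood statistic l_z. The following Python sketch assumes a 2PL IRT model and made-up item parameters; it illustrates the general idea, not the authors' specific procedure.

```python
import numpy as np

def lz_statistic(responses, theta, a, b):
    """Standardized log-likelihood person-fit statistic (l_z) under a 2PL model.

    responses : 0/1 item scores for one examinee
    theta     : the examinee's ability estimate
    a, b      : item discrimination and difficulty parameters
    Large negative values flag aberrant response patterns.
    """
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # 2PL response probabilities
    q = 1.0 - p
    l0 = np.sum(responses * np.log(p) + (1 - responses) * np.log(q))
    mean = np.sum(p * np.log(p) + q * np.log(q))
    var = np.sum(p * q * np.log(p / q) ** 2)
    return (l0 - mean) / np.sqrt(var)

# Hypothetical five-item example
u = np.array([1, 0, 1, 1, 0])
a = np.array([1.2, 0.8, 1.0, 1.5, 0.9])
b = np.array([-0.5, 0.0, 0.3, 0.8, 1.2])
print(lz_statistic(u, theta=0.4, a=a, b=b))
```

Note that l_z's null distribution is only approximately standard normal when theta is estimated, which is one reason model misspecification (the paper's focus) matters for valid inference.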
Reardon, Sean F.; Kalogrides, Demetra; Ho, Andrew D. – Journal of Educational and Behavioral Statistics, 2021
Linking score scales across different tests is considered speculative and fraught, even at the aggregate level. We introduce and illustrate validation methods for aggregate linkages, using the challenge of linking U.S. school district average test scores across states as a motivating example. We show that aggregate linkages can be validated both…
Descriptors: Equated Scores, Validity, Methods, School Districts
Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha – International Journal of Testing, 2015
The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…
Descriptors: Test Items, Educational Testing, Evaluation Methods, Ability Grouping
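For reference, the MH common odds-ratio estimator with thin matching (each total score forms its own stratum) can be sketched briefly. This is a generic illustration with hypothetical inputs, not the authors' implementation; thick matching would instead pool adjacent score levels into wider strata.

```python
import numpy as np

def mh_ddif(correct, group, total_score):
    """Mantel-Haenszel DIF with thin matching: one stratum per total score.

    correct     : 0/1 responses to the studied item
    group       : 0 = reference group, 1 = focal group
    total_score : matching variable (examinee total test score)
    Returns MH D-DIF = -2.35 * ln(alpha_MH), the ETS delta metric.
    """
    num = den = 0.0
    for s in np.unique(total_score):
        k = total_score == s
        A = np.sum((group[k] == 0) & (correct[k] == 1))  # reference, right
        B = np.sum((group[k] == 0) & (correct[k] == 0))  # reference, wrong
        C = np.sum((group[k] == 1) & (correct[k] == 1))  # focal, right
        D = np.sum((group[k] == 1) & (correct[k] == 0))  # focal, wrong
        n = A + B + C + D
        if n > 0:
            num += A * D / n
            den += B * C / n
    return -2.35 * np.log(num / den)  # assumes den > 0 (non-degenerate data)
```

Under the ETS convention, negative MH D-DIF values indicate DIF favoring the reference group.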
Woodruff, David; Traynor, Anne; Cui, Zhongmin; Fang, Yu – ACT, Inc., 2013
Professional standards for educational testing recommend that both the overall standard error of measurement and the conditional standard error of measurement (CSEM) be computed on the score scale used to report scores to examinees. Several methods have been developed to compute scale score CSEMs. This paper compares three methods, based on…
Descriptors: Comparative Analysis, Error of Measurement, Scores, Scaling
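As a point of reference for the quantity being computed (this raw-score form is background, not one of the three scale-score methods the paper compares), Lord's binomial-error model gives the conditional standard error of measurement for a number-correct score x on an n-item test as:

```latex
% Lord's binomial-error CSEM for raw score x on an n-item test
\mathrm{CSEM}(x) = \sqrt{\frac{x\,(n - x)}{n - 1}}
```

Scale-score CSEM methods push this kind of raw-score error through the raw-to-scale-score conversion, which is where methods such as those compared in the paper differ.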
Topczewski, Anna Marie – ProQuest LLC, 2013
Developmental score scales represent the performance of students along a continuum: as students learn more, they move higher along that continuum. Unidimensional item response theory (UIRT) vertical scaling has become a commonly used method to create developmental score scales. Research has shown that UIRT vertical scaling methods can be…
Descriptors: Item Response Theory, Scaling, Scores, Student Development
Gorard, Stephen; Hordosy, Rita; Siddiqui, Nadia – International Education Studies, 2013
This paper re-considers the widespread use of value-added approaches to estimate school "effects", and shows the results to be very unstable over time. The paper uses as an example the contextualised value-added scores of all secondary schools in England. The study asks how many schools with at least 99% of their pupils included in the…
Descriptors: Foreign Countries, Outcomes of Education, Secondary Education, Educational Testing
Zwick, Rebecca – ETS Research Report Series, 2012
Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. The goal of this project was to review the status of ETS DIF analysis procedures, focusing on three aspects: (a) the nature and stringency of the statistical rules used to flag items, (b) the minimum sample size…
Descriptors: Test Bias, Sample Size, Bayesian Statistics, Evaluation Methods
Loeb, Susanna; Candelaria, Christopher A. – Carnegie Foundation for the Advancement of Teaching, 2012
Value-added models measure teacher performance by the test score gains of their students, adjusted for a variety of factors such as the performance of students when they enter the class. The measures are based on desired student outcomes such as math and reading scores, but they have a number of potential drawbacks. One of them is the…
Descriptors: Academic Achievement, Teacher Effectiveness, Scores, Peer Influence
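As a schematic of how such models work (heavily simplified relative to operational value-added models, and with invented data): regress current test scores on entering scores plus teacher indicators, and read the teacher coefficients as value-added estimates.

```python
import numpy as np

# Hypothetical data: post-test score, prior score, and teacher assignment
post = np.array([72.0, 68.0, 90.0, 85.0, 60.0, 75.0])
prior = np.array([70.0, 65.0, 80.0, 78.0, 62.0, 71.0])
teacher = np.array(["A", "A", "B", "B", "C", "C"])

teachers = np.unique(teacher)
# Design matrix: intercept, prior score, teacher dummies
# (first teacher omitted as the reference category)
X = np.column_stack(
    [np.ones_like(prior), prior]
    + [(teacher == t).astype(float) for t in teachers[1:]]
)
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
# beta[2:] are the estimated teacher effects relative to teacher "A"
print(dict(zip(teachers[1:], np.round(beta[2:], 2))))
```

Operational models add many more adjustments (student demographics, classroom composition, shrinkage of noisy estimates), which is precisely where the drawbacks discussed in this piece arise.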
Han, Kyung T. – Practical Assessment, Research & Evaluation, 2012
For several decades, the "three-parameter logistic model" (3PLM) has been the dominant choice for practitioners in the field of educational measurement for modeling examinees' response data from multiple-choice (MC) items. Past studies, however, have pointed out that the c-parameter of 3PLM should not be interpreted as a guessing…
Descriptors: Statistical Analysis, Models, Multiple Choice Tests, Guessing (Tests)
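For readers new to the model under discussion: in the 3PLM, the probability of a correct response has a lower asymptote c, which is why c is loosely read as a "guessing" parameter, the interpretation Han questions. A minimal sketch with made-up parameter values:

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """Three-parameter logistic model: probability of a correct response.

    a : discrimination, b : difficulty, c : lower asymptote
    (commonly called the "guessing" parameter, though that reading
    is contested).
    """
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Even a very low-ability examinee keeps a success probability near c:
print(p_3pl(theta=-4.0, a=1.0, b=0.0, c=0.2))  # ~0.21
```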
Boyd, Donald; Lankford, Hamilton; Loeb, Susanna; Wyckoff, James – Journal of Educational and Behavioral Statistics, 2013
Test-based accountability as well as value-added assessments and much experimental and quasi-experimental research in education rely on achievement tests to measure student skills and knowledge. Yet, we know little regarding fundamental properties of these tests, an important example being the extent of measurement error and its implications for…
Descriptors: Accountability, Educational Research, Educational Testing, Error of Measurement
Popham, W. James – Educational Leadership, 2009
If a person were to ask an educator to identify the two most important attributes of an educational test, the response almost certainly would be "validity and reliability." These two tightly wedded concepts have become icons in the field of educational assessment. As far as validity is concerned, the term doesn't refer to the accuracy of a test. Rather,…
Descriptors: Educational Testing, Educational Assessment, Student Evaluation, Test Reliability
Chang, Yuan-chin Ivan; Lu, Hung-Yi – Psychometrika, 2010
Item calibration is an essential issue in modern item response theory (IRT)-based psychological or educational testing. Due to the popularity of computerized adaptive testing, methods to efficiently calibrate new items have become more important than in the era when paper-and-pencil test administration was the norm. There are many calibration…
Descriptors: Test Items, Educational Testing, Adaptive Testing, Measurement
Olsen, Robert B.; Unlu, Fatih; Price, Cristofer; Jaciw, Andrew P. – National Center for Education Evaluation and Regional Assistance, 2011
This report examines the differences in impact estimates and standard errors that arise when these are derived using state achievement tests only (as pre-tests and post-tests), study-administered tests only, or some combination of state- and study-administered tests. State tests may yield different evaluation results relative to a test that is…
Descriptors: Achievement Tests, Standardized Tests, State Standards, Reading Achievement
Haberman, Shelby J. – Journal of Educational and Behavioral Statistics, 2008
In educational tests, subscores are often generated from a portion of the items in a larger test. Guidelines based on mean squared error are proposed to indicate whether subscores are worth reporting. Alternatives considered are direct reports of subscores, estimates of subscores based on total score, combined estimates based on subscores and…
Descriptors: Testing Programs, Regression (Statistics), Scores, Student Evaluation
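The mean-squared-error guideline can be paraphrased as a comparison of proportional reductions in MSE (PRMSE). A simplified sketch of that comparison, assuming the usual classical-test-theory quantities are already estimated (the function name and inputs are illustrative, not Haberman's notation):

```python
def subscore_adds_value(rel_sub, corr_true_sub_total):
    """Haberman-style check: is a subscore worth reporting on its own?

    rel_sub             : reliability of the observed subscore
    corr_true_sub_total : correlation between the true subscore and
                          the observed total score
    The subscore adds value when it predicts its own true score better
    than the total score does, i.e. when its PRMSE is higher.
    """
    prmse_sub = rel_sub                     # PRMSE of the observed subscore
    prmse_total = corr_true_sub_total ** 2  # PRMSE of a total-score estimate
    return prmse_sub > prmse_total

# Hypothetical values: a reliable subscore that is nearly redundant with
# the total score would fail this check.
print(subscore_adds_value(rel_sub=0.75, corr_true_sub_total=0.90))  # False
```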