Publication Date
In 2025: 0
Since 2024: 1
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 3
Since 2006 (last 20 years): 15
Descriptor
Educational Testing: 24
Error of Measurement: 24
Achievement Tests: 6
Correlation: 6
Item Response Theory: 6
Models: 6
Scores: 6
Statistical Analysis: 6
Evaluation Methods: 5
Computation: 4
Secondary Education: 4
Publication Type
Journal Articles: 24
Reports - Research: 12
Reports - Evaluative: 6
Information Analyses: 3
Reports - Descriptive: 3
Opinion Papers: 1
Speeches/Meeting Papers: 1
Education Level
Elementary Education: 3
Grade 4: 3
Grade 3: 2
Grade 5: 2
Grade 8: 2
Secondary Education: 2
Elementary Secondary Education: 1
Grade 6: 1
Grade 7: 1
Intermediate Grades: 1
Junior High Schools: 1
Assessments and Surveys
Iowa Tests of Basic Skills: 1
Measures of Academic Progress: 1
National Assessment of…: 1
Stanford Achievement Tests: 1
Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024
We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…
Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners
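As a concrete illustration of the person-fit idea above (a minimal sketch of the standard l_z statistic, not Wind and Xu's exact procedure; it assumes response probabilities from an already-fitted Rasch or 2PL model):

    import numpy as np

    def lz_person_fit(responses, probs):
        """Standardized log-likelihood person-fit statistic (l_z).

        responses: 0/1 vector of one person's scored item responses.
        probs: model-implied success probabilities for that person.
        Large negative values suggest an aberrant response pattern.
        """
        u = np.asarray(responses, dtype=float)
        p = np.asarray(probs, dtype=float)
        log_odds = np.log(p / (1 - p))
        l0 = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
        expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
        variance = np.sum(p * (1 - p) * log_odds ** 2)
        return (l0 - expected) / np.sqrt(variance)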
Hong, Seong Eun; Monroe, Scott; Falk, Carl F. – Journal of Educational Measurement, 2020
In educational and psychological measurement, a person-fit statistic (PFS) is designed to identify aberrant response patterns. For parametric PFSs, valid inference depends on several assumptions, one of which is that the item response theory (IRT) model is correctly specified. Previous studies have used empirical data sets to explore the effects…
Descriptors: Educational Testing, Psychological Testing, Goodness of Fit, Error of Measurement
Reardon, Sean F.; Kalogrides, Demetra; Ho, Andrew D. – Journal of Educational and Behavioral Statistics, 2021
Linking score scales across different tests is considered speculative and fraught, even at the aggregate level. We introduce and illustrate validation methods for aggregate linkages, using the challenge of linking U.S. school district average test scores across states as a motivating example. We show that aggregate linkages can be validated both…
Descriptors: Equated Scores, Validity, Methods, School Districts
Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha – International Journal of Testing, 2015
The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…
Descriptors: Test Items, Educational Testing, Evaluation Methods, Ability Grouping
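For context, the Mantel-Haenszel machinery the abstract refers to reduces to a common odds ratio pooled over matched total-score strata; with thin matching, each total score forms its own stratum. A minimal sketch of the textbook estimator, not the authors' code:

    import math

    def mantel_haenszel_ddif(strata):
        """MH common odds ratio and ETS D-DIF for one studied item.

        strata: iterable of (A, B, C, D) counts per total-score level:
        A = reference correct, B = reference incorrect,
        C = focal correct,     D = focal incorrect.
        """
        num = sum(A * D / (A + B + C + D) for A, B, C, D in strata)
        den = sum(B * C / (A + B + C + D) for A, B, C, D in strata)
        alpha_mh = num / den                 # common odds ratio
        d_dif = -2.35 * math.log(alpha_mh)   # ETS delta metric
        return alpha_mh, d_dif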
Gorard, Stephen; Hordosy, Rita; Siddiqui, Nadia – International Education Studies, 2013
This paper reconsiders the widespread use of value-added approaches to estimate school "effects" and shows the results to be very unstable over time. The paper uses as an example the contextualised value-added scores of all secondary schools in England. The study asks how many schools with at least 99% of their pupils included in the…
Descriptors: Foreign Countries, Outcomes of Education, Secondary Education, Educational Testing
Zwick, Rebecca – ETS Research Report Series, 2012
Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. The goal of this project was to review the status of ETS DIF analysis procedures, focusing on three aspects: (a) the nature and stringency of the statistical rules used to flag items, (b) the minimum sample size…
Descriptors: Test Bias, Sample Size, Bayesian Statistics, Evaluation Methods
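The flagging rules in question are usually summarized by the ETS A/B/C scheme. The sketch below paraphrases the commonly cited version of those rules and may differ in detail from the procedures the report reviews:

    def ets_dif_category(d_dif, se_d_dif):
        """Simplified ETS A/B/C DIF classification (commonly cited form).

        'C' (large DIF): |D-DIF| >= 1.5 and significantly greater than 1.0;
        'A' (negligible): |D-DIF| < 1.0 or not significantly nonzero;
        'B' (moderate): everything in between.
        """
        z_vs_zero = abs(d_dif) / se_d_dif
        z_vs_one = (abs(d_dif) - 1.0) / se_d_dif
        if abs(d_dif) >= 1.5 and z_vs_one > 1.645:
            return "C"
        if abs(d_dif) < 1.0 or z_vs_zero < 1.96:
            return "A"
        return "B"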
Han, Kyung T. – Practical Assessment, Research & Evaluation, 2012
For several decades, the "three-parameter logistic model" (3PLM) has been the dominant choice for practitioners in the field of educational measurement for modeling examinees' response data from multiple-choice (MC) items. Past studies, however, have pointed out that the c-parameter of 3PLM should not be interpreted as a guessing…
Descriptors: Statistical Analysis, Models, Multiple Choice Tests, Guessing (Tests)
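The 3PLM at issue has the standard item response function below (a minimal sketch; the common scaling constant D = 1.7 is omitted). The c-parameter is the lower asymptote, which is why, as the abstract notes, it is better read as a pseudo-chance level than as a literal guessing rate:

    import numpy as np

    def p_3pl(theta, a, b, c):
        """3PL probability of a correct response.

        a: discrimination, b: difficulty, c: lower asymptote, so that
        P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b))).
        """
        return c + (1 - c) / (1 + np.exp(-a * (theta - b)))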
Boyd, Donald; Lankford, Hamilton; Loeb, Susanna; Wyckoff, James – Journal of Educational and Behavioral Statistics, 2013
Test-based accountability as well as value-added assessments and much experimental and quasi-experimental research in education rely on achievement tests to measure student skills and knowledge. Yet, we know little regarding fundamental properties of these tests, an important example being the extent of measurement error and its implications for…
Descriptors: Accountability, Educational Research, Educational Testing, Error of Measurement
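One fundamental implication of such measurement error can be shown with a toy simulation (an illustration, not the authors' analysis): two error-prone administrations of the same test correlate at about the test's reliability, not 1.0.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    true_score = rng.normal(size=n)          # true achievement, variance 1
    reliability = 0.8                        # assumed test reliability
    noise_sd = np.sqrt(1 / reliability - 1)  # error SD implied by it
    form1 = true_score + rng.normal(scale=noise_sd, size=n)
    form2 = true_score + rng.normal(scale=noise_sd, size=n)
    print(np.corrcoef(form1, form2)[0, 1])   # ~0.8: classical attenuation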
Popham, W. James – Educational Leadership, 2009
If a person were to ask an educator to identify the two most important attributes of an education test, the response would almost certainly be "validity and reliability." These two tightly wedded concepts have become icons in the field of educational assessment. As far as validity is concerned, the term doesn't refer to the accuracy of a test. Rather,…
Descriptors: Educational Testing, Educational Assessment, Student Evaluation, Test Reliability
Chang, Yuan-chin Ivan; Lu, Hung-Yi – Psychometrika, 2010
Item calibration is an essential issue in modern item response theory based psychological or educational testing. Given the popularity of computerized adaptive testing, methods to efficiently calibrate new items have become more important than they were when paper-and-pencil test administration was the norm. There are many calibration…
Descriptors: Test Items, Educational Testing, Adaptive Testing, Measurement
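One way to see the calibration problem the abstract raises: if examinee abilities from the operational CAT are treated as known, calibrating a new 2PL item reduces to logistic regression of item responses on ability. A minimal sketch under that assumption (an illustration, not the authors' specific design; requires scikit-learn >= 1.2):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def calibrate_2pl_item(thetas, responses):
        """Estimate (a, b) for one new item from known abilities.

        The 2PL gives logit P = a * (theta - b) = a*theta - a*b,
        so the slope is a and the difficulty is -intercept / a.
        """
        model = LogisticRegression(penalty=None)
        model.fit(np.asarray(thetas).reshape(-1, 1), responses)
        a = model.coef_[0, 0]
        b = -model.intercept_[0] / a
        return a, b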
Haberman, Shelby J. – Journal of Educational and Behavioral Statistics, 2008
In educational tests, subscores are often generated from a portion of the items in a larger test. Guidelines based on mean squared error are proposed to indicate whether subscores are worth reporting. Alternatives considered are direct reports of subscores, estimates of subscores based on total score, combined estimates based on subscores and…
Descriptors: Testing Programs, Regression (Statistics), Scores, Student Evaluation
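Haberman's mean-squared-error guideline is often summarized as follows (a paraphrase, with \tau_s the true subscore, s the observed subscore, and x the total score):

    \text{report } s \text{ only if } \mathrm{PRMSE}(s) > \mathrm{PRMSE}(x),
    \qquad \mathrm{PRMSE}(s) = \rho^2(s, \tau_s),
    \qquad \mathrm{PRMSE}(x) = \rho^2(x, \tau_s)

That is, a subscore is worth reporting only when it predicts its own true score better than the total score already does.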
Papay, John P. – American Educational Research Journal, 2011
Recently, educational researchers and practitioners have turned to value-added models to evaluate teacher performance. Although value-added estimates depend on the assessment used to measure student achievement, the importance of outcome selection has received scant attention in the literature. Using data from a large, urban school district, I…
Descriptors: Urban Schools, Teacher Effectiveness, Reading Achievement, Achievement Tests
Liu, Yuming; Schulz, E. Matthew; Yu, Lei – Journal of Educational and Behavioral Statistics, 2008
A Markov chain Monte Carlo (MCMC) method and a bootstrap method were compared in the estimation of standard errors of item response theory (IRT) true score equating. Three test form relationships were examined: parallel, tau-equivalent, and congeneric. Data were simulated based on Reading Comprehension and Vocabulary tests of the Iowa Tests of…
Descriptors: Reading Comprehension, Test Format, Markov Processes, Educational Testing
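The bootstrap side of that comparison follows the generic resampling skeleton below; in the equating application, the statistic being resampled would refit the IRT models and recompute the equated true scores (the interface here is illustrative):

    import numpy as np

    def bootstrap_se(data, statistic, n_boot=1000, seed=0):
        """Nonparametric bootstrap standard error of statistic(data)."""
        data = np.asarray(data)
        rng = np.random.default_rng(seed)
        n = len(data)
        estimates = [statistic(data[rng.integers(0, n, size=n)])
                     for _ in range(n_boot)]
        return np.std(estimates, ddof=1)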
Kluge, Annette – Applied Psychological Measurement, 2008
The use of microworlds (MWs), or complex dynamic systems, in educational testing and personnel selection is hampered by systematic measurement errors because these new and innovative item formats are not adequately controlled for their difficulty. This empirical study introduces a way to operationalize an MW's difficulty and demonstrates the…
Descriptors: Personnel Selection, Self Efficacy, Educational Testing, Computer Uses in Education

Williams, Richard H.; Zimmerman, Donald W. – Journal of Experimental Education, 1984
This paper provides a list of 10 salient features of the standard error of measurement, contrasting it to the reliability coefficient. It is concluded that the standard error of measurement should be regarded as a primary characteristic of a mental test. (Author/DWH)
Descriptors: Educational Testing, Error of Measurement, Evaluation Methods, Psychological Testing
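For reference, the statistic being championed is, in classical test theory,

    \mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}}

where \sigma_X is the observed-score standard deviation and \rho_{XX'} the reliability. For example, \sigma_X = 15 and \rho_{XX'} = 0.91 give SEM \approx 4.5 score points; unlike the unitless reliability coefficient, this is expressed directly in the metric of the test, which is the paper's central argument for treating it as a primary characteristic.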