ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	3
Since 2006 (last 20 years)	8

Descriptor

Error of Measurement	11
Item Response Theory	11
Validity	11
Comparative Analysis	4
Equated Scores	4
Educational Assessment	3
Foreign Countries	3
Language Tests	3
Simulation	3
Ability	2
Computation	2
Interrater Reliability	2
Item Analysis	2
Measurement	2
Measurement Techniques	2
Models	2
National Surveys	2
Reliability	2
Statistical Analysis	2
Test Items	2
Ability Grouping	1
Achievement Tests	1
Adaptive Testing	1
Attitude Measures	1
Attitudes	1

Source

International Journal of…	2
Language Assessment Quarterly	2
Applied Measurement in…	1
Language Testing	1
Mid-Western Educational…	1
ProQuest LLC	1

Author

Hedges, Larry V.	2
Vevea, Jack L.	2
Deygers, Bart	1
Dimitrov, Dimiter M.	1
Duong, Minh Q.	1
Finch, Holmes	1
Holster, Trevor A.	1
Hsieh, Mingchuan	1
Lake, J.	1
Roberts, James S.	1
Sass, D. A.	1
Schmitt, T. A.	1
Sullivan, J. R.	1
Van Gorp, Koen	1
Walker, C. M.	1
Wu, Tong	1
von Davier, Alina A.	1

Publication Type

Reports - Research	8
Journal Articles	7
Speeches/Meeting Papers	2
Dissertations/Theses -…	1
Numerical/Quantitative Data	1
Reports - Evaluative	1

Education Level

Elementary Education	1
Grade 6	1
Higher Education	1
Intermediate Grades	1
Postsecondary Education	1

Location

Japan	1
Netherlands	1
Taiwan	1

Assessments and Surveys

National Assessment of…

Showing all 11 results

Comparison of Methods for Identifying Differential Step Functioning with Polytomous Item Response Data

Peer reviewed

Finch, Holmes – Applied Measurement in Education, 2022

Much research has been devoted to identification of differential item functioning (DIF), which occurs when the item responses for individuals from two groups differ after they are conditioned on the latent trait being measured by the scale. There has been less work examining differential step functioning (DSF), which is present for polytomous…

Descriptors: Comparative Analysis, Item Response Theory, Item Analysis, Simulation
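Illustrative sketch (not one of the methods or code from this article): the DIF definition above, comparing two groups after conditioning on the trait, is commonly operationalized for a dichotomous item with the Mantel-Haenszel common odds ratio, using total test score as a stand-in for the latent trait. The article itself concerns polytomous items (DSF), which this sketch does not cover. The function name `mantel_haenszel_dif` and the data are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(responses, groups, strata):
    """Hypothetical helper: (alpha_MH, MH D-DIF) for ONE dichotomous item.

    Dichotomous items only; the article above treats polytomous DSF.
    responses: item scores coded 1 (correct) / 0 (incorrect)
    groups:    "ref" or "focal" for each examinee
    strata:    matching variable (e.g., total score) used as a proxy
               for the latent trait
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for r, g, s in zip(responses, groups, strata):
        col = 0 if g == "ref" else 2
        tables[s][col + (0 if r == 1 else 1)] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    alpha = num / den  # assumes den > 0; alpha = 1 means no DIF
    return alpha, -2.35 * math.log(alpha)  # ETS delta-scale MH D-DIF


# Hypothetical data: one score stratum in which the reference group does better
resp = [1, 1, 1, 0] + [1, 0, 0, 0]
grp = ["ref"] * 4 + ["focal"] * 4
score = [10] * 8
alpha, d_dif = mantel_haenszel_dif(resp, grp, score)
print(alpha, d_dif)
```

An odds ratio above 1 (negative D-DIF) indicates the item favors the reference group among examinees matched on score.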

An Exploration of Comparability Issues in Educational Research: Scale Linking, Equating, and Propensity Score Weighting

Wu, Tong – ProQuest LLC, 2023

This three-article dissertation aims to address three methodological challenges to ensure comparability in educational research, including scale linking, test equating, and propensity score (PS) weighting. The first study intends to improve test scale comparability by evaluating the effect of six missing data handling approaches, including…

Descriptors: Educational Research, Comparative Analysis, Equated Scores, Weighted Scores

Determining the Scoring Validity of a Co-Constructed CEFR-Based Rating Scale

Peer reviewed

Deygers, Bart; Van Gorp, Koen – Language Testing, 2015

Considering scoring validity as encompassing both reliable rating scale use and valid descriptor interpretation, this study reports on the validation of a CEFR-based scale that was co-constructed and used by novice raters. The research questions this paper wishes to answer are (a) whether it is possible to construct a CEFR-based rating scale with…

Descriptors: Rating Scales, Scoring, Validity, Interrater Reliability

Guessing and the Rasch Model

Peer reviewed

Holster, Trevor A.; Lake, J. – Language Assessment Quarterly, 2016

Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…

Descriptors: Guessing (Tests), Item Response Theory, Vocabulary, Language Tests
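Illustrative sketch (not code from the article or from Stewart): the "guessing" parameter is the lower asymptote c of the 3PL item response function, which the Rasch model fixes at 0 while also using a common discrimination. The helper names `p_3pl` and `p_rasch` are hypothetical, and the 1.7 scaling constant that some texts include is omitted here by assumption.

```python
import math


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response (hypothetical helper).

    a: discrimination, b: difficulty, c: lower asymptote ("guessing").
    No 1.7 scaling constant is applied (an assumption).
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


def p_rasch(theta, b):
    """Rasch model expressed as the 3PL with a = 1 and c = 0."""
    return p_3pl(theta, a=1.0, b=b, c=0.0)


# Compare the two curves for an item of difficulty 0 with c = 0.25
for theta in (-4.0, 0.0, 4.0):
    print(theta, round(p_rasch(theta, b=0.0), 3),
          round(p_3pl(theta, a=1.0, b=0.0, c=0.25), 3))
```

As ability decreases, the Rasch probability approaches 0 while the 3PL probability approaches c, which is the non-zero lower asymptote at issue in the exchange.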

Observed-Score Equating with a Heterogeneous Target Population

Peer reviewed

Duong, Minh Q.; von Davier, Alina A. – International Journal of Testing, 2012

Test equating is a statistical procedure for adjusting for differences in difficulty between forms of a standardized assessment. Equating results are supposed to hold for a specified target population (Kolen & Brennan, 2004; von Davier, Holland, & Thayer, 2004) and to be (relatively) independent of the subpopulations within the target population (see…

Descriptors: Ability Grouping, Difficulty Level, Psychometrics, Statistical Analysis
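Illustrative sketch (not the authors' procedure): linear equating is one simple observed-score equating method. Its conversion is built from the means and standard deviations of the two forms in a particular population, which is why equating results are tied to a specified target population. The function name `linear_equate` and the scores are hypothetical, and a random-groups design (both forms taken by equivalent samples from the same population) is assumed.

```python
import statistics


def linear_equate(x, form_x_scores, form_y_scores):
    """Map a Form X raw score onto the Form Y scale (hypothetical helper).

    l_Y(x) = mu_Y + (sigma_Y / sigma_X) * (x - mu_X)
    Assumes both samples come from the same target population.
    """
    mu_x = statistics.mean(form_x_scores)
    mu_y = statistics.mean(form_y_scores)
    sd_x = statistics.pstdev(form_x_scores)
    sd_y = statistics.pstdev(form_y_scores)
    return mu_y + (sd_y / sd_x) * (x - mu_x)


# Hypothetical scores: Form X is harder, so its scores run lower
form_x = [10, 12, 14, 16, 18]
form_y = [13, 15, 17, 19, 21]
print(linear_equate(14, form_x, form_y))
```

If the means or standard deviations differ across subpopulations, the resulting conversions can differ too, which is the population-invariance concern raised in the abstract.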

Comparing Yes/No Angoff and Bookmark Standard Setting Methods in the Context of English Assessment

Peer reviewed

Hsieh, Mingchuan – Language Assessment Quarterly, 2013

The Yes/No Angoff and Bookmark methods for setting standards on educational assessments are currently two of the most popular standard-setting methods. However, there is no research into the comparability of these two methods in the context of language assessment. This study compared results from the Yes/No Angoff and Bookmark methods as applied to…

Descriptors: Standard Setting (Scoring), Comparative Analysis, Language Tests, Multiple Choice Tests

Contemporary Treatment of Reliability and Validity in Educational Assessment

Peer reviewed

Dimitrov, Dimiter M. – Mid-Western Educational Researcher, 2010

The focus of this presidential address is on the contemporary treatment of reliability and validity in educational assessment. Highlights of reliability are provided under the classical true-score model, using tools from latent trait modeling to clarify important assumptions and procedures for reliability estimation. In addition to reliability,…

Descriptors: Educational Assessment, Validity, Item Response Theory, Reliability
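For reference, the standard classical true-score identities behind this kind of reliability discussion are shown below. These are textbook relations, not a summary of the address, and they assume true and error scores are uncorrelated.

```latex
\begin{aligned}
X &= T + E, \qquad \operatorname{Cov}(T, E) = 0 \\
\sigma_X^2 &= \sigma_T^2 + \sigma_E^2 \\
\rho_{XX'} &= \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2} \\
\sigma_E &= \sigma_X \sqrt{1 - \rho_{XX'}}
\end{aligned}
```

The last line is the standard error of measurement, which links reliability to the "Error of Measurement" descriptor used throughout these results.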

A Monte Carlo Simulation Investigating the Validity and Reliability of Ability Estimation in Item Response Theory with Speeded Computer Adaptive Tests

Peer reviewed

Schmitt, T. A.; Sass, D. A.; Sullivan, J. R.; Walker, C. M. – International Journal of Testing, 2010

Imposed time limits on computer adaptive tests (CATs) can result in examinees having difficulty completing all items, thus compromising the validity and reliability of ability estimates. In this study, the effects of speededness were explored in a simulated CAT environment by varying examinee response patterns to end-of-test items. Expectedly,…

Descriptors: Monte Carlo Methods, Simulation, Computer Assisted Testing, Adaptive Testing

A Study of Equating in NAEP. NAEP Validity Studies. Working Paper Series.

Hedges, Larry V.; Vevea, Jack L. – 2003

A computer simulation study was conducted to investigate the amount of uncertainty added to National Assessment of Educational Progress estimates by equating error under three different equating methods and while varying a number of factors that might affect accuracy of equating. Data from past NAEP administrations were used to guide the…

Descriptors: Computer Simulation, Equated Scores, Error of Measurement, Item Response Theory

Comparative Validity of the Likert and Thurstone Approaches to Attitude Measurement.

Roberts, James S.; And Others – 1997

Graded or binary disagree-agree responses to attitude statements are often collected for the purpose of attitude measurement. The empirical characteristics of these responses will generally be inconsistent with the analytical logic that forms the basis of the Likert attitude measurement technique (R. Likert, 1932). As a consequence, the Likert…

Descriptors: Attitude Measures, Attitudes, Comparative Analysis, Error of Measurement

A Study of Equating in NAEP. NAEP Validity Studies.

Hedges, Larry V.; Vevea, Jack L. – 1997

This study investigates the amount of uncertainty added to National Assessment of Educational Progress (NAEP) estimates by equating error under both ideal and less-than-ideal circumstances. Data from past administrations are used to guide simulations of various equating designs, and error due to equating is estimated empirically. The design…

Descriptors: Ability, Elementary Secondary Education, Equated Scores, Error of Measurement