Finch, Holmes – Applied Measurement in Education, 2022
Much research has been devoted to identification of differential item functioning (DIF), which occurs when the item responses for individuals from two groups differ after they are conditioned on the latent trait being measured by the scale. There has been less work examining differential step functioning (DSF), which is present for polytomous…
Descriptors: Comparative Analysis, Item Response Theory, Item Analysis, Simulation
Wu, Tong – ProQuest LLC, 2023
This three-article dissertation aims to address three methodological challenges to ensure comparability in educational research, including scale linking, test equating, and propensity score (PS) weighting. The first study intends to improve test scale comparability by evaluating the effect of six missing data handling approaches, including…
Descriptors: Educational Research, Comparative Analysis, Equated Scores, Weighted Scores
Deygers, Bart; Van Gorp, Koen – Language Testing, 2015
Considering scoring validity as encompassing both reliable rating scale use and valid descriptor interpretation, this study reports on the validation of a CEFR-based scale that was co-constructed and used by novice raters. The research questions this paper wishes to answer are (a) whether it is possible to construct a CEFR-based rating scale with…
Descriptors: Rating Scales, Scoring, Validity, Interrater Reliability
Holster, Trevor A.; Lake, J. – Language Assessment Quarterly, 2016
Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…
Descriptors: Guessing (Tests), Item Response Theory, Vocabulary, Language Tests
Duong, Minh Q.; von Davier, Alina A. – International Journal of Testing, 2012
Test equating is a statistical procedure for adjusting for test form differences in difficulty in a standardized assessment. Equating results are supposed to hold for a specified target population (Kolen & Brennan, 2004; von Davier, Holland, & Thayer, 2004) and to be (relatively) independent of the subpopulations from the target population (see…
Descriptors: Ability Grouping, Difficulty Level, Psychometrics, Statistical Analysis
Hsieh, Mingchuan – Language Assessment Quarterly, 2013
The Yes/No Angoff and Bookmark methods for setting standards in educational assessment are currently two of the most popular standard-setting methods. However, there is no research into the comparability of these two methods in the context of language assessment. This study compared results from the Yes/No Angoff and Bookmark methods as applied to…
Descriptors: Standard Setting (Scoring), Comparative Analysis, Language Tests, Multiple Choice Tests
Dimitrov, Dimiter M. – Mid-Western Educational Researcher, 2010
The focus of this presidential address is on the contemporary treatment of reliability and validity in educational assessment. Highlights on reliability are provided under the classical true-score model using tools from latent trait modeling to clarify important assumptions and procedures for reliability estimation. In addition to reliability,…
Descriptors: Educational Assessment, Validity, Item Response Theory, Reliability
Schmitt, T. A.; Sass, D. A.; Sullivan, J. R.; Walker, C. M. – International Journal of Testing, 2010
Imposed time limits on computer adaptive tests (CATs) can result in examinees having difficulty completing all items, thus compromising the validity and reliability of ability estimates. In this study, the effects of speededness were explored in a simulated CAT environment by varying examinee response patterns to end-of-test items. Expectedly,…
Descriptors: Monte Carlo Methods, Simulation, Computer Assisted Testing, Adaptive Testing
Hedges, Larry V.; Vevea, Jack L. – 2003
A computer simulation study was conducted to investigate the amount of uncertainty added to National Assessment of Educational Progress estimates by equating error under three different equating methods and while varying a number of factors that might affect accuracy of equating. Data from past NAEP administrations were used to guide the…
Descriptors: Computer Simulation, Equated Scores, Error of Measurement, Item Response Theory
Roberts, James S.; And Others – 1997
Graded or binary disagree-agree responses to attitude statements are often collected for the purpose of attitude measurement. The empirical characteristics of these responses will generally be inconsistent with the analytical logic that forms the basis of the Likert attitude measurement technique (R. Likert, 1932). As a consequence, the Likert…
Descriptors: Attitude Measures, Attitudes, Comparative Analysis, Error of Measurement
Hedges, Larry V.; Vevea, Jack L. – 1997
This study investigates the amount of uncertainty added to National Assessment of Educational Progress (NAEP) estimates by equating error under both ideal and less than ideal circumstances. Data from past administrations are used to guide simulations of various equating designs and error due to equating is estimated empirically. The design…
Descriptors: Ability, Elementary Secondary Education, Equated Scores, Error of Measurement