Showing all 11 results
Peer reviewed
Donoghue, John R.; Eckerly, Carol – Applied Measurement in Education, 2024
Trend scoring constructed-response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
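For readers unfamiliar with the distinction this abstract draws, a standard formulation (not taken from the article itself) is: under the usual multinomial model, all N (Time A, Time B) score pairs vary freely over the I x J table,

(n_{11}, \dots, n_{IJ}) \sim \mathrm{Multinomial}(N, \{p_{ij}\}),

whereas in trend scoring the Time A counts n_{i\cdot} are fixed by the rescoring design, so each row is an independent multinomial,

(n_{i1}, \dots, n_{iJ}) \sim \mathrm{Multinomial}(n_{i\cdot}, \{p_{j \mid i}\}), \quad i = 1, \dots, I.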
Peer reviewed
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article presents the performance of item response theory (IRT) models when double ratings are used as item scores over single ratings when rater effects are present. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
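As background for the abstract above, the GPCM is conventionally written as follows (a standard parameterization; the article's own notation may differ). For item j with score categories k = 0, 1, \dots, m_j, discrimination a_j, and step parameters b_{jv},

P(X_{ij} = k \mid \theta_i) = \frac{\exp \sum_{v=1}^{k} a_j(\theta_i - b_{jv})}{\sum_{c=0}^{m_j} \exp \sum_{v=1}^{c} a_j(\theta_i - b_{jv})},

with the empty sum for k = 0 taken to be zero. Rater effects enter as systematic distortions of the observed category assignments, which double ratings are intended to average out.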
Peer reviewed
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
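For reference, the dichotomous Rasch model at issue here takes the standard form

P(X_{pi} = 1 \mid \theta_p) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}.

Because all discriminations are fixed at one, forms can be equated with a single additive constant estimated from common items, e.g. the mean shift \hat{c} = \bar{b}_{\mathrm{base}} - \bar{b}_{\mathrm{new}} over anchor items (one common approach, not necessarily the article's). Item parameter drift means a common item's b_i changes across administrations, so no single shift fits every anchor.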
Peer reviewed
Lee, HyeSun – Applied Measurement in Education, 2018
The current simulation study examined the effects of Item Parameter Drift (IPD) occurring in a short scale on parameter estimates in multilevel models where scores from a scale were employed as a time-varying predictor to account for outcome scores. Five factors, including three decisions about IPD, were considered for simulation conditions. It…
Descriptors: Test Items, Hierarchical Linear Modeling, Predictor Variables, Scores
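One plausible specification of the model described, in standard HLM notation (the abstract does not give the study's exact equations): with W_{ti} the scale score serving as the time-varying predictor for person i at time t,

Level 1: Y_{ti} = \pi_{0i} + \pi_{1i} W_{ti} + e_{ti}
Level 2: \pi_{0i} = \beta_{00} + r_{0i}, \qquad \pi_{1i} = \beta_{10} + r_{1i}.

IPD in the items composing the short scale perturbs W_{ti} differently at different time points, which is how drift can propagate into the \beta estimates.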
Peer reviewed
DeMars, Christine – Applied Measurement in Education, 2015
In generalizability theory studies in large-scale testing contexts, sometimes a facet is very sparsely crossed with the object of measurement. For example, when assessments are scored by human raters, it may not be practical to have every rater score all students. Sometimes the scoring is systematically designed such that the raters are…
Descriptors: Educational Assessment, Measurement, Data, Generalizability Theory
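To make the design issue concrete (standard G-theory results, not specific to the article): when every rater scores every student (a crossed p x r design), the relative-error variance and generalizability coefficient are

\sigma^2_\delta = \frac{\sigma^2_{pr,e}}{n'_r}, \qquad E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\delta},

whereas when raters are nested within students (r : p) the rater main effect is confounded with the interaction, giving the larger term \sigma^2_\delta = (\sigma^2_r + \sigma^2_{pr,e}) / n'_r. Sparse, systematically designed crossing leaves the analyst somewhere between these two cases.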
Peer reviewed
Taylor, Melinda Ann; Pastor, Dena A. – Applied Measurement in Education, 2013
Although federal regulations require testing students with severe cognitive disabilities, there is little guidance regarding how technical quality should be established. It is known that challenges exist with documentation of the reliability of scores for alternate assessments. Typical measures of reliability do little in modeling multiple sources…
Descriptors: Generalizability Theory, Alternative Assessment, Test Reliability, Scores
Peer reviewed
Randall, Jennifer; Engelhard, George, Jr. – Applied Measurement in Education, 2010
The psychometric properties and multigroup measurement invariance of scores across subgroups, items, and persons on the "Reading for Meaning" items from the Georgia Criterion Referenced Competency Test (CRCT) were assessed in a sample of 778 seventh-grade students. Specifically, we sought to determine the extent to which score-based…
Descriptors: Testing Accommodations, Test Items, Learning Disabilities, Factor Analysis
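As context for the invariance analysis (a generic multigroup factor model; the article's model may differ): writing the observed item vector for group g as

x_g = \tau_g + \Lambda_g \xi_g + \delta_g,

configural invariance requires the same loading pattern across groups, metric invariance requires \Lambda_1 = \Lambda_2, and scalar invariance additionally requires \tau_1 = \tau_2. Score-based comparisons across the subgroups studied are defensible only to the degree such constraints hold.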
Peer reviewed
Miller, Tamara B.; Kane, Michael – Applied Measurement in Education, 2001
Examined the precision of change scores in terms of error-tolerance (E/T) ratios for both relative and absolute interpretations of change scores. Used E/T ratios to evaluate the error in estimating the change relative to tolerance for error in a particular context. Illustrated the results with achievement test data. (SLD)
Descriptors: Achievement Tests, Error of Measurement, Estimation (Mathematics), Scores
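The E/T logic, in a standard form consistent with the abstract (exact definitions may differ in the article): an error-tolerance ratio compares the standard error attached to a score interpretation with the largest error that can be tolerated in the context of use,

E/T = \frac{\sigma_E}{T}.

For a change score d = X_2 - X_1 with independent errors of measurement across occasions,

\sigma_{E_d} = \sqrt{\sigma_{E_1}^2 + \sigma_{E_2}^2},

so a change score carries more error than either single score and its precision must be judged against a correspondingly explicit tolerance.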
Peer reviewed
Feldt, Leonard S. – Applied Measurement in Education, 2002
Considers the situation in which content or administrative considerations limit the way in which a test can be partitioned to estimate the internal consistency reliability of the total test score. Demonstrates that a single-valued estimate of the total score reliability is possible only if an assumption is made about the comparative size of the…
Descriptors: Error of Measurement, Reliability, Scores, Test Construction
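A concrete instance of Feldt's point (standard classical test theory, not reproduced from the article): if the test can be split only into two parts with effective-length proportions \lambda_1 and \lambda_2 (\lambda_1 + \lambda_2 = 1), the total-score reliability is

\rho_{XX'} = \frac{\sigma_{12}}{\lambda_1 \lambda_2 \sigma_X^2},

which is single-valued only once the \lambda's are assumed. With \lambda_1 = \lambda_2 = 1/2 it reduces to the familiar split-half form 4\sigma_{12} / \sigma_X^2.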
Peer reviewed
Sykes, Robert C.; Hou, Liling – Applied Measurement in Education, 2003
Weighting responses to Constructed-Response (CR) items has been proposed as a way to increase the contribution these items make to the test score when there is insufficient testing time to administer additional CR items. The effects of various types of item weighting on an IRT-based mixed-format writing examination were investigated…
Descriptors: Item Response Theory, Weighted Scores, Responses, Scores
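What weighting means operationally here, in a generic form (the article compares several variants not detailed in the abstract): with multiple-choice items scored x_j and constructed-response items scored x_k, a weighted total is

X_w = \sum_{j \in \mathrm{MC}} x_j + \sum_{k \in \mathrm{CR}} w_k x_k, \qquad w_k > 1,

so the CR items contribute more to the total without added testing time, though their measurement error is magnified along with their signal.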
Peer reviewed
Ferrara, Steve; Johnson, Eugene; Chen, Wen-Hung – Applied Measurement in Education, 2005
Psychometricians continue to develop and evaluate methods for linking test scores, both horizontally and vertically. This article describes a social moderation process for articulating (i.e., linking) performance standards across grade levels for an operational state assessment program. The researchers used generated data to evaluate the likely…
Descriptors: Grade 2, Grade 3, Scores, Error of Measurement