A Comparison of Procedures for Estimating Person Reliability Parameters in the Graded Response Model
LaHuis, David M.; Bryant-Lees, Kinsey B.; Hakoyama, Shotaro; Barnes, Tyler; Wiemann, Andrea – Journal of Educational Measurement, 2018
Person reliability parameters (PRPs) model temporary changes in individuals' attribute level perceptions when responding to self-report items (higher levels of PRPs represent less fluctuation). PRPs could be useful in measuring careless responding and traitedness. However, it is unclear how well current procedures for estimating PRPs can recover…
Descriptors: Comparative Analysis, Reliability, Error of Measurement, Measurement Techniques
Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…
Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests
Raymond, Mark R.; Swygert, Kimberly A.; Kahraman, Nilufer – Journal of Educational Measurement, 2012
Although a few studies report sizable score gains for examinees who repeat performance-based assessments, research has not yet addressed the reliability and validity of inferences based on ratings of repeat examinees on such tests. This study analyzed scores for 8,457 single-take examinees and 4,030 repeat examinees who completed a 6-hour clinical…
Descriptors: Physicians, Licensing Examinations (Professions), Performance Based Assessment, Repetition
Moses, Tim – Journal of Educational Measurement, 2012
The focus of this paper is assessing the impact of measurement errors on the prediction error of an observed-score regression. Measures are presented and described for decomposing the linear regression's prediction error variance into parts attributable to the true score variance and the error variances of the dependent variable and the predictor…
Descriptors: Error of Measurement, Prediction, Regression (Statistics), True Scores
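Two standard classical-test-theory identities give a quick feel for how measurement error degrades an observed-score regression. This sketch uses hypothetical values and illustrates the textbook relationships, not the specific decomposition measures the article develops:

```python
def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: the true-score
    correlation implied by an observed correlation r_xy and the
    reliabilities of the two measures."""
    return r_xy / (rel_x * rel_y) ** 0.5

def prediction_error_variance(var_y, r_xy):
    """Error variance of the linear prediction of Y from X:
    Var(Y - Y_hat) = Var(Y) * (1 - r_xy**2)."""
    return var_y * (1 - r_xy ** 2)

# hypothetical values: observed r = 0.6, both reliabilities 0.9
print(disattenuated_correlation(0.6, 0.9, 0.9))   # 0.6 / 0.9
print(prediction_error_variance(100.0, 0.6))      # 100 * (1 - 0.36)
```

Lower reliability in either variable shrinks the observed correlation and so inflates the prediction error variance, which is the phenomenon the article's decomposition quantifies.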
Betebenner, Damian W.; Shang, Yi; Xiang, Yun; Zhao, Yan; Yue, Xiaohui – Journal of Educational Measurement, 2008
No Child Left Behind (NCLB) performance mandates, embedded within state accountability systems, focus school AYP (adequate yearly progress) compliance squarely on the percentage of students at or above proficient. The singular importance of this quantity for decision-making purposes has initiated extensive research into percent proficient as a…
Descriptors: Classification, Error of Measurement, Statistics, Reliability
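Since the abstract turns on the sampling behavior of percent proficient, a baseline binomial calculation (my own illustration, not the article's accountability model) shows why small schools produce noisy AYP statistics:

```python
def percent_proficient_se(p, n_students):
    """Sampling standard error of a school's proportion proficient,
    treating each student as an independent Bernoulli draw."""
    return (p * (1 - p) / n_students) ** 0.5

# the same 50% proficiency rate is far less stable in a small school
print(percent_proficient_se(0.5, 25))    # 0.1
print(percent_proficient_se(0.5, 400))   # 0.025
```

A 10-point swing in percent proficient at the small school is within one standard error, which is why treating percent proficient as an error-free quantity for decision making invites misclassification.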

Camilli, Gregory – Journal of Educational Measurement, 1999
Yen and Burket suggested that shrinkage in vertical equating cannot be understood apart from multidimensionality. Reviews research on reliability, multidimensionality, and scale shrinkage, and explores issues of practical importance to educators. (SLD)
Descriptors: Equated Scores, Error of Measurement, Item Response Theory, Reliability

Kolen, Michael J.; Zeng, Lingjia; Hanson, Bradley A. – Journal of Educational Measurement, 1996
Presents an Item Response Theory (IRT) method for estimating standard errors of measurement of scale scores for the situation in which scale scores are nonlinear transformations of number-correct scores. Also describes procedures for estimating the average conditional standard error of measurement for scale scores and the reliability of scale…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Response Theory, Reliability
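The usual route to a conditional standard error for a nonlinearly scaled score is to build the number-correct distribution at a fixed ability with the Lord-Wingersky recursion and then take the standard deviation of the transformed scores. A sketch with made-up item probabilities and an arbitrary raw-to-scale conversion (the article's estimators may differ in detail):

```python
import math

def lord_wingersky(probs):
    """Distribution of the number-correct score given each item's
    probability of a correct response at a fixed ability level."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for x, q in enumerate(dist):
            new[x] += q * (1 - p)      # item answered incorrectly
            new[x + 1] += q * p        # item answered correctly
        dist = new
    return dist

def conditional_sem(scale, probs):
    """Conditional SEM of the scale score at this ability, where
    scale[x] maps number-correct score x to a scale score."""
    dist = lord_wingersky(probs)
    mean = sum(s * p for s, p in zip(scale, dist))
    var = sum((s - mean) ** 2 * p for s, p in zip(scale, dist))
    return math.sqrt(var)

# hypothetical 3-item test and a nonlinear raw-to-scale conversion
probs = [0.8, 0.6, 0.4]
scale = [100, 110, 130, 160]
print(round(conditional_sem(scale, probs), 2))
```

Averaging these conditional error variances over the ability distribution yields the overall scale-score error variance, and hence the scale-score reliability the abstract mentions.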

Livingston, Samuel A. – Journal of Educational Measurement, 1982
For tests used to make pass/fail decisions, the relevant standard error of measurement (SEM) is the SEM at the passing score. If the test is highly stratified, this SEM should be estimated by a split-halves approach. A formula and its derivation are provided. (Author)
Descriptors: Cutting Scores, Error of Measurement, Estimation (Mathematics), Mathematical Formulas
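The abstract does not reproduce Livingston's split-half formula, but the better-known binomial error model shows the general behavior of a score-level SEM: largest in the middle of the score range and shrinking toward the extremes, which is why the SEM at the passing score, rather than the overall SEM, is the relevant quantity. A minimal sketch:

```python
import math

def binomial_csem(x, n):
    """Conditional SEM at raw score x on an n-item test under Lord's
    binomial error model (an illustration, not the split-half
    estimator the article derives)."""
    return math.sqrt(x * (n - x) / (n - 1))

print(binomial_csem(30, 40))   # SEM at a passing score of 30/40
print(binomial_csem(38, 40))   # much smaller near the top of the scale
```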

Lee, Guemin – Journal of Educational Measurement, 2002
Studied the effects of items, passages, contents, themes, and types of passages on the reliability and standard errors of measurement for complex reading comprehension tests using seven different generalizability theory models. Results suggest that passages and themes should be taken into account when evaluating the reliability of test scores for…
Descriptors: Error of Measurement, Generalizability Theory, Models, Reading Comprehension
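In generalizability-theory terms, the claim that passages and themes matter amounts to saying their variance components belong in the error term. A minimal persons-by-items sketch with hypothetical variance components (the article fits seven richer models that also cross passages, contents, and themes):

```python
def rel_g_coefficient(var_p, var_pi, n_items):
    """Generalizability coefficient for relative decisions in a
    persons x items design: person variance over itself plus the
    interaction/error component averaged over items."""
    return var_p / (var_p + var_pi / n_items)

def abs_g_coefficient(var_p, var_i, var_pi, n_items):
    """Phi (dependability) coefficient for absolute decisions; the
    item main effect now also contributes error."""
    return var_p / (var_p + (var_i + var_pi) / n_items)

# hypothetical variance components for a 16-item test
print(rel_g_coefficient(4.0, 8.0, 16))        # 4 / (4 + 8/16)
print(abs_g_coefficient(4.0, 2.0, 8.0, 16))   # 4 / (4 + 10/16)
```

Adding passage or theme facets moves more variance into the error term, which is why ignoring them tends to overstate the reliability of reading comprehension scores.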

McMorris, Robert F. – Journal of Educational Measurement, 1972
Approximations were compared with exact statistics obtained on 85 different classroom tests constructed and administered by professors in a variety of fields; means and standard deviation of the resulting differences supported the use of approximations in practical situations. (Author)
Descriptors: Error of Measurement, Measurement Instruments, Reliability, Statistical Analysis

Wang, Tianyou; Kolen, Michael J.; Harris, Deborah J. – Journal of Educational Measurement, 2000
Describes procedures for calculating conditional standard error of measurement (CSEM) and reliability of scale scores, and classification consistency of performance levels. Applied these procedures to data from the American College Testing Program's Work Keys Writing Assessment with sample sizes of 7,097, 1,035, and 1,793. Results show that the…
Descriptors: Adults, Classification, Error of Measurement, Item Response Theory

Rogosa, David R.; Willett, John B. – Journal of Educational Measurement, 1983
The results of this study indicate that the difference score is often highly reliable when the correlation between true change and true initial status is nonnegative. In general, when individual differences in true change are appreciable, the difference score shows strong…
Descriptors: Achievement Gains, Error of Measurement, Individual Differences, Measurement Techniques
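The classical formula behind this result expresses the difference-score reliability in terms of the two occasions' standard deviations, reliabilities, and between-occasion correlation; it is high exactly when true individual differences in change are not swamped by shared variance. A sketch with hypothetical values:

```python
def difference_score_reliability(s1, s2, r11, r22, r12):
    """Classical reliability of the difference D = X2 - X1 from the
    observed SDs (s1, s2), the two reliabilities (r11, r22), and the
    between-occasion correlation r12."""
    num = s1**2 * r11 + s2**2 * r22 - 2 * r12 * s1 * s2
    den = s1**2 + s2**2 - 2 * r12 * s1 * s2
    return num / den

# hypothetical: equal SDs of 10, reliabilities 0.9, correlation 0.5
print(difference_score_reliability(10, 10, 0.9, 0.9, 0.5))   # 0.8
```

With a modest between-occasion correlation the difference score is quite reliable here, counter to the folklore that gain scores are always unreliable.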

Livingston, Samuel A.; Wingersky, Marilyn A. – Journal of Educational Measurement, 1979
Procedures are described for studying the reliability of decisions based on specific passing scores with tests made up of discrete items and designed to measure continuous rather than categorical traits. These procedures are based on the estimation of the joint distribution of true scores and observed scores. (CTM)
Descriptors: Cutting Scores, Decision Making, Efficiency, Error of Measurement

Kolen, Michael J.; And Others – Journal of Educational Measurement, 1992
A procedure is described for estimating the reliability and conditional standard errors of measurement of scale scores incorporating the discrete transformation of raw scores to scale scores. The method is illustrated using a strong true score model, and practical applications are described. (SLD)
Descriptors: College Entrance Examinations, Equations (Mathematics), Error of Measurement, Estimation (Mathematics)

Ruiz-Primo, Maria Araceli; And Others – Journal of Educational Measurement, 1993
The stability of scores on 2 types of performance assessments, an observed hands-on investigation and a notebook surrogate, was investigated for 29 sixth graders on 2 occasions. Results indicate that student performance and procedures changed and that generalizability across occasions was moderate. Implications for assessment are discussed. (SLD)
Descriptors: Educational Assessment, Elementary School Students, Error of Measurement, Generalizability Theory