Publication Date
In 2025 | 6 |
Since 2024 | 21 |
Since 2021 (last 5 years) | 63 |
Since 2016 (last 10 years) | 137 |
Since 2006 (last 20 years) | 350 |
Descriptor
Error of Measurement | 504 |
Scores | 504 |
Reliability | 104 |
Correlation | 92 |
Item Response Theory | 77 |
Test Reliability | 76 |
Foreign Countries | 71 |
Statistical Analysis | 71 |
Comparative Analysis | 69 |
Psychometrics | 68 |
Computation | 60 |
More ▼ |
Source
Author
Lee, Won-Chan | 10 |
Kolen, Michael J. | 7 |
Haberman, Shelby J. | 6 |
McCaffrey, Daniel F. | 6 |
Reardon, Sean F. | 6 |
Henson, Robin K. | 5 |
Lockwood, J. R. | 5 |
Zimmerman, Donald W. | 5 |
Brennan, Robert L. | 4 |
Ho, Andrew D. | 4 |
Kane, Michael | 4 |
More ▼ |
Publication Type
Education Level
Location
Germany | 7 |
United Kingdom (England) | 7 |
United States | 7 |
Canada | 6 |
Netherlands | 6 |
Australia | 5 |
California | 5 |
North Carolina | 5 |
Turkey | 5 |
China | 4 |
New York | 4 |
More ▼ |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 5 |
Race to the Top | 4 |
Head Start | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Gorney, Kylie; Wollack, James A. – Journal of Educational Measurement, 2023
In order to detect a wide range of aberrant behaviors, it can be useful to incorporate information beyond the dichotomous item scores. In this paper, we extend the l[subscript z] and l*[subscript z] person-fit statistics so that unusual behavior in item scores and unusual behavior in item distractors can be used as indicators of aberrance. Through…
Descriptors: Test Items, Scores, Goodness of Fit, Statistics
Tenko Raykov – Educational and Psychological Measurement, 2024
This note is concerned with the benefits that can result from the use of the maximal reliability and optimal linear combination concepts in educational and psychological research. Within the widely used framework of unidimensional multi-component measuring instruments, it is demonstrated that the linear combination of their components that…
Descriptors: Educational Research, Behavioral Science Research, Reliability, Error of Measurement
Francesco Innocenti; Math J. J. M. Candel; Frans E. S. Tan; Gerard J. P. van Breukelen – Journal of Educational and Behavioral Statistics, 2024
Normative studies are needed to obtain norms for comparing individuals with the reference population on relevant clinical or educational measures. Norms can be obtained in an efficient way by regressing the test score on relevant predictors, such as age and sex. When several measures are normed with the same sample, a multivariate regression-based…
Descriptors: Sample Size, Multivariate Analysis, Error of Measurement, Regression (Statistics)
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022
The reliability of a test score is usually underestimated and the deflation may be profound, 0.40 - 0.60 units of reliability or 46 - 71%. Eight root sources of the deflation are discussed and quantified by a simulation with 1,440 real-world datasets: (1) errors in the measurement modelling, (2) inefficiency in the estimator of reliability within…
Descriptors: Test Reliability, Scores, Test Items, Correlation
John Jerrim; Luis Alejandro Lopez-Agudo; Oscar David Marcenaro-Gutierrez – British Journal of Educational Studies, 2024
International large-scale assessments have gained much attention since the beginning of the twenty-first century, influencing education legislation in many countries. This includes Spain, where they have been used by successive governments to justify education policy change. Unfortunately, there was a problem with the PISA 2018 reading scores for…
Descriptors: Foreign Countries, Achievement Tests, International Assessment, Secondary School Students
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e. rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Rebekka Kupffer; Susanne Frick; Eunike Wetzel – Educational and Psychological Measurement, 2024
The multidimensional forced-choice (MFC) format is an alternative to rating scales in which participants rank items according to how well the items describe them. Currently, little is known about how to detect careless responding in MFC data. The aim of this study was to adapt a number of indices used for rating scales to the MFC format and…
Descriptors: Measurement Techniques, Alternative Assessment, Rating Scales, Questionnaires
Kelly Edwards; James Soland – Educational Assessment, 2024
Classroom observational protocols, in which raters observe and score the quality of teachers' instructional practices, are often used to evaluate teachers for consequential purposes despite evidence that scores from such protocols are frequently driven by factors, such as rater and temporal effects, that have little to do with teacher quality. In…
Descriptors: Classroom Observation Techniques, Teacher Evaluation, Accuracy, Scores
Vispoel, Walter P.; Lee, Hyeryung; Xu, Guanlan; Hong, Hyeri – Journal of Experimental Education, 2023
Although generalizability theory (GT) designs have traditionally been analyzed within an ANOVA framework, identical results can be obtained with structural equation models (SEMs) but extended to represent multiple sources of both systematic and measurement error variance, include estimation methods less likely to produce negative variance…
Descriptors: Generalizability Theory, Structural Equation Models, Programming Languages, Scores
Jake C. Steggerda; Sandra Yu Rueger; Ana J. Bridges – Children & Schools, 2024
Authors evaluated the Student Behavior Checklist-Brief (SBC-B) to test whether teacher-reports of student learning approach (i.e., learned helplessness [LH] and mastery orientation [MO]) were invariant across academic subjects. The current sample includes ethnically diverse seventh and eighth grade students (N = 145; 53 percent male) and six teams…
Descriptors: Psychometrics, Student Behavior, Check Lists, Scores
Emma Somer; Carl Falk; Milica Miocevic – Structural Equation Modeling: A Multidisciplinary Journal, 2024
Factor Score Regression (FSR) is increasingly employed as an alternative to structural equation modeling (SEM) in small samples. Despite its popularity in psychology, the performance of FSR in multigroup models with small samples remains relatively unknown. The goal of this study was to examine the performance of FSR, namely Croon's correction and…
Descriptors: Scores, Structural Equation Models, Comparative Analysis, Sample Size
Cornelis Potgieter; Xin Qiao; Akihito Kamata; Yusuf Kara – Grantee Submission, 2024
As part of the effort to develop an improved oral reading fluency (ORF) assessment system, Kara et al. (2020) estimated the ORF scores based on a latent variable psychometric model of accuracy and speed for ORF data via a fully Bayesian approach. This study further investigates likelihood-based estimators for the model-derived ORF scores,…
Descriptors: Oral Reading, Reading Fluency, Scores, Psychometrics
Cornelis Potgieter; Xin Qiao; Akihito Kamata; Yusuf Kara – Journal of Educational Measurement, 2024
As part of the effort to develop an improved oral reading fluency (ORF) assessment system, Kara et al. estimated the ORF scores based on a latent variable psychometric model of accuracy and speed for ORF data via a fully Bayesian approach. This study further investigates likelihood-based estimators for the model-derived ORF scores, including…
Descriptors: Oral Reading, Reading Fluency, Scores, Psychometrics
Almehrizi, Rashid S. – Journal of Educational Measurement, 2021
Estimates of various variance components, universe score variance, measurement error variances, and generalizability coefficients, like all statistics, are subject to sampling variability, particularly in small samples. Such variability is quantified traditionally through estimated standard errors and/or confidence intervals. The paper derived new…
Descriptors: Error of Measurement, Statistics, Design, Generalizability Theory
Sample Size and Item Parameter Estimation Precision When Utilizing the Masters' Partial Credit Model
Custer, Michael; Kim, Jongpil – Online Submission, 2023
This study utilizes an analysis of diminishing returns to examine the relationship between sample size and item parameter estimation precision when utilizing the Masters' Partial Credit Model for polytomous items. Item data from the standardization of the Batelle Developmental Inventory, 3rd Edition were used. Each item was scored with a…
Descriptors: Sample Size, Item Response Theory, Test Items, Computation