Publication Date
| Date range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 59 |
| Since 2022 (last 5 years) | 416 |
| Since 2017 (last 10 years) | 919 |
| Since 2007 (last 20 years) | 1970 |
Audience
| Audience | Records |
| --- | --- |
| Researchers | 93 |
| Practitioners | 23 |
| Teachers | 22 |
| Policymakers | 10 |
| Administrators | 5 |
| Students | 4 |
| Counselors | 2 |
| Parents | 2 |
| Community | 1 |
Location
| Location | Records |
| --- | --- |
| United States | 47 |
| Germany | 42 |
| Australia | 34 |
| Canada | 27 |
| Turkey | 27 |
| California | 22 |
| United Kingdom (England) | 20 |
| Netherlands | 18 |
| China | 17 |
| New York | 15 |
| United Kingdom | 15 |
What Works Clearinghouse Rating
| Rating | Records |
| --- | --- |
| Does not meet standards | 1 |
Martinková, Patrícia; Bartoš, František; Brabec, Marek – Journal of Educational and Behavioral Statistics, 2023
Inter-rater reliability (IRR), which is a prerequisite of high-quality ratings and assessments, may be affected by contextual variables, such as the rater's or ratee's gender, major, or experience. Identification of such heterogeneity sources in IRR is important for the implementation of policies with the potential to decrease measurement error…
Descriptors: Interrater Reliability, Bayesian Statistics, Statistical Inference, Hierarchical Linear Modeling
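The study above models inter-rater reliability with Bayesian hierarchical models; as a simpler point of reference, the sketch below computes a classical one-way intraclass correlation, a common baseline IRR index. The ratings matrix is simulated toy data, not the study's data.

```python
import numpy as np

# Minimal (non-Bayesian) sketch: one-way intraclass correlation ICC(1) as a
# basic inter-rater reliability index, from an n_ratees x n_raters matrix.
rng = np.random.default_rng(0)
true_scores = rng.normal(50, 10, size=30)                        # 30 ratees
ratings = true_scores[:, None] + rng.normal(0, 5, size=(30, 4))  # 4 raters

n, k = ratings.shape
grand_mean = ratings.mean()
row_means = ratings.mean(axis=1)
msb = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)            # between-ratee MS
msw = np.sum((ratings - row_means[:, None]) ** 2) / (n * (k - 1))    # within-ratee MS
icc1 = (msb - msw) / (msb + (k - 1) * msw)                           # ICC(1), single rater
print(f"ICC(1) = {icc1:.3f}")
```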
Huang, Qi; Bolt, Daniel M. – Educational and Psychological Measurement, 2023
Previous studies have demonstrated evidence of latent skill continuity even in tests intentionally designed for measurement of binary skills. In addition, the assumption of binary skills when continuity is present has been shown to potentially create a lack of invariance in item and latent ability parameters that may undermine applications. In…
Descriptors: Item Response Theory, Test Items, Skill Development, Robustness (Statistics)
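To illustrate the contrast the abstract describes, the sketch below compares a 2PL item response function (continuous latent ability) with a binary-skill approximation that treats examinees as masters or non-masters. All parameter values are hypothetical.

```python
import numpy as np

def p_2pl(theta, a=1.5, b=0.0):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)
p_continuous = p_2pl(theta)                 # smooth change in success probability
p_binary = np.where(theta >= 0, 0.9, 0.2)   # step function implied by a binary skill
for t, pc, pb in zip(theta, p_continuous, p_binary):
    print(f"theta={t:+.1f}  2PL={pc:.2f}  binary-skill={pb:.2f}")
```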
Huang, Hening – Research Synthesis Methods, 2023
Many statistical methods (estimators) are available for estimating the consensus value (or average effect) and heterogeneity variance in interlaboratory studies or meta-analyses. These estimators are all valid because they are developed from or supported by certain statistical principles. However, no estimator is perfect; each has error or…
Descriptors: Statistical Analysis, Computation, Measurement Techniques, Meta Analysis
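As context for the estimators discussed above, the sketch below computes one widely used pair: the fixed-effect weighted mean and the DerSimonian-Laird estimate of the heterogeneity variance, followed by the random-effects mean. The effect sizes and within-study variances are hypothetical.

```python
import numpy as np

y = np.array([0.30, 0.45, 0.10, 0.60, 0.25])   # study effect estimates (hypothetical)
v = np.array([0.02, 0.03, 0.01, 0.05, 0.02])   # within-study variances (hypothetical)

w = 1.0 / v
y_fe = np.sum(w * y) / np.sum(w)               # fixed-effect weighted mean
q = np.sum(w * (y - y_fe) ** 2)                # Cochran's Q
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (q - (len(y) - 1)) / c)        # DerSimonian-Laird tau^2

w_re = 1.0 / (v + tau2)
y_re = np.sum(w_re * y) / np.sum(w_re)         # random-effects mean
print(f"tau^2 = {tau2:.4f}, fixed = {y_fe:.3f}, random = {y_re:.3f}")
```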
Turner, Kyle T.; Engelhard, George, Jr. – Measurement: Interdisciplinary Research and Perspectives, 2023
The purpose of this study is to illustrate the use of functional data analysis (FDA) as a general methodology for analyzing person response functions (PRFs). Applications of FDA to psychometrics have included the estimation of item response functions and latent distributions, as well as differential item functioning. Although FDA has been…
Descriptors: Data Analysis, Item Response Theory, Psychometrics, Statistical Distributions
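The study above uses functional data analysis; as a simpler illustration of the same object, the sketch below estimates one person's response function by kernel-smoothing their 0/1 item scores over item difficulty (Nadaraya-Watson regression), not FDA proper. Item difficulties and responses are simulated.

```python
import numpy as np

difficulty = np.linspace(-2, 2, 20)                      # hypothetical item difficulties
responses = (np.random.default_rng(1).random(20) <
             1 / (1 + np.exp(difficulty - 0.5))).astype(float)  # one person's 0/1 scores

def smoothed_prf(d_grid, difficulty, responses, h=0.5):
    """Kernel-smoothed probability of success as a function of item difficulty."""
    weights = np.exp(-0.5 * ((d_grid[:, None] - difficulty) / h) ** 2)
    return (weights * responses).sum(axis=1) / weights.sum(axis=1)

grid = np.linspace(-2, 2, 9)
print(np.round(smoothed_prf(grid, difficulty, responses), 2))
```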
Lockwood, Adam B.; Klatka, Kelsey; Parker, Brandon; Benson, Nicholas – Journal of Psychoeducational Assessment, 2023
Eighty Woodcock-Johnson IV Tests of Achievement protocols from 40 test administrators were examined to determine the types and frequencies of administration and scoring errors made. Non-critical errors (e.g., failure to record verbatim) were found on every protocol (M = 37.2). Critical (e.g., standard score, start point) errors were found on 98.8%…
Descriptors: Achievement Tests, Testing, Scoring, Error of Measurement
Mohsen Dolatabadi – Australian Journal of Applied Linguistics, 2023
Many datasets of participant ratings for word norms, including concreteness ratings, are available. However, concreteness information for infrequent words and non-words is rare. This work aims to propose a model for estimating the concreteness of infrequent and new lexical items. Here, we used Lancaster sensory-motor word norms to predict…
Descriptors: Prediction, Validity, Models, Computational Linguistics
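The sketch below shows the general idea of predicting concreteness from sensorimotor norm dimensions with ordinary least squares. The feature matrix and ratings are simulated stand-ins, not the Lancaster norms or the study's model.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))                     # 6 hypothetical sensorimotor dimensions
true_beta = np.array([0.8, 0.5, 0.3, 0.0, -0.2, 0.1])
y = X @ true_beta + rng.normal(0, 0.5, size=200)  # simulated concreteness ratings

X1 = np.column_stack([np.ones(len(X)), X])        # add intercept
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
new_word = np.concatenate([[1.0], rng.normal(size=6)])  # features of an unseen word
print("predicted concreteness:", round(float(new_word @ beta_hat), 2))
```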
Wu, Tong – ProQuest LLC, 2023
This three-article dissertation aims to address three methodological challenges to ensure comparability in educational research, including scale linking, test equating, and propensity score (PS) weighting. The first study intends to improve test scale comparability by evaluating the effect of six missing data handling approaches, including…
Descriptors: Educational Research, Comparative Analysis, Equated Scores, Weighted Scores
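Of the three components named in the abstract, test equating is the easiest to sketch. The example below applies linear (mean-sigma) equating, which places Form X scores on the Form Y scale by matching means and standard deviations; the score distributions are hypothetical and unrelated to the dissertation's data.

```python
import numpy as np

rng = np.random.default_rng(3)
scores_x = rng.normal(24, 6, size=1000)   # Form X scores (simulated)
scores_y = rng.normal(26, 5, size=1000)   # Form Y scores (simulated)

def linear_equate(x, mu_x, sd_x, mu_y, sd_y):
    """Map a Form X score to the Form Y scale via mean-sigma equating."""
    return (sd_y / sd_x) * (x - mu_x) + mu_y

x = 30.0
y_equiv = linear_equate(x, scores_x.mean(), scores_x.std(ddof=1),
                        scores_y.mean(), scores_y.std(ddof=1))
print(f"Form X score {x} ~ Form Y score {y_equiv:.1f}")
```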
van Rensburg, Clarisse; Mostert, Karina – Journal of Student Affairs in Africa, 2023
Student well-being has gradually become a topic of interest in higher education, and the accurate, valid, and reliable measurement of well-being constructs is crucial in the South African context. This study examined item bias and configural, metric and scalar invariance of the Satisfaction with Life Scale (SWLS) for South African first-year…
Descriptors: Life Satisfaction, Measures (Individuals), Foreign Countries, College Freshmen
Reimers, Jennifer; Turner, Ronna C.; Tendeiro, Jorge N.; Lo, Wen-Juo; Keiffer, Elizabeth – Measurement: Interdisciplinary Research and Perspectives, 2023
Person-fit analyses are commonly used to detect aberrant responding in self-report data. Nonparametric person fit statistics do not require fitting a parametric test theory model and have performed well compared to other person-fit statistics. However, detection of aberrant responding has primarily focused on dominance response data, thus the…
Descriptors: Goodness of Fit, Nonparametric Statistics, Error of Measurement, Comparative Analysis
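One of the simplest nonparametric person-fit indices referred to above is the count of Guttman errors: pairs where a person answers a harder item correctly while missing an easier one, with items ordered by sample proportion correct. The sketch below computes it for simulated dominance response data.

```python
import numpy as np

rng = np.random.default_rng(4)
data = (rng.random((200, 10)) < np.linspace(0.9, 0.3, 10)).astype(int)  # simulated 0/1 data

def guttman_errors(responses, item_order):
    """Count (easier item wrong, harder item right) pairs for one person."""
    r = responses[item_order]            # responses sorted easiest to hardest
    errors = 0
    for i in range(len(r)):
        for j in range(i + 1, len(r)):
            if r[i] == 0 and r[j] == 1:  # missed easier item, passed harder one
                errors += 1
    return errors

order = np.argsort(-data.mean(axis=0))   # items from easiest to hardest
print("Guttman errors, person 0:", guttman_errors(data[0], order))
```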
AL-Dossary, Saeed A.; Almohayya, Bander M. – Psychology in the Schools, 2024
The present study aims to validate the Flourishing Scale (FS) in a convenience sample of 233 special education teachers. The FS's psychometric properties were investigated using exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). EFA had a one-factor solution that explained 49.9% of the variance, a Cronbach's alpha internal…
Descriptors: Error of Measurement, Arabic, Test Construction, Special Education Teachers
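The internal-consistency index reported in studies like this one, Cronbach's alpha, is straightforward to compute from an n x k matrix of item scores, as sketched below. The data are simulated, not the Flourishing Scale responses.

```python
import numpy as np

rng = np.random.default_rng(5)
common = rng.normal(size=(300, 1))
items = common + rng.normal(0, 0.8, size=(300, 8))   # 8 items sharing one common factor

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1).sum()          # sum of item variances
total_var = items.sum(axis=1).var(ddof=1)            # variance of the total score
alpha = (k / (k - 1)) * (1 - item_vars / total_var)  # Cronbach's alpha
print(f"Cronbach's alpha = {alpha:.2f}")
```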
Sinethemba Mthimkhulu; Karen Roux; Maryke Mihai – Reading & Writing: Journal of the Literacy Association of South Africa, 2024
Background: PIRLS 2021 results revealed that South African Grade 4 learners performed significantly lower compared to other countries in reading comprehension and that they did not reach the standardised international mean score of 500. It was also evident from the results that English learners performed relatively higher than isiZulu learners.…
Descriptors: Error of Measurement, Grade 4, Reading Comprehension, Scores
Yuanfang Liu; Mark H. C. Lai; Ben Kelcey – Structural Equation Modeling: A Multidisciplinary Journal, 2024
Measurement invariance holds when a latent construct is measured in the same way across different levels of background variables (continuous or categorical) while controlling for the true value of that construct. Using Monte Carlo simulation, this paper compares the multiple indicators, multiple causes (MIMIC) model and MIMIC-interaction to a…
Descriptors: Classification, Accuracy, Error of Measurement, Correlation
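To make the MIMIC setup concrete, the sketch below shows the data-generating step of one Monte Carlo replication under a (hypothetical) parameterization: a background covariate affects the latent factor and, to represent non-invariance, also has a direct effect on one indicator. Fitting the MIMIC model itself would require an SEM package and is not shown.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
x = rng.binomial(1, 0.5, size=n)                   # grouping covariate
eta = 0.4 * x + rng.normal(size=n)                 # latent factor with impact of x
loadings = np.array([0.8, 0.7, 0.75, 0.65])
y = eta[:, None] * loadings + rng.normal(0, 0.6, size=(n, 4))
y[:, 0] += 0.5 * x                                 # direct effect = non-invariant item
print("observed means by group:\n",
      np.vstack([y[x == 0].mean(0), y[x == 1].mean(0)]))
```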
Tamra Stambaugh; Lindsay Ellis Lee; Matthew Makel; Scott Peters; Kiana R. Johnson – Gifted Child Today, 2024
The ability to effectively identify students for advanced learning opportunities has been an ongoing issue within the field of gifted education. Common criteria to guide the design and evaluation of identification systems have been essentially non-existent. In this article we provide a practical guide for evaluating and reflecting on the…
Descriptors: Cognitive Tests, Elementary School Students, Grade 2, Academically Gifted
Pere J. Ferrando; David Navarro-González; Fabia Morales-Vives – Educational and Psychological Measurement, 2025
The problem of local item dependencies (LIDs) is very common in personality and attitude measures, particularly in those that measure narrow-bandwidth dimensions. At the structural level, these dependencies can be modeled by using extended factor analytic (FA) solutions that include correlated residuals. However, the effects that LIDs have on the…
Descriptors: Scores, Accuracy, Evaluation Methods, Factor Analysis
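A minimal sketch of why local item dependencies matter for scores: under a one-factor model with Sigma = lambda lambda' + Theta, the reliability of the unit-weighted total is (1'lambda)^2 / 1'Sigma 1, and treating a positive residual covariance (the LID) as if it were common variance overstates reliability. Loadings and residual values below are hypothetical.

```python
import numpy as np

lam = np.array([0.7, 0.7, 0.6, 0.6, 0.5])     # standardized loadings (hypothetical)
theta = np.diag(1 - lam ** 2)                 # unique variances, diagonal Theta
theta_lid = theta.copy()
theta_lid[0, 1] = theta_lid[1, 0] = 0.25      # correlated residual between items 1 and 2

def total_score_reliability(lam, theta):
    """Reliability of the unit-weighted sum score under Sigma = lam lam' + Theta."""
    sigma = np.outer(lam, lam) + theta
    return lam.sum() ** 2 / sigma.sum()

print("ignoring LID :", round(total_score_reliability(lam, theta), 3))
print("modeling LID :", round(total_score_reliability(lam, theta_lid), 3))
```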
Jeff Allen; Ty Cruce – ACT Education Corp., 2025
This report summarizes some of the evidence supporting interpretations of scores from the enhanced ACT, focusing on reliability, concurrent validity, predictive validity, and score comparability. The authors argue that the evidence presented in this report supports the interpretation of scores from the enhanced ACT as measures of high school…
Descriptors: College Entrance Examinations, Testing, Change, Scores

