Alexandru Cernat; Vera Toepoel – International Journal of Social Research Methodology, 2024
Most of the social science research is based on the implied assumption that measurement error is the same across key socio-demographic groups and all differences in key statistics of interest are real. Nevertheless, there is evidence that this is not the case. In this paper, the authors tackle this important topic by investigating if data quality…
Descriptors: Error of Measurement, Low Income Groups, Probability, Foreign Countries
Regional Educational Laboratory Mid-Atlantic, 2024
These are the appendixes for the report, "Stabilizing School Performance Indicators in New Jersey to Reduce the Effect of Random Error." This study applied a stabilization model called Bayesian hierarchical modeling to group-level data (with groups assigned according to demographic designations) within schools in New Jersey with the aim…
Descriptors: Institutional Evaluation, Elementary Secondary Education, Bayesian Statistics, Test Reliability
Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024
We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…
Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE) using both real and simulated data. Chained kernel equating, Poststratification kernel equating, and Circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
Laura Hegemann; Ragna Bugge Askeland; Stian Barbo Valand; Anne-Siri Øyen; Synnve Schjølberg; Vanessa H. Bal; Somer L. Bishop; Camilla Stoltenberg; Tilmann von Soest; Laurie J. Hannigan; Alexandra Havdahl – Autism: The International Journal of Research and Practice, 2024
Autism screening questionnaires are sometimes used as a measure of "autism-associated traits" in samples drawn from the general population, even though such tools are primarily developed and designed for use in samples of children diagnosed with or being assessed for autism. Here, we explore the psychometric properties of the Social…
Descriptors: Autism Spectrum Disorders, Measurement, Clinical Diagnosis, Sex
Cristian Zanon; Nan Zhao; Nursel Topkaya; Ertugrul Sahin; David L. Vogel; Melissa M. Ertl; Samineh Sanatkar; Hsin-Ya Liao; Mark Rubin; Makilim N. Baptista; Winnie W. S. Mak; Fatima Rashed Al-Darmaki; Georg Schomerus; Ying-Fen Wang; Dalia Nasvytiene – International Journal of Testing, 2025
Examinations of the internal structure of the Depression, Anxiety, and Stress Scale-21 (DASS-21) have yielded inconsistent conclusions within and across cultural contexts. This study examined the dimensionality and reliability of the DASS-21 across three theoretically plausible factor structures (i.e., unidimensional, oblique three-factor, and…
Descriptors: Anxiety, Depression (Psychology), Psychometrics, Cultural Context
Paul T. von Hippel; Brendan A. Schuetze – Annenberg Institute for School Reform at Brown University, 2025
Researchers across many fields have called for greater attention to heterogeneity of treatment effects--shifting focus from the average effect to variation in effects between different treatments, studies, or subgroups. True heterogeneity is important, but many reports of heterogeneity have proved to be false, non-replicable, or exaggerated. In…
Descriptors: Educational Research, Replication (Evaluation), Generalizability Theory, Inferences
Yanjing Cao; Chenchen Xu; Shan Lu; Qi Li; Jing Xiao – Psychology in the Schools, 2025
The Patient Health Questionnaire-9 (PHQ-9) is widely utilized in assessing individuals' depression levels. Nevertheless, research regarding its factor structure and measurement invariance remains inadequate. The aim of this study was to delve into the factor structure of the PHQ-9 and to further investigate its measurement invariance across gender…
Descriptors: Factor Structure, Error of Measurement, Factor Analysis, Age Differences
Darling-White, Meghan – Journal of Speech, Language, and Hearing Research, 2022
Purpose: The primary purpose of this study was to validate common respiratory calibration methods for estimating lung volume in children. Method: Respiratory kinematic data were collected via inductive plethysmography from 81 typically developing children and nine children with neuromotor disorders. Correction factors for the rib cage and abdomen…
Descriptors: Physiology, Human Body, Psychomotor Skills, Neurological Impairments
Little, Todd D.; Bontempo, Daniel; Rioux, Charlie; Tracy, Allison – International Journal of Research & Method in Education, 2022
Multilevel modelling (MLM) is the most frequently used approach for evaluating interventions with clustered data. MLM, however, has some limitations that are associated with numerous obstacles to model estimation and valid inferences. Longitudinal multiple-group (LMG) modelling is a longstanding approach for testing intervention effects using…
Descriptors: Longitudinal Studies, Hierarchical Linear Modeling, Alternative Assessment, Intervention
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article presents the performance of item response theory (IRT) models when double ratings are used as item scores over single ratings when rater effects are present. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
Silva Diaz, John Alexander; Köhler, Carmen; Hartig, Johannes – Applied Measurement in Education, 2022
Testing item fit is central in item response theory (IRT) modeling, since a good fit is necessary to draw valid inferences from estimated model parameters. "Infit" and "outfit" fit statistics, widespread indices for detecting deviations from the Rasch model, are affected by data factors, such as sample size. Consequently, the…
Descriptors: Intervals, Item Response Theory, Item Analysis, Inferences
Martinková, Patrícia; Bartoš, František; Brabec, Marek – Journal of Educational and Behavioral Statistics, 2023
Inter-rater reliability (IRR), which is a prerequisite of high-quality ratings and assessments, may be affected by contextual variables, such as the rater's or ratee's gender, major, or experience. Identification of such heterogeneity sources in IRR is important for the implementation of policies with the potential to decrease measurement error…
Descriptors: Interrater Reliability, Bayesian Statistics, Statistical Inference, Hierarchical Linear Modeling
Huang, Qi; Bolt, Daniel M. – Educational and Psychological Measurement, 2023
Previous studies have demonstrated evidence of latent skill continuity even in tests intentionally designed for measurement of binary skills. In addition, the assumption of binary skills when continuity is present has been shown to potentially create a lack of invariance in item and latent ability parameters that may undermine applications. In…
Descriptors: Item Response Theory, Test Items, Skill Development, Robustness (Statistics)
Huang, Hening – Research Synthesis Methods, 2023
Many statistical methods (estimators) are available for estimating the consensus value (or average effect) and heterogeneity variance in interlaboratory studies or meta-analyses. These estimators are all valid because they are developed from or supported by certain statistical principles. However, no estimator can be perfect and must have error or…
Descriptors: Statistical Analysis, Computation, Measurement Techniques, Meta Analysis