Showing 211 to 225 of 3,316 results
Peer reviewed
Raykov, Tenko; Marcoulides, George A.; Pusic, Martin – Measurement: Interdisciplinary Research and Perspectives, 2021
An interval estimation procedure is discussed that can be used to evaluate the probability of a particular response for a binary or binary scored item at a pre-specified point along an underlying latent continuum. The item is assumed to: (a) be part of a unidimensional multi-component measuring instrument that may contain also polytomous items,…
Descriptors: Item Response Theory, Computation, Probability, Test Items
Peer reviewed
Alexandru Cernat; Vera Toepoel – International Journal of Social Research Methodology, 2024
Most of the social science research is based on the implied assumption that measurement error is the same across key socio-demographic groups and all differences in key statistics of interest are real. Nevertheless, there is evidence that this is not the case. In this paper, the authors tackle this important topic by investigating if data quality…
Descriptors: Error of Measurement, Low Income Groups, Probability, Foreign Countries
Peer reviewed
Regional Educational Laboratory Mid-Atlantic, 2024
These are the appendixes for the report, "Stabilizing School Performance Indicators in New Jersey to Reduce the Effect of Random Error." This study applied a stabilization model called Bayesian hierarchical modeling to group-level data (with groups assigned according to demographic designations) within schools in New Jersey with the aim…
Descriptors: Institutional Evaluation, Elementary Secondary Education, Bayesian Statistics, Test Reliability
Peer reviewed
Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024
We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…
Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners
Peer reviewed
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE) using both real and simulated data. Chained kernel equating, Poststratification kernel equating, and Circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
Peer reviewed
Laura Hegemann; Ragna Bugge Askeland; Stian Barbo Valand; Anne-Siri Øyen; Synnve Schjølberg; Vanessa H. Bal; Somer L. Bishop; Camilla Stoltenberg; Tilmann von Soest; Laurie J. Hannigan; Alexandra Havdahl – Autism: The International Journal of Research and Practice, 2024
Autism screening questionnaires are sometimes used as a measure of "autism-associated traits" in samples drawn from the general population, even though such tools are primarily developed and designed for use in samples of children diagnosed with or being assessed for autism. Here, we explore the psychometric properties of the Social…
Descriptors: Autism Spectrum Disorders, Measurement, Clinical Diagnosis, Sex
Paul T. von Hippel; Brendan A. Schuetze – Annenberg Institute for School Reform at Brown University, 2025
Researchers across many fields have called for greater attention to heterogeneity of treatment effects--shifting focus from the average effect to variation in effects between different treatments, studies, or subgroups. True heterogeneity is important, but many reports of heterogeneity have proved to be false, non-replicable, or exaggerated. In…
Descriptors: Educational Research, Replication (Evaluation), Generalizability Theory, Inferences
Peer reviewed
Yanjing Cao; Chenchen Xu; Shan Lu; Qi Li; Jing Xiao – Psychology in the Schools, 2025
The patient health questionnaire-9 (PHQ-9) is widely utilized in assessing individuals' depression levels. Nevertheless, research regarding its factor structure and measurement invariance remains inadequate. The aim of this study was to delve into the factor structure of the PHQ-9 and to further investigate its measurement invariance across gender…
Descriptors: Factor Structure, Error of Measurement, Factor Analysis, Age Differences
Peer reviewed
Cristian Zanon; Nan Zhao; Nursel Topkaya; Ertugrul Sahin; David L. Vogel; Melissa M. Ertl; Samineh Sanatkar; Hsin-Ya Liao; Mark Rubin; Makilim N. Baptista; Winnie W. S. Mak; Fatima Rashed Al-Darmaki; Georg Schomerus; Ying-Fen Wang; Dalia Nasvytiene – International Journal of Testing, 2025
Examinations of the internal structure of the Depression, Anxiety, and Stress Scale-21 (DASS-21) have yielded inconsistent conclusions within and across cultural contexts. This study examined the dimensionality and reliability of the DASS-21 across three theoretically plausible factor structures (i.e., unidimensional, oblique three-factor, and…
Descriptors: Anxiety, Depression (Psychology), Psychometrics, Cultural Context
Peer reviewed
Sebastian Harenberg; Lindsey Keenan; Yvette Ingram; Sayre Wilson; Justine Vosloo; Miranda Kaye – Journal of American College Health, 2025
Background/purpose: Depressive symptoms are prevalent in student-athletes. Evidence for the factorial validity of measures assessing depressive symptoms in student-athletes is presently absent from the literature. This study examined the best fitting factorial structure and invariance across sexes of the PHQ-9. Methods: Data were collected from…
Descriptors: Student Athletes, Depression (Psychology), Symptoms (Individual Disorders), Gender Differences
Antoniuk, Andrea; Cormier, Damien C. – Communique, 2020
School psychologists may experience examiner drift--a deviation from standardized administration and scoring procedures that occurs slowly over time. The purpose of this article is to explain how examiner drift occurs and to outline how it can be assessed and prevented.
Descriptors: Error of Measurement, Standardized Tests, School Psychologists, Skill Development
Peer reviewed
Darling-White, Meghan – Journal of Speech, Language, and Hearing Research, 2022
Purpose: The primary purpose of this study was to validate common respiratory calibration methods for estimating lung volume in children. Method: Respiratory kinematic data were collected via inductive plethysmography from 81 typically developing children and nine children with neuromotor disorders. Correction factors for the rib cage and abdomen…
Descriptors: Physiology, Human Body, Psychomotor Skills, Neurological Impairments
Peer reviewed
Little, Todd D.; Bontempo, Daniel; Rioux, Charlie; Tracy, Allison – International Journal of Research & Method in Education, 2022
Multilevel modelling (MLM) is the most frequently used approach for evaluating interventions with clustered data. MLM, however, has some limitations that are associated with numerous obstacles to model estimation and valid inferences. Longitudinal multiple-group (LMG) modelling is a longstanding approach for testing intervention effects using…
Descriptors: Longitudinal Studies, Hierarchical Linear Modeling, Alternative Assessment, Intervention
Peer reviewed
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article presents the performance of item response theory (IRT) models when double ratings are used as item scores over single ratings when rater effects are present. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
Peer reviewed
Silva Diaz, John Alexander; Köhler, Carmen; Hartig, Johannes – Applied Measurement in Education, 2022
Testing item fit is central in item response theory (IRT) modeling, since a good fit is necessary to draw valid inferences from estimated model parameters. "Infit" and "outfit" fit statistics, widespread indices for detecting deviations from the Rasch model, are affected by data factors, such as sample size. Consequently, the…
Descriptors: Intervals, Item Response Theory, Item Analysis, Inferences