Francis L. Huang – Large-scale Assessments in Education, 2024
The use of large-scale assessments (LSAs) in education has grown in the past decade though analysis of LSAs using multilevel models (MLMs) using R has been limited. A reason for its limited use may be due to the complexity of incorporating both plausible values and weighted analyses in the multilevel analyses of LSA data. We provide additional…
Descriptors: Hierarchical Linear Modeling, Evaluation Methods, Educational Assessment, Data Analysis
Yasuhiro Yamamoto; Yasuo Miyazaki – Journal of Experimental Education, 2025
Bayesian methods have been said to solve small sample problems in frequentist methods by reflecting prior knowledge in the prior distribution. However, there are dangers in strongly reflecting prior knowledge or situations where much prior knowledge cannot be used. In order to address the issue, in this article, we considered to apply two Bayesian…
Descriptors: Sample Size, Hierarchical Linear Modeling, Bayesian Statistics, Prior Learning
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Daniel McNeish; Patrick D. Manapat – Structural Equation Modeling: A Multidisciplinary Journal, 2024
A recent review found that 11% of published factor models are hierarchical models with second-order factors. However, dedicated recommendations for evaluating hierarchical model fit have yet to emerge. Traditional benchmarks like RMSEA <0.06 or CFI >0.95 are often consulted, but they were never intended to generalize to hierarchical models.…
Descriptors: Factor Analysis, Goodness of Fit, Hierarchical Linear Modeling, Benchmarking
Mingya Huang; David Kaplan – Journal of Educational and Behavioral Statistics, 2025
The issue of model uncertainty has been gaining interest in education and the social sciences community over the years, and the dominant methods for handling model uncertainty are based on Bayesian inference, particularly, Bayesian model averaging. However, Bayesian model averaging assumes that the true data-generating model is within the…
Descriptors: Bayesian Statistics, Hierarchical Linear Modeling, Statistical Inference, Predictor Variables
Carmen Köhler; Lale Khorramdel; Artur Pokropek; Johannes Hartig – Journal of Educational Measurement, 2024
For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) needs to be evaluated in order to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The…
Descriptors: Measures (Individuals), Test Bias, Models, Item Response Theory
Stephen M. Leach; Jason C. Immekus; Jeffrey C. Valentine; Prathiba Batley; Dena Dossett; Tamara Lewis; Thomas Reece – Assessment for Effective Intervention, 2025
Educators commonly use school climate survey scores to inform and evaluate interventions for equitably improving learning and reducing educational disparities. Unfortunately, validity evidence to support these (and other) score uses often falls short. In response, Whitehouse et al. proposed a collaborative, two-part validity testing framework for…
Descriptors: School Surveys, Measurement, Hierarchical Linear Modeling, Educational Environment
Reardon, Sean F.; Ho, Andrew D.; Kalogrides, Demetra – Stanford Center for Education Policy Analysis, 2019
Linking score scales across different tests is considered speculative and fraught, even at the aggregate level (Feuer et al., 1999; Thissen, 2007). We introduce and illustrate validation methods for aggregate linkages, using the challenge of linking U.S. school district average test scores across states as a motivating example. We show that…
Descriptors: Test Validity, Evaluation Methods, School Districts, Scores
Peralta, Yadira; Moreno, Mario; Harwell, Michael; Guzey, S. Selcen; Moore, Tamara J. – Educational Research Quarterly, 2018
Variance heterogeneity is a common feature of educational data when treatment differences expressed through means are present, and often reflects a treatment by subject interaction with respect to an outcome variable. Identifying variables that account for this interaction can enhance understanding of whom a treatment does and does not benefit in…
Descriptors: Educational Research, Hierarchical Linear Modeling, Engineering, Design
Van Norman, Ethan R.; Parker, David C. – Journal of Psychoeducational Assessment, 2018
One consideration for selecting progress monitoring tools is the reliability in which the instrument measures student response to instruction. Researchers and vendors establish reliability of growth using two analytic methods: (a) calculating slopes to even and odd observations for each student and correlating the resulting slopes (split-half),…
Descriptors: Curriculum Based Assessment, Hierarchical Linear Modeling, Reliability, Progress Monitoring
Herrmann-Abell, Cari F.; Hardcastle, Joseph; DeBoer, George E. – Grantee Submission, 2018
We compared students' performance on a paper-based test (PBT) and three computer-based tests (CBTs). The three computer-based tests used different test navigation and answer selection features, allowing us to examine how these features affect student performance. The study sample consisted of 9,698 fourth through twelfth grade students from across…
Descriptors: Evaluation Methods, Tests, Computer Assisted Testing, Scores
McGill, Ryan J.; Canivez, Gary L. – Journal of Psychoeducational Assessment, 2016
As recommended by Carroll, the present study examined the factor structure of the Wechsler Intelligence Scale for Children-Fourth Edition Spanish (WISC-IV Spanish) normative sample using higher order exploratory factor analytic techniques not included in the WISC-IV Spanish Technical Manual. Results indicated that the WISC-IV Spanish subtests were…
Descriptors: Children, Intelligence Tests, Spanish, Factor Analysis
van Hek, Margriet; Kraaykamp, Gerbert; Pelzer, Ben – School Effectiveness and School Improvement, 2018
Few studies on male-female inequalities in education have elaborated on whether school characteristics affect girls' and boys' educational performance differently. This study investigated how school resources, being schools' socioeconomic composition, proportion of girls, and proportion of highly educated teachers, and school practices, being…
Descriptors: Gender Differences, Reading Achievement, Institutional Characteristics, Educational Resources
Santelices, Maria Veronica; Valencia, Edgar; Gonzalez, Jorge; Taut, Sandy – Educational Assessment, Evaluation and Accountability, 2017
This research examines empirically the relationship between two measures of teacher quality: one based on professional standards and a second one using teacher value-added estimates. It also studies the extent to which teacher observable characteristics, such as teacher training variables, are associated to better performance on either of these…
Descriptors: Teacher Effectiveness, Context Effect, Foreign Countries, Value Added Models
Dülmer, Hermann – Sociological Methods & Research, 2016
The factorial survey is an experimental design consisting of varying situations (vignettes) that have to be judged by respondents. For more complex research questions, it quickly becomes impossible for an individual respondent to judge all vignettes. To overcome this problem, random designs are recommended most of the time, whereas quota designs…
Descriptors: Factor Analysis, Reliability, Validity, Benchmarking