Showing 1 to 15 of 218 results
Peer reviewed
Direct link
Richard S. Balkin; Quentin Hunter; Bradley T. Erford – Measurement and Evaluation in Counseling and Development, 2024
We describe best practices in reporting reliability estimates in counseling research with consideration to precision, generalization, and diverse populations. We provide a historical context to reporting reliability estimates, the limitations of past practices, and new methods to address reliability generalization. We highlight best practices…
Descriptors: Best Practices, Reliability, Counseling, Research
Peer reviewed
Direct link
Wendy Chan; Jimin Oh; Chen Li; Jiexuan Huang; Yeran Tong – Society for Research on Educational Effectiveness, 2023
Background: The generalizability of a study's results continues to be at the forefront of concerns in evaluation research in education (Tipton & Olsen, 2018). Over the past decade, statisticians have developed methods, mainly based on propensity scores, to improve generalizations in the absence of random sampling (Stuart et al., 2011; Tipton,…
Descriptors: Generalizability Theory, Probability, Scores, Sampling
Peer reviewed
Direct link
van der Linden, Wim J. – Journal of Educational and Behavioral Statistics, 2022
The current literature on test equating generally defines it as the process necessary to obtain score comparability between different test forms. The definition is in contrast with Lord's foundational paper which viewed equating as the process required to obtain comparability of measurement scale between forms. The distinction between the notions…
Descriptors: Equated Scores, Test Items, Scores, Probability
Peer reviewed
Direct link
Franco-Martínez, Alicia; Alvarado, Jesús M.; Sorrel, Miguel A. – Educational and Psychological Measurement, 2023
A sample suffers range restriction (RR) when its variance is reduced compared with its population variance and, in turn, it fails to represent that population. If the RR occurs over the latent factor, not directly over the observed variable, the researcher deals with an indirect RR, common when using convenience samples. This work explores how…
Descriptors: Factor Analysis, Factor Structure, Scores, Sampling
Peer reviewed
Direct link
Chan, Wendy – American Journal of Evaluation, 2022
Over the past ten years, propensity score methods have made an important contribution to improving generalizations from studies that do not select samples randomly from a population of inference. However, these methods require assumptions and recent work has considered the role of bounding approaches that provide a range of treatment impact…
Descriptors: Probability, Scores, Scoring, Generalization
Peer reviewed
Direct link
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed-response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Peer reviewed
PDF on ERIC Download full text
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
Peer reviewed
Direct link
Pere J. Ferrando; David Navarro-González; Fabia Morales-Vives – Educational and Psychological Measurement, 2025
The problem of local item dependencies (LIDs) is very common in personality and attitude measures, particularly in those that measure narrow-bandwidth dimensions. At the structural level, these dependencies can be modeled by using extended factor analytic (FA) solutions that include correlated residuals. However, the effects that LIDs have on the…
Descriptors: Scores, Accuracy, Evaluation Methods, Factor Analysis
Peer reviewed
PDF on ERIC Download full text
Özmen, Zeynep Medine; Güven, Bülent – Journal of Pedagogical Research, 2022
The present study aimed to remediate pre-service teachers' misconceptions about sampling distributions and to develop their conceptual understanding through the use of conceptual change texts (CCTs). The participants consisted of 84 pre-service teachers. To determine the pre-service teachers' conceptual understanding of sampling distributions, an…
Descriptors: Preservice Teachers, Mathematics Teachers, Sampling, Statistical Distributions
Peer reviewed
Direct link
Joo, Sean; Ali, Usama; Robin, Frederic; Shin, Hyo Jeong – Large-scale Assessments in Education, 2022
We investigated the potential impact of differential item functioning (DIF) on group-level mean and standard deviation estimates using empirical and simulated data in the context of large-scale assessment. For the empirical investigation, PISA 2018 cognitive domains (Reading, Mathematics, and Science) data were analyzed using Jackknife sampling to…
Descriptors: Test Items, Item Response Theory, Scores, Student Evaluation
Peer reviewed
PDF on ERIC Download full text
Yao, Lili; Haberman, Shelby; McCaffrey, Daniel F.; Lockwood, J. R. – ETS Research Report Series, 2020
Minimum discriminant information adjustment (MDIA), an approach to weighting samples to conform to known population information, provides a generalization of raking and poststratification. In the case of simple random sampling with replacement with uniform sampling weights, large-sample properties are available for MDIA estimates of population…
Descriptors: Discriminant Analysis, Sampling, Sample Size, Scores
Peer reviewed
Direct link
Ji-Eun Lee; Amisha Jindal; Sanika Nitin Patki; Ashish Gurung; Reilly Norum; Erin Ottmar – Interactive Learning Environments, 2024
This paper demonstrated how to apply Machine Learning (ML) techniques to analyze student interaction data collected in an online mathematics game. Using a data-driven approach, we examined (1) how different ML algorithms influenced the precision of middle-school students' (N = 359) performance (i.e., posttest math knowledge scores) prediction and (2)…
Descriptors: Teaching Methods, Algorithms, Mathematics Tests, Computer Games
Peer reviewed
Direct link
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
Ji-Eun Lee; Amisha Jindal; Sanika Nitin Patki; Ashish Gurung; Reilly Norum; Erin Ottmar – Grantee Submission, 2023
This paper demonstrated how to apply Machine Learning (ML) techniques to analyze student interaction data collected in an online mathematics game. Using a data-driven approach, we examined: (1) how different ML algorithms influenced the precision of middle-school students' (N = 359) performance (i.e. posttest math knowledge scores) prediction; and…
Descriptors: Teaching Methods, Algorithms, Mathematics Tests, Computer Games
Peer reviewed
PDF on ERIC Download full text
Jewsbury, Paul A. – ETS Research Report Series, 2019
When an assessment undergoes changes to the administration or instrument, bridge studies are typically used to try to ensure comparability of scores before and after the change. Among the most common and powerful is the common population linking design, with the use of a linear transformation to link scores to the metric of the original…
Descriptors: Evaluation Research, Scores, Error Patterns, Error of Measurement