Publication Date
In 2025 | 1 |
Since 2024 | 2 |
Since 2021 (last 5 years) | 5 |
Since 2016 (last 10 years) | 14 |
Since 2006 (last 20 years) | 33 |
Descriptor
Source
Author
Johnson, Eugene G. | 3 |
Beaton, Albert E. | 2 |
Chan, Wendy | 2 |
Donovan, Jenny | 2 |
Heffernan, Neil T. | 2 |
Horkay, Nancy, Ed. | 2 |
Lennon, Melissa | 2 |
Martin, Michael O., Ed. | 2 |
Ostrow, Korinn S. | 2 |
Wang, Yan | 2 |
Abdekhodaie, Zahra | 1 |
More ▼ |
Publication Type
Education Level
Elementary Education | 8 |
Elementary Secondary Education | 6 |
Secondary Education | 5 |
Grade 4 | 4 |
Intermediate Grades | 4 |
Grade 6 | 2 |
Grade 8 | 2 |
Higher Education | 2 |
Junior High Schools | 2 |
Middle Schools | 2 |
Postsecondary Education | 2 |
More ▼ |
Audience
Researchers | 6 |
Practitioners | 2 |
Teachers | 1 |
Location
Australia | 5 |
Norway | 3 |
South Korea | 3 |
United States | 3 |
Belgium | 2 |
Canada | 2 |
Chile | 2 |
Czech Republic | 2 |
Denmark | 2 |
France | 2 |
Germany | 2 |
More ▼ |
Laws, Policies, & Programs
Elementary and Secondary… | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Chan, Wendy – American Journal of Evaluation, 2022
Over the past ten years, propensity score methods have made an important contribution to improving generalizations from studies that do not select samples randomly from a population of inference. However, these methods require assumptions and recent work has considered the role of bounding approaches that provide a range of treatment impact…
Descriptors: Probability, Scores, Scoring, Generalization
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e. rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
Leslie Rutkowski; David Rutkowski – Journal of Creative Behavior, 2025
The Programme for International Student Assessment (PISA) introduced creative thinking as an innovative domain in 2022. This paper examines the unique methodological issues in international assessments and the implications of measuring creative thinking within PISA's framework, including stratified sampling, rotated form designs, and a distinct…
Descriptors: Creativity, Creative Thinking, Measurement, Sampling
Jiang, Yu; Zhang, Jiahui; Xin, Tao – Journal of Educational and Behavioral Statistics, 2019
This article is an overview of the National Assessment of Education Quality (NAEQ) of China in reading, mathematics, sciences, arts, physical education, and moral education at Grades 4 and 8. After a review of the background and history of NAEQ, we present the assessment framework with students' holistic development at the core and the design for…
Descriptors: Foreign Countries, Educational Quality, Educational Improvement, National Competency Tests
Chan, Wendy – Journal of Educational and Behavioral Statistics, 2018
Policymakers have grown increasingly interested in how experimental results may generalize to a larger population. However, recently developed propensity score-based methods are limited by small sample sizes, where the experimental study is generalized to a population that is at least 20 times larger. This is particularly problematic for methods…
Descriptors: Computation, Generalization, Probability, Sample Size
Egan, Laura; Tang, Judy H.; Ferraro, David; Erberber, Ebru; Tsokodayi, Yemurai; Stearns, Pat – National Center for Education Statistics, 2022
Trends in International Mathematics and Science Study (TIMSS) is an international comparative study designed to measure trends in mathematics and science achievement at grades 4 and 8, as well as to collect information about educational contexts (such as students' schools, teachers, and homes) that may be related to student achievement. TIMSS has…
Descriptors: Achievement Tests, Mathematics Achievement, International Assessment, Foreign Countries
Ostrow, Korinn S.; Wang, Yan; Heffernan, Neil T. – Journal of Learning Analytics, 2017
Data is flexible in that it is molded by not only the features and variables available to a researcher for analysis and interpretation, but also by how those features and variables are recorded and processed prior to evaluation. "Big Data" from online learning platforms and intelligent tutoring systems is no different. The work presented…
Descriptors: Data, Comparative Analysis, Scoring, Mathematics Skills
Ostrow, Korinn S.; Wang, Yan; Heffernan, Neil T. – Grantee Submission, 2017
Data is flexible in that it is molded by not only the features and variables available to a researcher for analysis and interpretation, but also by how those features and variables are recorded and processed prior to evaluation. "Big Data" from online learning platforms and intelligent tutoring systems is no different. The work presented…
Descriptors: Data, Comparative Analysis, Scoring, Mathematics Skills
Johnson, Austin H.; Chafouleas, Sandra M.; Briesch, Amy M. – School Psychology Quarterly, 2017
In this study, generalizability theory was used to examine the extent to which (a) time-sampling methodology, (b) number of simultaneous behavior targets, and (c) individual raters influenced variance in ratings of academic engagement for an elementary-aged student. Ten graduate-student raters, with an average of 7.20 hr of previous training in…
Descriptors: Generalizability Theory, Sampling, Elementary School Students, Learner Engagement
Herget, Debbie; Dalton, Ben; Kinney, Saki; Smith, W. Zachary; Wilson, David; Rogers, Jim – National Center for Education Statistics, 2019
The Progress in International Reading Literacy Study (PIRLS) is an international comparative study of student performance in reading literacy at the fourth grade. PIRLS 2016 marks the fourth iteration of the study, which has been conducted every 5 years since 2001. New to the PIRLS assessment in 2016, ePIRLS provides a computer-based extension to…
Descriptors: Achievement Tests, Grade 4, Reading Achievement, Foreign Countries
Kern, Holger L.; Stuart, Elizabeth A.; Hill, Jennifer; Green, Donald P. – Journal of Research on Educational Effectiveness, 2016
Randomized experiments are considered the gold standard for causal inference because they can provide unbiased estimates of treatment effects for the experimental participants. However, researchers and policymakers are often interested in using a specific experiment to inform decisions about other target populations. In education research,…
Descriptors: Educational Research, Generalization, Sampling, Participant Characteristics
Puhan, Gautam – ETS Research Report Series, 2013
The purpose of this study was to demonstrate that the choice of sample weights when defining the target population under poststratification equating can be a critical factor in determining the accuracy of the equating results under a unique equating scenario, known as "rater comparability scoring and equating." The nature of data…
Descriptors: Scoring, Equated Scores, Sampling, Accuracy
Oliveri, María Elena; von Davier, Alina A. – International Journal of Testing, 2016
In this study, we propose that the unique needs and characteristics of linguistic minorities should be considered throughout the test development process. Unlike most measurement invariance investigations in the assessment of linguistic minorities, which typically are conducted after test administration, we propose strategies that focus on the…
Descriptors: Psychometrics, Linguistics, Test Construction, Testing
Zhang, Mo – ETS Research Report Series, 2013
Many testing programs use automated scoring to grade essays. One issue in automated essay scoring that has not been examined adequately is population invariance and its causes. The primary purpose of this study was to investigate the impact of sampling in model calibration on population invariance of automated scores. This study analyzed scores…
Descriptors: Automation, Scoring, Essay Tests, Sampling