Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 2 |
Since 2016 (last 10 years) | 5 |
Since 2006 (last 20 years) | 8 |
Author
Beaton, Albert E. | 7 |
Johnson, Eugene G. | 5 |
Mislevy, Robert J. | 3 |
Allen, Nancy L. | 2 |
Kolen, Michael J. | 2 |
Mazzeo, John | 2 |
Zwick, Rebecca | 2 |
Abedi, Jamal | 1 |
Almond, Russell G. | 1 |
Anderson, Ronald E. | 1 |
Baker, Eva L. | 1 |
Education Level
Elementary Education | 3 |
Elementary Secondary Education | 3 |
Grade 8 | 3 |
Junior High Schools | 3 |
Middle Schools | 3 |
Secondary Education | 3 |
Grade 4 | 2 |
Intermediate Grades | 2 |
Audience
Researchers | 6 |
Policymakers | 2 |
Practitioners | 1 |
Location
United States | 1 |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
National Assessment of… | 50 |
Iowa Tests of Basic Skills | 1 |
Iowa Tests of Educational… | 1 |
National Adult Literacy… | 1 |
Sequential Tests of… | 1 |
Trends in International… | 1 |
Suk, Youmi; Steiner, Peter M.; Kim, Jee-Seon; Kang, Hyunseung – Journal of Educational and Behavioral Statistics, 2022
Regression discontinuity (RD) designs are commonly used for program evaluation with continuous treatment assignment variables. But in practice, treatment assignment is frequently based on ordinal variables. In this study, we propose an RD design with an ordinal running variable to assess the effects of extended time accommodations (ETA) for…
Descriptors: Regression (Statistics), Program Evaluation, Research Design, English Language Learners
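The abstract above contrasts the standard sharp RD setup, which assumes a continuous running variable, with the authors' ordinal-variable proposal. The sketch below (Python with numpy; all variable names and numbers are invented) shows only the standard continuous case, estimated with a local linear regression around the cutoff; it is not the authors' ordinal-running-variable method.

# Minimal sharp regression discontinuity sketch (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n = 2_000
running = rng.uniform(-1, 1, n)          # centered continuous running variable
treated = (running >= 0).astype(float)   # sharp assignment at the cutoff
outcome = 0.5 * running + 0.3 * treated + rng.normal(0, 0.2, n)

# Local linear regression within a bandwidth on each side of the cutoff.
bandwidth = 0.25
window = np.abs(running) <= bandwidth
X = np.column_stack([
    np.ones(window.sum()),
    treated[window],
    running[window],
    running[window] * treated[window],   # allow different slopes on each side
])
beta, *_ = np.linalg.lstsq(X, outcome[window], rcond=None)
print(f"estimated jump at the cutoff: {beta[1]:.3f} (true effect 0.3)")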
Reardon, Sean F.; Ho, Andrew D.; Kalogrides, Demetra – Stanford Center for Education Policy Analysis, 2019
Linking score scales across different tests is considered speculative and fraught, even at the aggregate level (Feuer et al., 1999; Thissen, 2007). We introduce and illustrate validation methods for aggregate linkages, using the challenge of linking U.S. school district average test scores across states as a motivating example. We show that…
Descriptors: Test Validity, Evaluation Methods, School Districts, Scores
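As a point of reference for the linking problem described above, here is a generic mean/sigma (linear) linking of hypothetical district means onto a reference scale, in Python with numpy. It is not the report's validation methodology, and every number is invented.

# Toy linear (mean/sigma) linking of aggregate scores onto a reference scale.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical district means on a state's own test scale.
state_district_means = rng.normal(250, 20, size=100)

# Hypothetical statewide mean/SD on the state scale and on a reference
# (e.g., NAEP-like) scale; both pairs are assumptions for illustration.
state_mean, state_sd = 250.0, 20.0
ref_mean, ref_sd = 280.0, 35.0

# Mean/sigma linking: match the first two moments of the state distribution
# to the reference distribution.
a = ref_sd / state_sd
b = ref_mean - a * state_mean
linked_means = a * state_district_means + b

print(f"slope a = {a:.3f}, intercept b = {b:.1f}")
print(f"first three linked district means: {np.round(linked_means[:3], 1)}")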
Youmi Suk; Peter M. Steiner; Jee-Seon Kim; Hyunseung Kang – Society for Research on Educational Effectiveness, 2021
Background/Context: Regression discontinuity (RD) designs are used for policy and program evaluation where subjects' eligibility into a program or policy is determined by whether an assignment variable (i.e., running variable) exceeds a pre-defined cutoff. Under a standard RD design with a continuous assignment variable, the average treatment…
Descriptors: Educational Policy, Eligibility, Cutting Scores, Testing Accommodations
Martin, Michael O.; Mullis, Ina V. S. – Journal of Educational and Behavioral Statistics, 2019
International large-scale assessments of student achievement such as the International Association for the Evaluation of Educational Achievement's Trends in International Mathematics and Science Study (TIMSS) and Progress in International Reading Literacy Study and the Organization for Economic Cooperation and Development's Program for International…
Descriptors: Achievement Tests, International Assessment, Mathematics Tests, Science Achievement
Jacob, Brian; Rothstein, Jesse – National Bureau of Economic Research, 2016
Economists often use test scores to measure a student's performance or an adult's human capital. These scores reflect non-trivial decisions about how to measure and scale student achievement, with important implications for secondary analyses. For example, the scores computed in several major testing regimes, including the National Assessment of…
Descriptors: Measurement, Academic Ability, Educational Assessment, Scores
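A small numeric illustration of the general point made above, that scaling decisions can matter for secondary analyses: the same synthetic scores, reported under two different monotone scalings, yield different standardized gaps. Python with numpy; everything here is invented for illustration and is not drawn from the paper.

# Same ordinal information, two monotone scalings, two different gap estimates.
import numpy as np

rng = np.random.default_rng(2)
group_a = rng.normal(0.0, 1.0, 10_000)   # synthetic latent achievement, group A
group_b = rng.normal(0.5, 1.0, 10_000)   # synthetic latent achievement, group B

def gap_in_sd_units(a, b):
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (b.mean() - a.mean()) / pooled_sd

# Scale 1: report the latent scores directly.
# Scale 2: a monotone but nonlinear transformation of the same scores.
print(f"gap on scale 1: {gap_in_sd_units(group_a, group_b):.3f}")
print(f"gap on scale 2: {gap_in_sd_units(np.exp(group_a), np.exp(group_b)):.3f}")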
Almond, Russell G.; Sinharay, Sandip – ETS Research Report Series, 2012
To answer questions about how students' proficiencies are changing over time, educational researchers are looking for data sources that span many years. Clearly, for answering questions about student growth, a longitudinal study--in which a single sample is followed over many years--is preferable to repeated cross-sectional samples--in which a…
Descriptors: Educational Research, Case Studies, Research Methodology, Literature Reviews
Tong, Ye; Kolen, Michael J. – Educational Measurement: Issues and Practice, 2010
"Scaling" is the process of constructing a score scale that associates numbers or other ordered indicators with the performance of examinees. Scaling typically is conducted to aid users in interpreting test results. This module describes different types of raw scores and scale scores, illustrates how to incorporate various sources of…
Descriptors: Test Results, Scaling, Measures (Individuals), Raw Scores
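A minimal sketch of the simplest kind of scaling the module describes: placing raw scores on a reporting scale with a chosen mean and standard deviation. Operational programs typically use nonlinear or IRT-based scalings; the raw scores and the 500/100 reporting scale below are assumptions for illustration only.

# Linear score scale: raw scores mapped to a chosen reporting mean and SD.
import numpy as np

raw_scores = np.array([12, 25, 31, 40, 47, 55, 58])   # hypothetical raw scores
target_mean, target_sd = 500.0, 100.0                 # chosen reporting scale

z = (raw_scores - raw_scores.mean()) / raw_scores.std(ddof=1)
scale_scores = target_mean + target_sd * z

print(np.round(scale_scores))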
Somers, Marie-Andree; Zhu, Pei; Wong, Edmond – National Center for Education Evaluation and Regional Assistance, 2011
This study examines the practical implications of using state tests to measure student achievement in impact evaluations that span multiple states and grades. In particular, the study examines the sensitivity of impact findings to (1) the type of assessment used to measure achievement (state tests or an external assessment administered by the…
Descriptors: Evaluators, Grades (Scholastic), Academic Achievement, Program Effectiveness

Beaton, Albert E.; Johnson, Eugene G. – Journal of Educational Statistics, 1990
The average response method (ARM) of scaling nonbinary data was developed to scale data from the assessments of writing conducted by the National Assessment of Educational Progress (NAEP). The method is described and illustrated with data from the 1983-84 NAEP. (SLD)
Descriptors: Elementary Secondary Education, Equations (Mathematics), Mathematical Models, Scaling
Haertel, Edward H. – 1991
The National Assessment Governing Board (NAGB) has recently adopted the position that the National Assessment of Educational Progress (NAEP) should employ within-age scaling whenever feasible. The NAEP Technical Review Panel (TRP) has studied the issue at some length and reports on it in this analysis. The first section reviews…
Descriptors: Academic Achievement, Elementary Secondary Education, National Surveys, Psychometrics
Sheehan, Kathleen M.; Mislevy, Robert J. – 1988
In many practical applications of item response theory, the parameters of overlapping subsets of test items are estimated from different samples of examinees. A linking procedure is then employed to place the resulting item parameter estimates onto a common scale. It is standard practice to ignore the uncertainty associated with the linking step…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Response Theory, Measurement Techniques
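For context on the linking step the paper examines, a generic mean/sigma linking of item difficulty estimates from two separate calibrations is sketched below (Python with numpy). It shows only the point estimate of the linking constants; it does not implement the paper's treatment of the uncertainty in that step, and all parameter values are invented.

# Mean/sigma linking of IRT item difficulties via common items.
import numpy as np

# Hypothetical difficulty estimates for the same five common items from two
# independent calibrations whose scales differ by an affine transformation.
b_form_x = np.array([-1.20, -0.40, 0.10, 0.75, 1.60])
b_form_y = np.array([-0.85, -0.05, 0.45, 1.10, 1.95])

# Find A, B so that A * b_y + B is expressed on the form-X scale.
A = b_form_x.std(ddof=1) / b_form_y.std(ddof=1)
B = b_form_x.mean() - A * b_form_y.mean()

b_y_linked = A * b_form_y + B
print(f"A = {A:.3f}, B = {B:.3f}")
print("linked form-Y difficulties:", np.round(b_y_linked, 2))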
Kolstad, Andrew – 1996
The role of the response probability convention in reporting results from the 1992 National Adult Literacy Survey is explored, using interviews with more than 26,000 adults and young adults. In order to summarize what respondents of a particular proficiency can do, it is convenient to adopt a convention for a sufficient response probability that…
Descriptors: Adults, Interviews, Item Response Theory, Literacy
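To make the response probability convention concrete, the sketch below computes, for a single hypothetical 2PL item, the proficiency at which the probability of a correct response reaches two alternative RP thresholds. The item parameters and the 0.65/0.80 thresholds are assumptions for illustration, not values from the survey.

# The proficiency at which an item is "mapped" depends on the RP convention.
import numpy as np

a, b = 1.2, 0.3   # hypothetical 2PL discrimination and difficulty

def theta_at_rp(rp, a, b):
    # Solve rp = 1 / (1 + exp(-a * (theta - b))) for theta.
    return b + np.log(rp / (1 - rp)) / a

for rp in (0.65, 0.80):
    print(f"RP = {rp:.2f} -> item maps at theta = {theta_at_rp(rp, a, b):.2f}")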
Feuer, Michael J., Ed.; Holland, Paul W., Ed.; Green, Bert F., Ed.; Bertenthal, Meryl W., Ed.; Hemphill, F. Cadelle, Ed. – 1999
A study was conducted of the feasibility of establishing an equivalency scale that would enable commercial and state tests to be linked to one another and to the National Assessment of Educational Progress (NAEP). In evaluating the feasibility of linkages, the study committee focused on the linkage of various fourth-grade reading tests and the linkage…
Descriptors: Achievement Tests, Comparative Analysis, Elementary Secondary Education, Equated Scores

Camilli, Gregory; And Others – Applied Psychological Measurement, 1993
Three potential causes of scale shrinkage (measurement error, restriction of range, and multidimensionality) in item response theory vertical equating are discussed, and a more comprehensive model-based approach to establishing vertical scales is described. Test data from the National Assessment of Educational Progress are used to illustrate the…
Descriptors: Equated Scores, Error of Measurement, Item Response Theory, Maximum Likelihood Statistics
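A small simulation of one of the shrinkage mechanisms listed above: measurement error combined with a regression-type (EAP-like) estimator, which pulls score estimates toward the mean and so compresses the scale. It is purely illustrative, is not the article's model-based approach, and the reliability value is an assumption.

# Regression-based estimates have smaller variance than the latent scores.
import numpy as np

rng = np.random.default_rng(3)
n, reliability = 50_000, 0.80

theta = rng.normal(0, 1, n)                              # latent proficiency
error = rng.normal(0, np.sqrt(1 / reliability - 1), n)   # measurement error
observed = theta + error
eap_like = reliability * observed                        # linear shrinkage estimator

print(f"latent SD: {theta.std():.3f}")
print(f"shrunken-estimate SD: {eap_like.std():.3f}")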
Beaton, Albert E.; Johnson, Eugene G. – 1990
When the Educational Testing Service became the administrator of the National Assessment of Educational Progress (NAEP) in 1983, it introduced scales based on item response theory (IRT) as a way of presenting results of the assessment to the general public. Some properties of the scales and their uses are discussed. Initial attempts at presenting…
Descriptors: Academic Achievement, Data Interpretation, Educational Assessment, Elementary Secondary Education