Doran, Harold – Journal of Educational and Behavioral Statistics, 2023
This article is concerned with a subset of numerically stable and scalable algorithms useful to support computationally complex psychometric models in the era of machine learning and massive data. The subset selected here is a core set of numerical methods that should be familiar to computational psychometricians and considers whitening transforms…
Descriptors: Scaling, Algorithms, Psychometrics, Computation
Robitzsch, Alexander; Lüdtke, Oliver – Journal of Educational and Behavioral Statistics, 2022
One of the primary goals of international large-scale assessments in education is the comparison of country means in student achievement. This article introduces a framework for discussing differential item functioning (DIF) for such mean comparisons. We compare three different linking methods: concurrent scaling based on full invariance,…
Descriptors: Test Bias, International Assessment, Scaling, Comparative Analysis
Lubbe, Dirk; Schuster, Christof – Journal of Educational and Behavioral Statistics, 2020
Extreme response style is the tendency of individuals to prefer the extreme categories of a rating scale irrespective of item content. It has been shown repeatedly that individual response style differences affect the reliability and validity of item responses and should, therefore, be considered carefully. To account for extreme response style…
Descriptors: Response Style (Tests), Rating Scales, Item Response Theory, Models
Suk, Youmi; Steiner, Peter M.; Kim, Jee-Seon; Kang, Hyunseung – Journal of Educational and Behavioral Statistics, 2022
Regression discontinuity (RD) designs are commonly used for program evaluation with continuous treatment assignment variables. But in practice, treatment assignment is frequently based on ordinal variables. In this study, we propose an RD design with an ordinal running variable to assess the effects of extended time accommodations (ETA) for…
Descriptors: Regression (Statistics), Program Evaluation, Research Design, English Language Learners
Ho, Andrew Dean – Journal of Educational and Behavioral Statistics, 2016
In this article, Andrew Dean Ho presents a response to David Thissen's essay, "Bad Questions: An Essay Involving Item Response Theory" (2016), calling it an excellent contribution to the genre of commentaries on the field which joins the likes of the piece by Thissen's frequent collaborator, Howard Wainer (2010), who published "14…
Descriptors: Item Response Theory, Statistics, Psychometrics, Goodness of Fit
Jiang, Yu; Zhang, Jiahui; Xin, Tao – Journal of Educational and Behavioral Statistics, 2019
This article is an overview of the National Assessment of Education Quality (NAEQ) of China in reading, mathematics, sciences, arts, physical education, and moral education at Grades 4 and 8. After a review of the background and history of NAEQ, we present the assessment framework with students' holistic development at the core and the design for…
Descriptors: Foreign Countries, Educational Quality, Educational Improvement, National Competency Tests
Martin, Michael O.; Mullis, Ina V. S. – Journal of Educational and Behavioral Statistics, 2019
International large-scale assessments of student achievement such as International Association for the Evaluation of Educational Achievement's Trends in International Mathematics and Science Study (TIMSS) and Progress in International Reading Literacy Study and Organization for Economic Cooperation and Development's Program for International…
Descriptors: Achievement Tests, International Assessment, Mathematics Tests, Science Achievement
VanHoudnos, Nathan M.; Greenhouse, Joel B. – Journal of Educational and Behavioral Statistics, 2016
When cluster randomized experiments are analyzed as if units were independent, test statistics for treatment effects can be anticonservative. Hedges proposed a correction for such tests by scaling them to control their Type I error rate. This article generalizes the Hedges correction from a posttest-only experimental design to more common designs…
Descriptors: Statistical Analysis, Randomized Controlled Trials, Error of Measurement, Scaling
Strobl, Carolin; Wickelmaier, Florian; Zeileis, Achim – Journal of Educational and Behavioral Statistics, 2011
The preference scaling of a group of subjects may not be homogeneous, but different groups of subjects with certain characteristics may show different preference scalings, each of which can be derived from paired comparisons by means of the Bradley-Terry model. Usually, either different models are fit in predefined subsets of the sample or the…
Descriptors: Individual Differences, Scaling, Statistical Analysis, Models
Briggs, Derek C.; Domingue, Ben – Journal of Educational and Behavioral Statistics, 2013
It is often assumed that a vertical scale is necessary when value-added models depend upon the gain scores of students across two or more points in time. This article examines the conditions under which the scale transformations associated with the vertical scaling process would be expected to have a significant impact on normative interpretations…
Descriptors: Evaluation Methods, Scaling, Scores, Achievement Tests
Mariano, Louis T.; McCaffrey, Daniel F.; Lockwood, J. R. – Journal of Educational and Behavioral Statistics, 2010
There is an increasing interest in using longitudinal measures of student achievement to estimate individual teacher effects. Current multivariate models assume each teacher has a single effect on student outcomes that persists undiminished to all future test administrations (complete persistence [CP]) or can diminish with time but remains…
Descriptors: Persistence, Academic Achievement, Data Analysis, Teacher Influence

Camilli, Gregory – Journal of Educational and Behavioral Statistics, 1994
Describes the scaling constant "d" = 1.702, used in Item Response Theory, which minimizes the maximum difference between the normal and logistic distribution functions. Recapitulates the theoretical and numerical derivation of "d" given by D. Haley (1952). (SLD)
Descriptors: Item Response Theory, Scaling
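
The minimax property described in this abstract can be checked numerically. The following is a minimal sketch, not the derivation given by Camilli or Haley: it assumes a plain grid search over candidate constants and an evenly spaced grid of x values, and the function names (`normal_cdf`, `logistic_cdf`, `max_abs_diff`, `best_scaling_constant`) are illustrative only.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic_cdf(x, d):
    """Logistic CDF whose argument is rescaled by the constant d."""
    return 1.0 / (1.0 + math.exp(-d * x))


def max_abs_diff(d, x_max=5.0, steps=2001):
    """Largest |normal - logistic| gap on an even grid over [0, x_max].

    Both curves are symmetric about the point (0, 0.5), so checking
    x >= 0 is enough. The grid bounds are an assumption of this sketch.
    """
    worst = 0.0
    for i in range(steps):
        x = x_max * i / (steps - 1)
        gap = abs(normal_cdf(x) - logistic_cdf(x, d))
        worst = max(worst, gap)
    return worst


def best_scaling_constant(lo=1.65, hi=1.75, step=0.001):
    """Grid search for the d that minimizes the maximum gap."""
    n = int(round((hi - lo) / step))
    candidates = [round(lo + k * step, 3) for k in range(n + 1)]
    return min(candidates, key=max_abs_diff)


if __name__ == "__main__":
    d = best_scaling_constant()
    print("approximate minimax constant:", d)
    print("maximum absolute difference:", max_abs_diff(d))
```

With these settings the search should land close to the value 1.702 quoted in the abstract, with a maximum gap below 0.01. A finer step or a wider x grid would refine the estimate at the cost of run time.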

Patz, Richard J.; Junker, Brian W.; Johnson, Matthew S.; Mariano, Louis T. – Journal of Educational and Behavioral Statistics, 2002
Discusses the hierarchical rater model (HRM) of R. Patz (1996) and shows how it can be used to scale examinees and items, model aspects of consensus among raters, and model individual rater severity and consistency effects. Also shows how the HRM fits into the generalizability theory framework. Compares the HRM to the conventional item response…
Descriptors: Educational Assessment, Generalizability Theory, Item Response Theory, Scaling

Algina, James; And Others – Journal of Educational and Behavioral Statistics, 1995
Develops a maximum test in which the test statistic is the more extreme of the Brown-Forsythe and O'Brien test statistics, and estimates Type I error rates and power for all three tests. For the study conditions, Type I error rates for the maximum test are near the nominal level. (SLD)
Descriptors: Error of Measurement, Estimation (Mathematics), Power (Statistics), Scaling
May, Henry – Journal of Educational and Behavioral Statistics, 2006
In this article, a new method is presented and implemented for deriving a scale of socioeconomic status (SES) from international survey data using a multilevel Bayesian item response theory (IRT) model. The proposed model incorporates both international anchor items and nation-specific items and is able to (a) produce student family SES scores…
Descriptors: Item Response Theory, Bayesian Statistics, Socioeconomic Status, Scaling