Showing all 15 results
Peer reviewed
Doran, Harold – Journal of Educational and Behavioral Statistics, 2023
This article is concerned with a subset of numerically stable and scalable algorithms useful to support computationally complex psychometric models in the era of machine learning and massive data. The subset selected here is a core set of numerical methods that should be familiar to computational psychometricians and considers whitening transforms…
Descriptors: Scaling, Algorithms, Psychometrics, Computation
Peer reviewed
Robitzsch, Alexander; Lüdtke, Oliver – Journal of Educational and Behavioral Statistics, 2022
One of the primary goals of international large-scale assessments in education is the comparison of country means in student achievement. This article introduces a framework for discussing differential item functioning (DIF) for such mean comparisons. We compare three different linking methods: concurrent scaling based on full invariance,…
Descriptors: Test Bias, International Assessment, Scaling, Comparative Analysis
Peer reviewed
Lubbe, Dirk; Schuster, Christof – Journal of Educational and Behavioral Statistics, 2020
Extreme response style is the tendency of individuals to prefer the extreme categories of a rating scale irrespective of item content. It has been shown repeatedly that individual response style differences affect the reliability and validity of item responses and should, therefore, be considered carefully. To account for extreme response style…
Descriptors: Response Style (Tests), Rating Scales, Item Response Theory, Models
Peer reviewed
Suk, Youmi; Steiner, Peter M.; Kim, Jee-Seon; Kang, Hyunseung – Journal of Educational and Behavioral Statistics, 2022
Regression discontinuity (RD) designs are commonly used for program evaluation with continuous treatment assignment variables. But in practice, treatment assignment is frequently based on ordinal variables. In this study, we propose an RD design with an ordinal running variable to assess the effects of extended time accommodations (ETA) for…
Descriptors: Regression (Statistics), Program Evaluation, Research Design, English Language Learners
Peer reviewed
Ho, Andrew Dean – Journal of Educational and Behavioral Statistics, 2016
In this article, Andrew Dean Ho presents a response to David Thissen's essay, "Bad Questions: An Essay Involving Item Response Theory" (2016), calling it an excellent contribution to the genre of commentaries on the field, which joins the likes of the piece by Thissen's frequent collaborator, Howard Wainer (2010), who published "14…
Descriptors: Item Response Theory, Statistics, Psychometrics, Goodness of Fit
Peer reviewed
Jiang, Yu; Zhang, Jiahui; Xin, Tao – Journal of Educational and Behavioral Statistics, 2019
This article is an overview of the National Assessment of Education Quality (NAEQ) of China in reading, mathematics, sciences, arts, physical education, and moral education at Grades 4 and 8. After a review of the background and history of NAEQ, we present the assessment framework with students' holistic development at the core and the design for…
Descriptors: Foreign Countries, Educational Quality, Educational Improvement, National Competency Tests
Peer reviewed
Martin, Michael O.; Mullis, Ina V. S. – Journal of Educational and Behavioral Statistics, 2019
International large-scale assessments of student achievement such as International Association for the Evaluation of Educational Achievement's Trends in International Mathematics and Science Study (TIMSS) and Progress in International Reading Literacy Study and Organization for Economic Cooperation and Development's Program for International…
Descriptors: Achievement Tests, International Assessment, Mathematics Tests, Science Achievement
Peer reviewed
VanHoudnos, Nathan M.; Greenhouse, Joel B. – Journal of Educational and Behavioral Statistics, 2016
When cluster randomized experiments are analyzed as if units were independent, test statistics for treatment effects can be anticonservative. Hedges proposed a correction for such tests by scaling them to control their Type I error rate. This article generalizes the Hedges correction from a posttest-only experimental design to more common designs…
Descriptors: Statistical Analysis, Randomized Controlled Trials, Error of Measurement, Scaling
Peer reviewed
Strobl, Carolin; Wickelmaier, Florian; Zeileis, Achim – Journal of Educational and Behavioral Statistics, 2011
The preference scaling of a group of subjects may not be homogeneous, but different groups of subjects with certain characteristics may show different preference scalings, each of which can be derived from paired comparisons by means of the Bradley-Terry model. Usually, either different models are fit in predefined subsets of the sample or the…
Descriptors: Individual Differences, Scaling, Statistical Analysis, Models
Peer reviewed
Briggs, Derek C.; Domingue, Ben – Journal of Educational and Behavioral Statistics, 2013
It is often assumed that a vertical scale is necessary when value-added models depend upon the gain scores of students across two or more points in time. This article examines the conditions under which the scale transformations associated with the vertical scaling process would be expected to have a significant impact on normative interpretations…
Descriptors: Evaluation Methods, Scaling, Scores, Achievement Tests
Peer reviewed
Mariano, Louis T.; McCaffrey, Daniel F.; Lockwood, J. R. – Journal of Educational and Behavioral Statistics, 2010
There is an increasing interest in using longitudinal measures of student achievement to estimate individual teacher effects. Current multivariate models assume each teacher has a single effect on student outcomes that persists undiminished to all future test administrations (complete persistence [CP]) or can diminish with time but remains…
Descriptors: Persistence, Academic Achievement, Data Analysis, Teacher Influence
Peer reviewed
Camilli, Gregory – Journal of Educational and Behavioral Statistics, 1994
Describes the scaling constant "d" = 1.702, used in Item Response Theory, which minimizes the maximum difference between the normal and logistic distribution functions. Recapitulates the theoretical and numerical derivation of "d" given by D. Haley (1952). (SLD)
Descriptors: Item Response Theory, Scaling
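The minimax property described in the Camilli (1994) record above can be checked numerically. The following is a minimal sketch, not the derivation given by Haley or Camilli: it assumes a plain grid search over candidate constants and over x >= 0 (the difference between the two distribution functions is antisymmetric in x, so non-negative x suffices). The function names, grid bounds, and step sizes are illustrative assumptions.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic_cdf(x):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))


def max_gap(d, xs):
    """Largest absolute difference between Phi(x) and logistic(d * x) on the grid xs."""
    return max(abs(normal_cdf(x) - logistic_cdf(d * x)) for x in xs)


def minimax_scaling_constant(d_lo=1.6, d_hi=1.8, d_step=0.001,
                             x_max=6.0, x_step=0.005):
    """Grid-search the constant d that minimizes the maximum gap (assumed grids)."""
    xs = [i * x_step for i in range(int(round(x_max / x_step)) + 1)]
    n_d = int(round((d_hi - d_lo) / d_step)) + 1
    candidates = [d_lo + i * d_step for i in range(n_d)]
    best_d = min(candidates, key=lambda d: max_gap(d, xs))
    return best_d, max_gap(best_d, xs)


if __name__ == "__main__":
    d, gap = minimax_scaling_constant()
    print(f"minimizing d on this grid: {d:.3f}, maximum gap: {gap:.5f}")
```

On these grids the search should land close to the value 1.702 quoted in the abstract; the resolution of the result is limited by `d_step`.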
Peer reviewed
Patz, Richard J.; Junker, Brian W.; Johnson, Matthew S.; Mariano, Louis T. – Journal of Educational and Behavioral Statistics, 2002
Discusses the hierarchical rater model (HRM) of R. Patz (1996) and shows how it can be used to scale examinees and items, model aspects of consensus among raters, and model individual rater severity and consistency effects. Also shows how the HRM fits into the generalizability theory framework. Compares the HRM to the conventional item response…
Descriptors: Educational Assessment, Generalizability Theory, Item Response Theory, Scaling
Peer reviewed
Algina, James; And Others – Journal of Educational and Behavioral Statistics, 1995
A maximum test, in which the test statistic is the more extreme of the Brown-Forsythe and O'Brien test statistics, is developed, and Type I error rates and power are estimated for all three tests. For study conditions, Type I error rates for the maximum test are near the nominal level. (SLD)
Descriptors: Error of Measurement, Estimation (Mathematics), Power (Statistics), Scaling
Peer reviewed
May, Henry – Journal of Educational and Behavioral Statistics, 2006
In this article, a new method is presented and implemented for deriving a scale of socioeconomic status (SES) from international survey data using a multilevel Bayesian item response theory (IRT) model. The proposed model incorporates both international anchor items and nation-specific items and is able to (a) produce student family SES scores…
Descriptors: Item Response Theory, Bayesian Statistics, Socioeconomic Status, Scaling