Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 2 |
Since 2016 (last 10 years) | 4 |
Since 2006 (last 20 years) | 11 |
Source
Journal of Educational and… | 15 |
Author
Sinharay, Sandip | 3 |
Bennink, Margot | 1 |
Boyd, Donald | 1 |
Croon, Marcel A. | 1 |
Haberman, Shelby J. | 1 |
Hamilton, Laura | 1 |
Ho, Andrew D. | 1 |
Jeon, Minjeong | 1 |
Johnson, Matthew S. | 1 |
Kalogrides, Demetra | 1 |
Keuning, Jos | 1 |
Publication Type
Journal Articles | 15 |
Reports - Research | 7 |
Reports - Evaluative | 5 |
Reports - Descriptive | 2 |
Opinion Papers | 1 |
Tests/Questionnaires | 1 |
Education Level
Grade 4 | 2 |
Grade 8 | 2 |
Junior High Schools | 2 |
Middle Schools | 2 |
Secondary Education | 2 |
Elementary Education | 1 |
Grade 3 | 1 |
Grade 5 | 1 |
Grade 6 | 1 |
Grade 7 | 1 |
Grade 9 | 1 |
Location
Netherlands | 1 |
New York | 1 |
Assessments and Surveys
Iowa Tests of Basic Skills | 1 |
Measures of Academic Progress | 1 |
National Assessment of… | 1 |
Program for International… | 1 |
Trends in International… | 1 |
Sinharay, Sandip – Journal of Educational and Behavioral Statistics, 2022
Takers of educational tests often receive proficiency levels instead of or in addition to scaled scores. For example, proficiency levels are reported for the Advanced Placement (AP®) and U.S. Medical Licensing examinations. Technical difficulties and other unforeseen events occasionally lead to missing item scores and hence to incomplete data on…
Descriptors: Computation, Data Analysis, Educational Testing, Accuracy
Sinharay, Sandip; van Rijn, Peter W. – Journal of Educational and Behavioral Statistics, 2020
Response time models (RTMs) are of increasing interest in educational and psychological testing. This article focuses on the lognormal model for response times, which is one of the most popular RTMs. Several existing statistics for testing normality and the fit of factor analysis models are repurposed for testing the fit of the lognormal model. A…
Descriptors: Educational Testing, Psychological Testing, Goodness of Fit, Factor Analysis
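The simplest consequence of the lognormal response time model is that log response times should be normally distributed. As a minimal sketch of a fit check in that spirit (using an off-the-shelf Shapiro-Wilk test on simulated data, not the repurposed statistics developed in the article):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated response times that do follow a lognormal model...
times_ok = rng.lognormal(mean=3.0, sigma=0.5, size=500)
# ...and times that do not: a mixture of regular takers and fast guessers.
times_bad = np.concatenate([rng.lognormal(3.0, 0.5, 400),
                            rng.uniform(1.0, 3.0, 100)])

def lognormal_fit_pvalue(times):
    """Shapiro-Wilk normality test applied to log response times."""
    return stats.shapiro(np.log(times)).pvalue

print(lognormal_fit_pvalue(times_ok))   # large p: no evidence against lognormality
print(lognormal_fit_pvalue(times_bad))  # tiny p: the lognormal model is rejected
```

The data, baseline parameters, and choice of test here are illustrative assumptions; the article's point is precisely that more targeted statistics can be repurposed for this check.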
Liu, Yang; Wang, Xiaojing – Journal of Educational and Behavioral Statistics, 2020
Parametric methods, such as autoregressive models or latent growth modeling, are usually inflexible to model the dependence and nonlinear effects among the changes of latent traits whenever the time gap is irregular and the recorded time points are individually varying. Often in practice, the growth trend of latent traits is subject to certain…
Descriptors: Bayesian Statistics, Nonparametric Statistics, Regression (Statistics), Item Response Theory
Reardon, Sean F.; Kalogrides, Demetra; Ho, Andrew D. – Journal of Educational and Behavioral Statistics, 2021
Linking score scales across different tests is considered speculative and fraught, even at the aggregate level. We introduce and illustrate validation methods for aggregate linkages, using the challenge of linking U.S. school district average test scores across states as a motivating example. We show that aggregate linkages can be validated both…
Descriptors: Equated Scores, Validity, Methods, School Districts
van der Linden, Wim J.; Jeon, Minjeong – Journal of Educational and Behavioral Statistics, 2012
The probability of test takers changing answers upon review of their initial choices is modeled. The primary purpose of the model is to check erasures on answer sheets recorded by an optical scanner for numbers and patterns that may be indicative of irregular behavior, such as teachers or school administrators changing answer sheets after their…
Descriptors: Probability, Models, Test Items, Educational Testing
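The intuition behind erasure analysis can be sketched very simply: flag groups whose wrong-to-right erasure counts are improbably high under a baseline rate. The sketch below uses a plain binomial tail probability with hypothetical counts and an assumed baseline; the article's model is considerably richer (it models each test taker's answer-changing behavior).

```python
from scipy import stats

# Hypothetical erasure data: (erasures scanned, wrong-to-right count).
classrooms = {
    "class_A": (40, 14),
    "class_B": (35, 33),   # nearly every erasure goes wrong-to-right
    "class_C": (50, 17),
}

# Assumed baseline rate of wrong-to-right erasures under ordinary reviewing;
# a real analysis would estimate this from the data.
P_WTR = 0.33

def wtr_pvalue(n_erasures, n_wtr, p=P_WTR):
    """Upper-tail binomial p-value for the observed wrong-to-right count."""
    return stats.binom.sf(n_wtr - 1, n_erasures, p)

flagged = [c for c, (n, k) in classrooms.items() if wtr_pvalue(n, k) < 0.001]
print(flagged)  # only the classroom with an extreme erasure pattern
```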
Bennink, Margot; Croon, Marcel A.; Keuning, Jos; Vermunt, Jeroen K. – Journal of Educational and Behavioral Statistics, 2014
In educational measurement, responses of students on items are used not only to measure the ability of students, but also to evaluate and compare the performance of schools. Analysis should ideally account for the multilevel structure of the data, and school-level processes not related to ability, such as working climate and administration…
Descriptors: Academic Ability, Educational Assessment, Educational Testing, Test Bias
Boyd, Donald; Lankford, Hamilton; Loeb, Susanna; Wyckoff, James – Journal of Educational and Behavioral Statistics, 2013
Test-based accountability as well as value-added assessments and much experimental and quasi-experimental research in education rely on achievement tests to measure student skills and knowledge. Yet, we know little regarding fundamental properties of these tests, an important example being the extent of measurement error and its implications for…
Descriptors: Accountability, Educational Research, Educational Testing, Error of Measurement
Haberman, Shelby J. – Journal of Educational and Behavioral Statistics, 2008
In educational tests, subscores are often generated from a portion of the items in a larger test. Guidelines based on mean squared error are proposed to indicate whether subscores are worth reporting. Alternatives considered are direct reports of subscores, estimates of subscores based on total score, combined estimates based on subscores and…
Descriptors: Testing Programs, Regression (Statistics), Scores, Student Evaluation
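The mean-squared-error logic behind such guidelines can be illustrated by simulation: a subscore is worth reporting only if the observed subscore predicts the true subscore better than the total score already does. The generating model below is entirely hypothetical and is not Haberman's PRMSE machinery, just a sketch of the comparison:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical model: a true subscore, a correlated remainder of the test,
# a noisy observed subscore, and the resulting total score.
true_sub = rng.normal(0.0, 1.0, n)
remainder = 0.9 * true_sub + rng.normal(0.0, 0.6, n)  # rest of the test
obs_sub = true_sub + rng.normal(0.0, 1.2, n)          # unreliable subscore
total = obs_sub + remainder + rng.normal(0.0, 0.5, n)

def mse_of_linear_predictor(x, target):
    """MSE of the best linear (regression) predictor of target from x."""
    slope, intercept = np.polyfit(x, target, 1)
    return np.mean((target - (slope * x + intercept)) ** 2)

mse_from_sub = mse_of_linear_predictor(obs_sub, true_sub)
mse_from_total = mse_of_linear_predictor(total, true_sub)
print(mse_from_sub, mse_from_total)
# With a subscore this unreliable and a total this informative, the
# total-based estimate has the smaller MSE: the subscore adds little.
```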
Livingston, Samuel A. – Journal of Educational and Behavioral Statistics, 2006
This article suggests a graphic technique that uses P-P plots to show the extent to which two groups differ on two variables. It can be used even if the variables are measured in completely different, noncomparable units. The comparison is symmetric with respect to the variables and the groups. It reflects the differences between the groups over…
Descriptors: Comparative Analysis, Groups, Differences, Graphs
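A P-P plot of two groups on one variable pairs the groups' cumulative proportions at a common set of cut scores, which is what makes the comparison unit-free. A minimal sketch with made-up data (the article's technique extends this to two variables at once):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical scores for two groups, in arbitrary units.
group_a = rng.normal(50, 10, 300)
group_b = rng.normal(55, 10, 300)

def pp_points(a, b, n_points=101):
    """P-P curve points: each group's cumulative proportion at a common
    grid of cut scores. Comparable even if the units are arbitrary."""
    grid = np.quantile(np.concatenate([a, b]), np.linspace(0, 1, n_points))
    fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return fa, fb

fa, fb = pp_points(group_a, group_b)
# Identical distributions would put the curve on the diagonal; here group B
# scores higher, so its cumulative proportion lags group A's.
print(np.max(np.abs(fa - fb)))  # the maximum vertical gap
```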
Liu, Yuming; Schulz, E. Matthew; Yu, Lei – Journal of Educational and Behavioral Statistics, 2008
A Markov chain Monte Carlo (MCMC) method and a bootstrap method were compared in the estimation of standard errors of item response theory (IRT) true score equating. Three test form relationships were examined: parallel, tau-equivalent, and congeneric. Data were simulated based on Reading Comprehension and Vocabulary tests of the Iowa Tests of…
Descriptors: Reading Comprehension, Test Format, Markov Processes, Educational Testing
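The bootstrap side of that comparison reduces to a familiar recipe: resample the data with replacement, recompute the statistic each time, and take the standard deviation of the replicates. The sketch below uses the sample mean as a stand-in statistic; the article's setting (IRT true score equating) would require refitting an IRT model inside the loop.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical observed scores.
scores = rng.normal(20, 5, 200)

def bootstrap_se(data, statistic, n_boot=2000, rng=rng):
    """Standard error of `statistic` via the nonparametric bootstrap."""
    n = len(data)
    reps = np.array([statistic(data[rng.integers(0, n, n)])
                     for _ in range(n_boot)])
    return reps.std(ddof=1)

se_mean = bootstrap_se(scores, np.mean)
print(se_mean)  # close to the analytic SE, scores.std(ddof=1) / sqrt(200)
```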
Lewis, Charles – Journal of Educational and Behavioral Statistics, 2006
In the context of reviewing an article for this journal (van der Linden & Sotaridona, this issue, pp. 283-304), the topic of unconditional and conditional hypothesis testing came under consideration. While this is hardly a new issue (consider, for example, arguments regarding the chi-square versus the Fisher exact test of independence for a 2 × 2…
Descriptors: Hypothesis Testing, Educational Testing, Item Response Theory, Research Problems
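The classical example mentioned in the abstract is easy to reproduce: on the same 2 × 2 table, the (unconditional-flavored) Pearson chi-square test and the (conditional) Fisher exact test, which conditions on the table margins, can give noticeably different p-values in small samples. The counts below are hypothetical:

```python
import numpy as np
from scipy import stats

# A hypothetical 2 x 2 contingency table.
table = np.array([[12, 5],
                  [7, 15]])

# Pearson chi-square test of independence (no continuity correction).
chi2, p_chi2, dof, _ = stats.chi2_contingency(table, correction=False)

# Fisher's exact test, which conditions on the observed margins.
odds_ratio, p_fisher = stats.fisher_exact(table)

print(p_chi2, p_fisher)  # the two tests disagree noticeably at this sample size
```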
Sinharay, Sandip; Johnson, Matthew S.; Williamson, David M. – Journal of Educational and Behavioral Statistics, 2003
Item families, which are groups of related items, are becoming increasingly popular in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from each of a number of item models. Item calibration or scoring for such an assessment requires fitting models that can…
Descriptors: Test Items, Markov Processes, Educational Testing, Probability
Longford, N. T. – Journal of Educational and Behavioral Statistics, 1994
Presents a model-based approach to rater reliability for essays read by multiple raters. The approach is motivated by generalizability theory, and variation of rater severity and rater inconsistency is considered in the presence of between-examinee variations. Illustrates methods with data from standardized educational tests. (Author/SLD)
Descriptors: Educational Testing, Essay Tests, Generalizability Theory, Interrater Reliability
McCaffrey, Daniel F.; Lockwood, J. R.; Koretz, Daniel; Louis, Thomas A.; Hamilton, Laura – Journal of Educational and Behavioral Statistics, 2004
The insightful discussions by Raudenbush, Rubin, Stuart and Zanutto (RSZ) and Reckase identify important challenges for interpreting the output of VAM and for its use with test-based accountability. As these authors note, VAM are statistical models for the correlations among scores from students who share common teachers or schools during the…
Descriptors: Educational Testing, Accountability, Mathematical Models, Teacher Influence
Journal of Educational and Behavioral Statistics, 2003
Lyle V. Jones served as director of the Thurstone Psychometric Laboratory and also became the Vice Chancellor and Dean of the Graduate School of the University of North Carolina (UNC). Jones has been a Research Professor at UNC since 1992. This article presents an interview with Jones wherein he talked about his career as a researcher. Jones also…
Descriptors: National Competency Tests, Laboratories, Psychometrics, Profiles