Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 2 |
Since 2016 (last 10 years) | 4 |
Since 2006 (last 20 years) | 11 |
Source
Journal of Educational and… | 15 |
Author
Sinharay, Sandip | 3 |
Bennink, Margot | 1 |
Boyd, Donald | 1 |
Croon, Marcel A. | 1 |
Haberman, Shelby J. | 1 |
Hamilton, Laura | 1 |
Ho, Andrew D. | 1 |
Jeon, Minjeong | 1 |
Johnson, Matthew S. | 1 |
Kalogrides, Demetra | 1 |
Keuning, Jos | 1 |
Publication Type
Journal Articles | 15 |
Reports - Research | 7 |
Reports - Evaluative | 5 |
Reports - Descriptive | 2 |
Opinion Papers | 1 |
Tests/Questionnaires | 1 |
Education Level
Grade 4 | 2 |
Grade 8 | 2 |
Junior High Schools | 2 |
Middle Schools | 2 |
Secondary Education | 2 |
Elementary Education | 1 |
Grade 3 | 1 |
Grade 5 | 1 |
Grade 6 | 1 |
Grade 7 | 1 |
Grade 9 | 1 |
Location
Netherlands | 1 |
New York | 1 |
Assessments and Surveys
Iowa Tests of Basic Skills | 1 |
Measures of Academic Progress | 1 |
National Assessment of… | 1 |
Program for International… | 1 |
Trends in International… | 1 |
Sinharay, Sandip – Journal of Educational and Behavioral Statistics, 2022
Takers of educational tests often receive proficiency levels instead of or in addition to scaled scores. For example, proficiency levels are reported for the Advanced Placement (AP®) and U.S. Medical Licensing examinations. Technical difficulties and other unforeseen events occasionally lead to missing item scores and hence to incomplete data on…
Descriptors: Computation, Data Analysis, Educational Testing, Accuracy
Sinharay, Sandip; van Rijn, Peter W. – Journal of Educational and Behavioral Statistics, 2020
Response time models (RTMs) are of increasing interest in educational and psychological testing. This article focuses on the lognormal model for response times, which is one of the most popular RTMs. Several existing statistics for testing normality and the fit of factor analysis models are repurposed for testing the fit of the lognormal model. A…
Descriptors: Educational Testing, Psychological Testing, Goodness of Fit, Factor Analysis
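The simplest consequence of the lognormal response time model is that log response times should be normally distributed. As a minimal sketch of a fit check in that spirit (using an off-the-shelf Shapiro-Wilk test on simulated data, not the repurposed statistics developed in the article):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated response times that do follow a lognormal model...
times_ok = rng.lognormal(mean=3.0, sigma=0.5, size=500)
# ...and times that do not: a mixture of regular takers and fast guessers.
times_bad = np.concatenate([rng.lognormal(3.0, 0.5, 400),
                            rng.uniform(1.0, 3.0, 100)])

def lognormal_fit_pvalue(times):
    """Shapiro-Wilk normality test applied to log response times."""
    return stats.shapiro(np.log(times)).pvalue

print(lognormal_fit_pvalue(times_ok))   # large p: no evidence against lognormality
print(lognormal_fit_pvalue(times_bad))  # tiny p: the lognormal model is rejected
```

The data, baseline parameters, and choice of test here are illustrative assumptions; the article's point is precisely that more targeted statistics can be repurposed for this check.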
Liu, Yang; Wang, Xiaojing – Journal of Educational and Behavioral Statistics, 2020
Parametric methods, such as autoregressive models or latent growth modeling, are usually inflexible to model the dependence and nonlinear effects among the changes of latent traits whenever the time gap is irregular and the recorded time points are individually varying. Often in practice, the growth trend of latent traits is subject to certain…
Descriptors: Bayesian Statistics, Nonparametric Statistics, Regression (Statistics), Item Response Theory
Reardon, Sean F.; Kalogrides, Demetra; Ho, Andrew D. – Journal of Educational and Behavioral Statistics, 2021
Linking score scales across different tests is considered speculative and fraught, even at the aggregate level. We introduce and illustrate validation methods for aggregate linkages, using the challenge of linking U.S. school district average test scores across states as a motivating example. We show that aggregate linkages can be validated both…
Descriptors: Equated Scores, Validity, Methods, School Districts
van der Linden, Wim J.; Jeon, Minjeong – Journal of Educational and Behavioral Statistics, 2012
The probability of test takers changing answers upon review of their initial choices is modeled. The primary purpose of the model is to check erasures on answer sheets recorded by an optical scanner for numbers and patterns that may be indicative of irregular behavior, such as teachers or school administrators changing answer sheets after their…
Descriptors: Probability, Models, Test Items, Educational Testing
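The intuition behind erasure analysis can be sketched very simply: flag groups whose wrong-to-right erasure counts are improbably high under a baseline rate. The sketch below uses a plain binomial tail probability with hypothetical counts and an assumed baseline; the article's model is considerably richer (it models each test taker's answer-changing behavior).

```python
from scipy import stats

# Hypothetical erasure data: (erasures scanned, wrong-to-right count).
classrooms = {
    "class_A": (40, 14),
    "class_B": (35, 33),   # nearly every erasure goes wrong-to-right
    "class_C": (50, 17),
}

# Assumed baseline rate of wrong-to-right erasures under ordinary reviewing;
# a real analysis would estimate this from the data.
P_WTR = 0.33

def wtr_pvalue(n_erasures, n_wtr, p=P_WTR):
    """Upper-tail binomial p-value for the observed wrong-to-right count."""
    return stats.binom.sf(n_wtr - 1, n_erasures, p)

flagged = [c for c, (n, k) in classrooms.items() if wtr_pvalue(n, k) < 0.001]
print(flagged)  # only the classroom with an extreme erasure pattern
```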
Bennink, Margot; Croon, Marcel A.; Keuning, Jos; Vermunt, Jeroen K. – Journal of Educational and Behavioral Statistics, 2014
In educational measurement, responses of students on items are used not only to measure the ability of students, but also to evaluate and compare the performance of schools. Analysis should ideally account for the multilevel structure of the data, and school-level processes not related to ability, such as working climate and administration…
Descriptors: Academic Ability, Educational Assessment, Educational Testing, Test Bias
Boyd, Donald; Lankford, Hamilton; Loeb, Susanna; Wyckoff, James – Journal of Educational and Behavioral Statistics, 2013
Test-based accountability as well as value-added assessments and much experimental and quasi-experimental research in education rely on achievement tests to measure student skills and knowledge. Yet, we know little regarding fundamental properties of these tests, an important example being the extent of measurement error and its implications for…
Descriptors: Accountability, Educational Research, Educational Testing, Error of Measurement
Haberman, Shelby J. – Journal of Educational and Behavioral Statistics, 2008
In educational tests, subscores are often generated from a portion of the items in a larger test. Guidelines based on mean squared error are proposed to indicate whether subscores are worth reporting. Alternatives considered are direct reports of subscores, estimates of subscores based on total score, combined estimates based on subscores and…
Descriptors: Testing Programs, Regression (Statistics), Scores, Student Evaluation
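The mean-squared-error logic behind such guidelines can be illustrated by simulation: a subscore is worth reporting only if the observed subscore predicts the true subscore better than the total score already does. The generating model below is entirely hypothetical and is not Haberman's PRMSE machinery, just a sketch of the comparison:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical model: a true subscore, a correlated remainder of the test,
# a noisy observed subscore, and the resulting total score.
true_sub = rng.normal(0.0, 1.0, n)
remainder = 0.9 * true_sub + rng.normal(0.0, 0.6, n)  # rest of the test
obs_sub = true_sub + rng.normal(0.0, 1.2, n)          # unreliable subscore
total = obs_sub + remainder + rng.normal(0.0, 0.5, n)

def mse_of_linear_predictor(x, target):
    """MSE of the best linear (regression) predictor of target from x."""
    slope, intercept = np.polyfit(x, target, 1)
    return np.mean((target - (slope * x + intercept)) ** 2)

mse_from_sub = mse_of_linear_predictor(obs_sub, true_sub)
mse_from_total = mse_of_linear_predictor(total, true_sub)
print(mse_from_sub, mse_from_total)
# With a subscore this unreliable and a total this informative, the
# total-based estimate has the smaller MSE: the subscore adds little.
```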
Livingston, Samuel A. – Journal of Educational and Behavioral Statistics, 2006
This article suggests a graphic technique that uses P-P plots to show the extent to which two groups differ on two variables. It can be used even if the variables are measured in completely different, noncomparable units. The comparison is symmetric with respect to the variables and the groups. It reflects the differences between the groups over…
Descriptors: Comparative Analysis, Groups, Differences, Graphs
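A P-P plot of two groups on one variable pairs the groups' cumulative proportions at a common set of cut scores, which is what makes the comparison unit-free. A minimal sketch with made-up data (the article's technique extends this to two variables at once):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical scores for two groups, in arbitrary units.
group_a = rng.normal(50, 10, 300)
group_b = rng.normal(55, 10, 300)

def pp_points(a, b, n_points=101):
    """P-P curve points: each group's cumulative proportion at a common
    grid of cut scores. Comparable even if the units are arbitrary."""
    grid = np.quantile(np.concatenate([a, b]), np.linspace(0, 1, n_points))
    fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return fa, fb

fa, fb = pp_points(group_a, group_b)
# Identical distributions would put the curve on the diagonal; here group B
# scores higher, so its cumulative proportion lags group A's.
print(np.max(np.abs(fa - fb)))  # the maximum vertical gap
```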
Liu, Yuming; Schulz, E. Matthew; Yu, Lei – Journal of Educational and Behavioral Statistics, 2008
A Markov chain Monte Carlo (MCMC) method and a bootstrap method were compared in the estimation of standard errors of item response theory (IRT) true score equating. Three test form relationships were examined: parallel, tau-equivalent, and congeneric. Data were simulated based on Reading Comprehension and Vocabulary tests of the Iowa Tests of…
Descriptors: Reading Comprehension, Test Format, Markov Processes, Educational Testing
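The bootstrap side of that comparison reduces to a familiar recipe: resample the data with replacement, recompute the statistic each time, and take the standard deviation of the replicates. The sketch below uses the sample mean as a stand-in statistic; the article's setting (IRT true score equating) would require refitting an IRT model inside the loop.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical observed scores.
scores = rng.normal(20, 5, 200)

def bootstrap_se(data, statistic, n_boot=2000, rng=rng):
    """Standard error of `statistic` via the nonparametric bootstrap."""
    n = len(data)
    reps = np.array([statistic(data[rng.integers(0, n, n)])
                     for _ in range(n_boot)])
    return reps.std(ddof=1)

se_mean = bootstrap_se(scores, np.mean)
print(se_mean)  # close to the analytic SE, scores.std(ddof=1) / sqrt(200)
```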
Lewis, Charles – Journal of Educational and Behavioral Statistics, 2006
In the context of reviewing an article for this journal (van der Linden & Sotaridona, this issue, pp. 283-304), the topic of unconditional and conditional hypothesis testing came under consideration. While this is hardly a new issue (consider, for example, arguments regarding the chi-square versus the Fisher exact test of independence for a 2 × 2…
Descriptors: Hypothesis Testing, Educational Testing, Item Response Theory, Research Problems
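The classical example mentioned in the abstract is easy to reproduce: on the same 2 × 2 table, the (unconditional-flavored) Pearson chi-square test and the (conditional) Fisher exact test, which conditions on the table margins, can give noticeably different p-values in small samples. The counts below are hypothetical:

```python
import numpy as np
from scipy import stats

# A hypothetical 2 x 2 contingency table.
table = np.array([[12, 5],
                  [7, 15]])

# Pearson chi-square test of independence (no continuity correction).
chi2, p_chi2, dof, _ = stats.chi2_contingency(table, correction=False)

# Fisher's exact test, which conditions on the observed margins.
odds_ratio, p_fisher = stats.fisher_exact(table)

print(p_chi2, p_fisher)  # the two tests disagree noticeably at this sample size
```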
Sinharay, Sandip; Johnson, Matthew S.; Williamson, David M. – Journal of Educational and Behavioral Statistics, 2003
Item families, which are groups of related items, are becoming increasingly popular in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from each of a number of item models. Item calibration or scoring for such an assessment requires fitting models that can…
Descriptors: Test Items, Markov Processes, Educational Testing, Probability
Longford, N. T. – Journal of Educational and Behavioral Statistics, 1994
Presents a model-based approach to rater reliability for essays read by multiple raters. The approach is motivated by generalizability theory, and variation of rater severity and rater inconsistency is considered in the presence of between-examinee variations. Illustrates methods with data from standardized educational tests. (Author/SLD)
Descriptors: Educational Testing, Essay Tests, Generalizability Theory, Interrater Reliability
McCaffrey, Daniel F.; Lockwood, J. R.; Koretz, Daniel; Louis, Thomas A.; Hamilton, Laura – Journal of Educational and Behavioral Statistics, 2004
The insightful discussions by Raudenbush, Rubin, Stuart and Zanutto (RSZ) and Reckase identify important challenges for interpreting the output of VAM and for its use with test-based accountability. As these authors note, VAM are statistical models for the correlations among scores from students who share common teachers or schools during the…
Descriptors: Educational Testing, Accountability, Mathematical Models, Teacher Influence
Journal of Educational and Behavioral Statistics, 2003
Lyle V. Jones served as director of the Thurstone Psychometric Laboratory and also became the Vice Chancellor and Dean of the Graduate School of the University of North Carolina (UNC). Jones has been a Research Professor at UNC since 1992. This article presents an interview with Jones wherein he talked about his career as a researcher. Jones also…
Descriptors: National Competency Tests, Laboratories, Psychometrics, Profiles