Joshua B. Gilbert; James S. Kim; Luke W. Miratrix – Annenberg Institute for School Reform at Brown University, 2024
Longitudinal models of individual growth typically emphasize between-person predictors of change but ignore how growth may vary "within" persons because each person contributes only one data point at each time to the model. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift…
Descriptors: Vocabulary Development, Item Response Theory, Test Items, Student Development
Joshua B. Gilbert; James S. Kim; Luke W. Miratrix – Applied Measurement in Education, 2024
Longitudinal models typically emphasize between-person predictors of change but ignore how growth varies "within" persons because each person contributes only one data point at each time. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift over time. While traditionally…
Descriptors: Vocabulary Development, Item Response Theory, Test Items, Student Development
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
In'nami, Yo; Koizumi, Rie – International Journal of Testing, 2013
The importance of sample size, although widely discussed in the literature on structural equation modeling (SEM), has not been widely recognized among applied SEM researchers. To narrow this gap, we focus on second language testing and learning studies and examine the following: (a) Is the sample size sufficient in terms of precision and power of…
Descriptors: Structural Equation Models, Sample Size, Second Language Instruction, Monte Carlo Methods
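The study above uses Monte Carlo simulation to judge whether a sample size yields adequate precision and power. A minimal sketch of that general logic, assuming a simple two-group mean comparison rather than a full SEM model (the effect size, sample size, and test statistic here are illustrative, not the authors'):

```python
# Monte Carlo power check: simulate many datasets at a candidate
# sample size and count how often the null is (correctly) rejected.
import math
import random
import statistics

random.seed(1)

def reject(n, effect=0.5, alpha_z=1.96):
    """Simulate one two-group study (n per group) and test the mean difference."""
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(effect, 1.0) for _ in range(n)]
    # Large-sample z approximation with unpooled variances
    se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
    z = (statistics.mean(b) - statistics.mean(a)) / se
    return abs(z) > alpha_z

reps = 500
power = sum(reject(64) for _ in range(reps)) / reps  # proportion rejected
print(round(power, 2))  # near the conventional 0.80 for d = 0.5, n = 64/group
```

In a real SEM application, each replication would instead generate data from the hypothesized model and refit it, but the accept/reject tally is the same idea.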
Liu, Yuming; Schulz, E. Matthew; Yu, Lei – Journal of Educational and Behavioral Statistics, 2008
A Markov chain Monte Carlo (MCMC) method and a bootstrap method were compared in the estimation of standard errors of item response theory (IRT) true score equating. Three test form relationships were examined: parallel, tau-equivalent, and congeneric. Data were simulated based on Reading Comprehension and Vocabulary tests of the Iowa Tests of…
Descriptors: Reading Comprehension, Test Format, Markov Processes, Educational Testing
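The bootstrap side of the comparison above rests on resampling the data with replacement and taking the standard deviation of the statistic across resamples as its standard error. A minimal sketch of that idea for the mean of synthetic scores (not the authors' IRT true-score equating code):

```python
# Nonparametric bootstrap standard error of a sample mean.
import random
import statistics

random.seed(0)
scores = [random.gauss(50, 10) for _ in range(200)]  # synthetic test scores

B = 1000
boot_means = []
for _ in range(B):
    resample = random.choices(scores, k=len(scores))  # draw with replacement
    boot_means.append(statistics.mean(resample))

se = statistics.stdev(boot_means)  # bootstrap SE estimate
print(round(se, 2))  # close to the analytic sigma/sqrt(n) = 10/sqrt(200)
```

For equating, each resample would be re-calibrated and re-equated before computing the statistic, which is what makes the MCMC alternative attractive when refitting is expensive.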
Carlson, James E.; Spray, Judith A. – 1986
This paper discusses methods currently under study for use with multiple-response data. Besides using Bonferroni inequality methods to control the Type I error rate over a set of inferences involving multiple-response data, a recently proposed methodology of plotting the p-values resulting from multiple significance tests was explored. Proficiency…
Descriptors: Cutting Scores, Data Analysis, Difficulty Level, Error of Measurement
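The Bonferroni inequality approach mentioned above bounds the family-wise Type I error rate by testing each of m hypotheses at level alpha/m. A minimal sketch with illustrative p-values (not data from the paper):

```python
# Bonferroni control of the family-wise Type I error rate:
# reject H0_i only when p_i <= alpha / m.
p_values = [0.003, 0.012, 0.049, 0.21]  # placeholder p-values
alpha = 0.05
m = len(p_values)

rejected = [p <= alpha / m for p in p_values]  # per-test level 0.0125
print(rejected)  # [True, True, False, False]
```

The guarantee follows from the union bound: the probability of any false rejection is at most the sum of the m per-test levels, i.e. at most alpha.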
Stricker, Lawrence J.; Rock, Donald A.; Lee, Yong-Won – ETS Research Report Series, 2005
This study assessed the factor structure of the LanguEdge™ test and the invariance of its factors across language groups. Confirmatory factor analyses of individual tasks and subsets of items in the four sections of the test, Listening, Reading, Speaking, and Writing, were carried out for Arabic-, Chinese-, and Spanish-speaking test takers. Two…
Descriptors: Factor Structure, Language Tests, Factor Analysis, Semitic Languages