Publication Date
In 2025 | 0 |
Since 2024 | 2 |
Since 2021 (last 5 years) | 3 |
Since 2016 (last 10 years) | 6 |
Since 2006 (last 20 years) | 7 |
Descriptor
Source
Annenberg Institute for… | 1 |
Applied Measurement in… | 1 |
ETS Research Report Series | 1 |
Education and Information… | 1 |
LEARN Journal: Language… | 1 |
Language Assessment Quarterly | 1 |
Language Teaching Research | 1 |
NWEA | 1 |
Author
James S. Kim | 2 |
Joshua B. Gilbert | 2 |
Luke W. Miratrix | 2 |
Beglar, David | 1 |
Carlson, James E. | 1 |
Gelbal, Selahattin | 1 |
Henning, Grant | 1 |
Holster, Trevor A. | 1 |
Kramer, Brandon | 1 |
Lake, J. | 1 |
Lee, Yong-Won | 1 |
More ▼ |
Publication Type
Reports - Research | 10 |
Journal Articles | 6 |
Speeches/Meeting Papers | 1 |
Education Level
Early Childhood Education | 3 |
Elementary Education | 3 |
Grade 2 | 3 |
Grade 3 | 3 |
Higher Education | 3 |
Postsecondary Education | 3 |
Primary Education | 3 |
Grade 1 | 2 |
Grade 4 | 1 |
Grade 5 | 1 |
Grade 6 | 1 |
More ▼ |
Audience
Researchers | 1 |
Laws, Policies, & Programs
Assessments and Surveys
Test of English as a Foreign… | 2 |
Test of English for… | 2 |
ACT Assessment | 1 |
Measures of Academic Progress | 1 |
What Works Clearinghouse Rating
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
Joshua B. Gilbert; James S. Kim; Luke W. Miratrix – Annenberg Institute for School Reform at Brown University, 2024
Longitudinal models of individual growth typically emphasize between-person predictors of change but ignore how growth may vary "within" persons because each person contributes only one point at each time to the model. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift…
Descriptors: Vocabulary Development, Item Response Theory, Test Items, Student Development
Joshua B. Gilbert; James S. Kim; Luke W. Miratrix – Applied Measurement in Education, 2024
Longitudinal models typically emphasize between-person predictors of change but ignore how growth varies "within" persons because each person contributes only one data point at each time. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift over time. While traditionally…
Descriptors: Vocabulary Development, Item Response Theory, Test Items, Student Development
Wudthayagorn, Jirada – LEARN Journal: Language Education and Acquisition Research Network, 2018
The purpose of this study was to map the Chulalongkorn University Test of English Proficiency, or the CU-TEP, to the Common European Framework of Reference (CEFR) by employing a standard setting methodology. Thirteen experts judged 120 items of the CU-TEP using the Yes/No Angoff technique. The experts decided whether or not a borderline student at…
Descriptors: Guidelines, Rating Scales, English (Second Language), Language Tests
Li, Sylvia; Meyer, Patrick – NWEA, 2019
This simulation study examines the measurement precision, item exposure rates, and the depth of the MAP® Growth™ item pools under various grade-level restrictions. Unlike most summative assessments, MAP Growth allows examinees to see items from any grade level, regardless of the examinee's actual grade level. It does not limit the test to items…
Descriptors: Achievement Tests, Item Banks, Test Items, Instructional Program Divisions
Holster, Trevor A.; Lake, J. – Language Assessment Quarterly, 2016
Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…
Descriptors: Guessing (Tests), Item Response Theory, Vocabulary, Language Tests
McLean, Stuart; Kramer, Brandon; Beglar, David – Language Teaching Research, 2015
An important gap in the field of second language vocabulary assessment concerns the lack of validated tests measuring aural vocabulary knowledge. The primary purpose of this study is to introduce and provide preliminary validity evidence for the Listening Vocabulary Levels Test (LVLT), which has been designed as a diagnostic tool to measure…
Descriptors: Test Construction, Test Validity, English (Second Language), Second Language Learning
Test-Retest Analyses of the Test of English as a Foreign Language. TOEFL Research Reports Report 45.
Henning, Grant – 1993
This study provides information about the total and component scores of the Test of English as a Foreign Language (TOEFL). First, the study provides comparative global and component estimates of test-retest, alternate-form, and internal-consistency reliability, controlling for sources of measurement error inherent in the examinees and the testing…
Descriptors: Difficulty Level, English (Second Language), Error of Measurement, Estimation (Mathematics)
Carlson, James E.; Spray, Judith A. – 1986
This paper discussed methods currently under study for use with multiple-response data. Besides using Bonferroni inequality methods to control type one error rate over a set of inferences involving multiple response data, a recently proposed methodology of plotting the p-values resulting from multiple significance tests was explored. Proficiency…
Descriptors: Cutting Scores, Data Analysis, Difficulty Level, Error of Measurement
Stricker, Lawrence J.; Rock, Donald A.; Lee, Yong-Won – ETS Research Report Series, 2005
This study assessed the factor structure of the LanguEdge™ test and the invariance of its factors across language groups. Confirmatory factor analyses of individual tasks and subsets of items in the four sections of the test, Listening, Reading, Speaking, and Writing, was carried out for Arabic-, Chinese-, and Spanish-speaking test takers. Two…
Descriptors: Factor Structure, Language Tests, Factor Analysis, Semitic Languages