Showing all 11 results
Peer reviewed
Kim, Sooyeon; Walker, Michael – ETS Research Report Series, 2021
In this investigation, we used real data to assess potential differential effects associated with taking a test in a test center (TC) versus testing at home using remote proctoring (RP). We used a pseudo-equivalent groups (PEG) approach to examine group equivalence at the item level and the total score level. If our assumption holds that the PEG…
Descriptors: Testing, Distance Education, Comparative Analysis, Test Items
Peer reviewed
Kim, Sooyeon; Lu, Ru – ETS Research Report Series, 2018
The purpose of this study was to evaluate the effectiveness of linking test scores by using test takers' background data to form pseudo-equivalent groups (PEG) of test takers. Using 4 operational test forms that each included 100 items and were taken by more than 30,000 test takers, we created 2 half-length research forms that had either 20…
Descriptors: Test Items, Item Banks, Difficulty Level, Comparative Analysis
Peer reviewed
Attali, Yigal; Saldivia, Luis; Jackson, Carol; Schuppan, Fred; Wanamaker, Wilbur – ETS Research Report Series, 2014
Previous investigations of the ability of content experts and test developers to estimate item difficulty have, for the most part, produced disappointing results. These investigations were based on a noncomparative method of independently rating the difficulty of items. In this article, we argue that, by eliciting comparative judgments of…
Descriptors: Test Items, Difficulty Level, Comparative Analysis, College Entrance Examinations
Peer reviewed
Kim, Sooyeon; Moses, Tim; Yoo, Hanwook Henry – ETS Research Report Series, 2015
The purpose of this inquiry was to investigate the effectiveness of item response theory (IRT) proficiency estimators in terms of estimation bias and error under multistage testing (MST). We chose a 2-stage MST design in which 1 adaptation to the examinees' ability levels takes place. It includes 4 modules (1 at Stage 1, 3 at Stage 2) and 3 paths…
Descriptors: Item Response Theory, Computation, Statistical Bias, Error of Measurement
Peer reviewed
Kim, Sooyeon; Moses, Tim – ETS Research Report Series, 2014
The purpose of this study was to investigate the potential impact of misrouting under a 2-stage multistage test (MST) design, which includes 1 routing and 3 second-stage modules. Simulations were used to create a situation in which a large group of examinees took each of the 3 possible MST paths (high, middle, and low). We compared differences in…
Descriptors: Comparative Analysis, Difficulty Level, Scores, Test Wiseness
Peer reviewed
Chen, Jing; Sheehan, Kathleen M. – ETS Research Report Series, 2015
The "TOEFL"® family of assessments includes the "TOEFL"® "Primary"™, "TOEFL Junior"®, and "TOEFL iBT"® tests. The linguistic complexity of stimulus passages in the reading sections of the TOEFL family of assessments is expected to differ across the test levels. This study evaluates the linguistic…
Descriptors: Language Tests, Second Language Learning, English (Second Language), Reading Comprehension
Peer reviewed
Young, John W.; Morgan, Rick; Rybinski, Paul; Steinberg, Jonathan; Wang, Yuan – ETS Research Report Series, 2013
The "TOEFL Junior"® Standard Test is an assessment that measures the degree to which middle school-aged students learning English as a second language have attained proficiency in the academic and social English skills representative of English-medium instructional environments. The assessment measures skills in three areas: listening…
Descriptors: Item Response Theory, Test Items, Language Tests, Second Language Learning
Peer reviewed
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – ETS Research Report Series, 2008
This study examined variations of a nonequivalent groups equating design used with constructed-response (CR) tests to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, the study investigated the use of anchor CR item rescoring in the context of classical…
Descriptors: Equated Scores, Comparative Analysis, Test Format, Responses
Peer reviewed
Liao, Chi-Wen; Livingston, Samuel A. – ETS Research Report Series, 2008
Randomly equivalent forms (REF) of tests in listening and reading for nonnative speakers of English were created by stratified random assignment of items to forms, stratifying on item content and predicted difficulty. The study included 50 replications of the procedure for each test. Each replication generated 2 REFs. The equivalence of those 2…
Descriptors: Equated Scores, Item Analysis, Test Items, Difficulty Level
Peer reviewed
Sinharay, Sandip; Johnson, Matthew – ETS Research Report Series, 2005
"Item models" (LaDuca, Staples, Templeton, & Holzman, 1986) are classes from which it is possible to generate/produce items that are equivalent/isomorphic to other items from the same model (e.g., Bejar, 1996; Bejar, 2002). They have the potential to produce large number of high-quality items at reduced cost. This paper introduces…
Descriptors: Item Analysis, Test Items, Scoring, Psychometrics
Peer reviewed
Broer, Markus; Lee, Yong-Won; Rizavi, Saba; Powers, Don – ETS Research Report Series, 2005
Three polytomous DIF detection techniques--the Mantel test, logistic regression, and polySTAND--were used to identify GRE® Analytical Writing prompts ("Issue" and "Argument") that are differentially difficult for (a) female test takers; (b) African American, Asian, and Hispanic test takers; and (c) test takers whose strongest…
Descriptors: Culture Fair Tests, Item Response Theory, Test Items, Cues