ERIC - Search Results

Showing all 11 results

Assessing Mode Effects of At-Home Testing without a Randomized Trial. Research Report. ETS RR-21-10

Peer reviewed

Kim, Sooyeon; Walker, Michael – ETS Research Report Series, 2021

In this investigation, we used real data to assess potential differential effects associated with taking a test in a test center (TC) versus testing at home using remote proctoring (RP). We used a pseudo-equivalent groups (PEG) approach to examine group equivalence at the item level and the total score level. If our assumption holds that the PEG…

Descriptors: Testing, Distance Education, Comparative Analysis, Test Items

The Pseudo-Equivalent Groups Approach as an Alternative to Common-Item Equating. Research Report. ETS RR-18-02

Peer reviewed

Kim, Sooyeon; Lu, Ru – ETS Research Report Series, 2018

The purpose of this study was to evaluate the effectiveness of linking test scores by using test takers' background data to form pseudo-equivalent groups (PEG) of test takers. Using 4 operational test forms that each included 100 items and were taken by more than 30,000 test takers, we created 2 half-length research forms that had either 20…

Descriptors: Test Items, Item Banks, Difficulty Level, Comparative Analysis

Estimating Item Difficulty with Comparative Judgments. Research Report. ETS RR-14-39

Peer reviewed

Attali, Yigal; Saldivia, Luis; Jackson, Carol; Schuppan, Fred; Wanamaker, Wilbur – ETS Research Report Series, 2014

Previous investigations of the ability of content experts and test developers to estimate item difficulty have, for the most part, produced disappointing results. These investigations were based on a noncomparative method of independently rating the difficulty of items. In this article, we argue that, by eliciting comparative judgments of…

Descriptors: Test Items, Difficulty Level, Comparative Analysis, College Entrance Examinations

Effectiveness of Item Response Theory (IRT) Proficiency Estimation Methods under Adaptive Multistage Testing. Research Report. ETS RR-15-11

Peer reviewed

Kim, Sooyeon; Moses, Tim; Yoo, Hanwook Henry – ETS Research Report Series, 2015

The purpose of this inquiry was to investigate the effectiveness of item response theory (IRT) proficiency estimators in terms of estimation bias and error under multistage testing (MST). We chose a 2-stage MST design in which 1 adaptation to the examinees' ability levels takes place. It includes 4 modules (1 at Stage 1, 3 at Stage 2) and 3 paths…

Descriptors: Item Response Theory, Computation, Statistical Bias, Error of Measurement

An Investigation of the Impact of Misrouting under Two-Stage Multistage Testing: A Simulation Study. Research Report. ETS RR-14-01

Peer reviewed

Kim, Sooyeon; Moses, Tim – ETS Research Report Series, 2014

The purpose of this study was to investigate the potential impact of misrouting under a 2-stage multistage test (MST) design, which includes 1 routing and 3 second-stage modules. Simulations were used to create a situation in which a large group of examinees took each of the 3 possible MST paths (high, middle, and low). We compared differences in…

Descriptors: Comparative Analysis, Difficulty Level, Scores, Test Wiseness

Analyzing and Comparing Reading Stimulus Materials across the "TOEFL"® Family of Assessments. "TOEFL iBT"® Research Report. TOEFL iBT-26. ETS Research Report No. RR-15-08

Peer reviewed

Chen, Jing; Sheehan, Kathleen M. – ETS Research Report Series, 2015

The "TOEFL"® family of assessments includes the "TOEFL"® "Primary"™, "TOEFL Junior"®, and "TOEFL iBT"® tests. The linguistic complexity of stimulus passages in the reading sections of the TOEFL family of assessments is expected to differ across the test levels. This study evaluates the linguistic…

Descriptors: Language Tests, Second Language Learning, English (Second Language), Reading Comprehension

Assessing the Test Information Function and Differential Item Functioning for the "TOEFL Junior"® Standard Test. Research Report. ETS RR-13-17. "TOEFL Junior"® Research Report. TOEFL JR-01

Peer reviewed

Young, John W.; Morgan, Rick; Rybinski, Paul; Steinberg, Jonathan; Wang, Yuan – ETS Research Report Series, 2013

The "TOEFL Junior"® Standard Test is an assessment that measures the degree to which middle school-aged students learning English as a second language have attained proficiency in the academic and social English skills representative of English-medium instructional environments. The assessment measures skills in three areas: listening…

Descriptors: Item Response Theory, Test Items, Language Tests, Second Language Learning

Comparisons among Designs for Equating Constructed-Response Tests. Research Report. ETS RR-08-53

Peer reviewed

Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – ETS Research Report Series, 2008

This study examined variations of a nonequivalent groups equating design used with constructed-response (CR) tests to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, the study investigated the use of anchor CR item rescoring in the context of classical…

Descriptors: Equated Scores, Comparative Analysis, Test Format, Responses

Examining an Alternative to Score Equating: A Randomly Equivalent Forms Approach. Research Report. ETS RR-08-14

Peer reviewed

Liao, Chi-Wen; Livingston, Samuel A. – ETS Research Report Series, 2008

Randomly equivalent forms (REF) of tests in listening and reading for nonnative speakers of English were created by stratified random assignment of items to forms, stratifying on item content and predicted difficulty. The study included 50 replications of the procedure for each test. Each replication generated 2 REFs. The equivalence of those 2…

Descriptors: Equated Scores, Item Analysis, Test Items, Difficulty Level

Analysis of Data from an Admissions Test with Item Models. Research Report. ETS RR-05-06

Peer reviewed

Sinharay, Sandip; Johnson, Matthew – ETS Research Report Series, 2005

"Item models" (LaDuca, Staples, Templeton, & Holzman, 1986) are classes from which it is possible to generate/produce items that are equivalent/isomorphic to other items from the same model (e.g., Bejar, 1996; Bejar, 2002). They have the potential to produce a large number of high-quality items at reduced cost. This paper introduces…

Descriptors: Item Analysis, Test Items, Scoring, Psychometrics

Ensuring the Fairness of GRE Writing Prompts: Assessing Differential Difficulty. Research Report. ETS GRE Board Research Report No. 02-07R. ETS RR-05-11

Peer reviewed

Broer, Markus; Lee, Yong-Won; Rizavi, Saba; Powers, Don – ETS Research Report Series, 2005

Three polytomous DIF detection techniques--the Mantel test, logistic regression, and polySTAND--were used to identify GRE® Analytical Writing prompts ("Issue" and "Argument") that are differentially difficult for (a) female test takers; (b) African American, Asian, and Hispanic test takers; and (c) test takers whose strongest…

Descriptors: Culture Fair Tests, Item Response Theory, Test Items, Cues
