Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 0
Since 2016 (last 10 years): 0
Since 2006 (last 20 years): 13
Descriptor
Comparative Analysis: 14
Test Format: 14
Equated Scores: 7
Statistical Analysis: 7
Test Items: 7
Computer Assisted Testing: 6
Item Response Theory: 5
Scores: 5
Simulation: 5
Difficulty Level: 4
Raw Scores: 4
Source
ETS Research Report Series: 14
Author
Kim, Sooyeon: 4
Liu, Jinghua: 2
Walker, Michael E.: 2
Ali, Usama S.: 1
Boughton, Keith A.: 1
Brenneman, Meghan: 1
Castellano, Karen: 1
Chang, Hua-Hua: 1
Chen, Jing: 1
DeCarlo, Lawrence T.: 1
Lee, Yi-Hsuan: 1
Publication Type
Journal Articles: 14
Reports - Research: 14
Numerical/Quantitative Data: 1
Speeches/Meeting Papers: 1
Education Level
Higher Education: 3
Postsecondary Education: 3
Assessments and Surveys
Praxis Series: 2
SAT (College Admission Test): 2
Test of English as a Foreign…: 1
Kim, Sooyeon; Moses, Tim – ETS Research Report Series, 2014
The purpose of this study was to investigate the potential impact of misrouting under a 2-stage multistage test (MST) design, which includes 1 routing and 3 second-stage modules. Simulations were used to create a situation in which a large group of examinees took each of the 3 possible MST paths (high, middle, and low). We compared differences in…
Descriptors: Comparative Analysis, Difficulty Level, Scores, Test Wiseness
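The misrouting scenario Kim and Moses study can be illustrated with a toy simulation. Everything below — the number-correct routing rule, the cut scores, and the error model — is a hypothetical sketch for illustration, not a detail taken from the report:

```python
import random

# A minimal sketch of routing in a 2-stage multistage test (MST) with one
# routing module and three second-stage modules (low, middle, high).
# Cut scores and the error SD are made up for illustration.

def route(routing_score, low_cut=8, high_cut=16):
    """Assign an examinee to a second-stage module by number-correct score."""
    if routing_score < low_cut:
        return "low"
    if routing_score < high_cut:
        return "middle"
    return "high"

def simulate_misrouting(true_score, sem=2.0, rng=None):
    """Compare the module the true score warrants with the module the
    error-perturbed observed routing score actually selects."""
    rng = rng or random.Random(0)
    observed = true_score + rng.gauss(0.0, sem)
    return route(true_score), route(observed)

# Misrouting is most likely for examinees near a cut score.
rng = random.Random(42)
n = 10_000
misrouted = 0
for _ in range(n):
    intended, actual = simulate_misrouting(15, sem=3.0, rng=rng)
    if intended != actual:
        misrouted += 1
print(f"misrouting rate for a true score of 15: {misrouted / n:.1%}")
```

Running many replications like this for examinees at each ability level is one way to build the "large group on each path" design the abstract describes.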
Chen, Jing; Sheehan, Kathleen M. – ETS Research Report Series, 2015
The "TOEFL"® family of assessments includes the "TOEFL Primary"™, "TOEFL Junior"®, and "TOEFL iBT"® tests. The linguistic complexity of stimulus passages in the reading sections of the TOEFL family of assessments is expected to differ across the test levels. This study evaluates the linguistic…
Descriptors: Language Tests, Second Language Learning, English (Second Language), Reading Comprehension
Ali, Usama S.; Chang, Hua-Hua – ETS Research Report Series, 2014
Adaptive testing is advantageous in that it provides more efficient ability estimates with fewer items than linear testing does. Item-driven adaptive pretesting may also offer similar advantages, and verification of such a hypothesis about item calibration was the main objective of this study. A suitability index (SI) was introduced to adaptively…
Descriptors: Adaptive Testing, Simulation, Pretests Posttests, Test Items
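The efficiency claim for adaptive testing rests on targeting: under the Rasch model, an item is most informative when its difficulty matches the examinee's ability. Here is a minimal sketch of that selection rule; the item bank and ability value are hypothetical, and this shows only the generic principle, not the study's suitability index:

```python
import math

# Adaptive item selection under the Rasch model: administer the item
# whose difficulty maximizes Fisher information p*(1-p) at the current
# ability estimate. Difficulties below are made up for illustration.

def p_correct(theta, b):
    """Rasch probability of a correct response at ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    p = p_correct(theta, b)
    return p * (1.0 - p)

def next_item(theta, difficulties, administered):
    """Pick the unadministered item with maximum information at theta."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, difficulties[i]))

bank = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical item difficulties
theta = 0.3                          # current ability estimate
chosen = next_item(theta, bank, administered=set())
print(f"next item administered: difficulty {bank[chosen]}")
```

For pretesting, the analogous idea is to route an uncalibrated item to examinees whose abilities are most informative about that item's parameters.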
Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013
The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…
Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation
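IRT linking of the kind this study evaluates places two calibrations on a common scale via a linear transformation estimated from the anchor items. A minimal sketch using the mean-sigma method (one of several standard methods; the report does not necessarily use this one, and the anchor difficulties below are invented):

```python
import statistics

# Mean-sigma scale linking: find A and B in theta_new = A*theta_old + B
# from anchor-item difficulty estimates obtained in two calibrations.
# Shrinking the anchor (fewer b's) or the sample (noisier b's) degrades
# the stability of A and B.

def mean_sigma(b_old, b_new):
    """Return slope A and intercept B linking the old scale to the new."""
    A = statistics.stdev(b_new) / statistics.stdev(b_old)
    B = statistics.mean(b_new) - A * statistics.mean(b_old)
    return A, B

b_old = [-1.2, -0.4, 0.1, 0.8, 1.5]   # anchor difficulties, old calibration
b_new = [-1.0, -0.2, 0.3, 1.0, 1.7]   # same anchors, new calibration
A, B = mean_sigma(b_old, b_new)
print(f"A = {A:.3f}, B = {B:.3f}")
```

Here the new calibration is simply shifted by 0.2, so the link recovers A = 1 and B = 0.2; with short anchors and small samples, estimation error in the b's propagates directly into A and B.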
Wang, Zhen; Yao, Lihua – ETS Research Report Series, 2013
The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…
Descriptors: Test Format, Test Items, Responses, Computation
Steinberg, Jonathan; Brenneman, Meghan; Castellano, Karen; Lin, Peng; Miller, Susanne – ETS Research Report Series, 2014
Test providers are increasingly moving toward exclusively administering assessments by computer. Computerized testing is becoming more desirable for test takers because of increased opportunities to test, faster turnaround of individual scores, or perhaps other factors, offering potential benefits for those who may be struggling to pass licensure…
Descriptors: Comparative Analysis, Achievement Gap, Academic Achievement, Test Format
Kim, Sooyeon; Walker, Michael E. – ETS Research Report Series, 2009
We examined the appropriateness of the anchor composition in a mixed-format test, which includes both multiple-choice (MC) and constructed-response (CR) items, using subpopulation invariance indices. We derived linking functions in the nonequivalent groups with anchor test (NEAT) design using two types of anchor sets: (a) MC only and (b) a mix of…
Descriptors: Test Format, Equated Scores, Test Items, Multiple Choice Tests
DeCarlo, Lawrence T. – ETS Research Report Series, 2008
Rater behavior in essay grading can be viewed as a signal-detection task, in that raters attempt to discriminate between latent classes of essays, with the latent classes being defined by a scoring rubric. The present report examines basic aspects of an approach to constructed-response (CR) scoring via a latent-class signal-detection model. The…
Descriptors: Scoring, Responses, Test Format, Bias
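The signal-detection view DeCarlo describes can be sketched as follows: each essay belongs to a latent class, the rater perceives that class through noise, and fixed cutpoints convert the perception into a score. The class locations, noise SDs, and cutpoints below are illustrative assumptions, not the report's parameterization:

```python
import random

# Latent-class signal-detection sketch of constructed-response scoring.
# A precise rater (small noise SD) reproduces the latent class reliably;
# a noisy rater discriminates the classes poorly.

def rate(latent_class, cutpoints, noise_sd, rng):
    """Score an essay: perceive the latent class with noise, then apply
    ordered cutpoints to map the perception onto the score scale."""
    perception = latent_class + rng.gauss(0.0, noise_sd)
    score = 1
    for cut in cutpoints:
        if perception > cut:
            score += 1
    return score

rng = random.Random(7)
cutpoints = [1.5, 2.5, 3.5]          # boundaries between scores 1..4
for sd in (0.1, 2.0):
    hits = sum(rate(c, cutpoints, sd, rng) == c
               for c in [1, 2, 3, 4] * 250)
    print(f"noise sd {sd}: exact agreement {hits / 1000:.0%}")
```

In the model-based treatment, rater discrimination and cutpoint placement are estimated from data rather than fixed, which is what lets the approach separate rater severity from essay quality.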
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – ETS Research Report Series, 2008
This study examined variations of a nonequivalent groups equating design used with constructed-response (CR) tests to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, the study investigated the use of anchor CR item rescoring in the context of classical…
Descriptors: Equated Scores, Comparative Analysis, Test Format, Responses
Liu, Jinghua; Zhu, Xiaowen – ETS Research Report Series, 2008
The purpose of this paper is to explore methods to approximate population invariance without conducting multiple linkings for subpopulations. Under the single group or equivalent groups design, no linking needs to be performed for the parallel-linear system linking functions. The unequated raw score information can be used as an approximation. For…
Descriptors: Raw Scores, Test Format, Comparative Analysis, Test Construction
Liao, Chi-Wen; Livingston, Samuel A. – ETS Research Report Series, 2008
Randomly equivalent forms (REF) of tests in listening and reading for nonnative speakers of English were created by stratified random assignment of items to forms, stratifying on item content and predicted difficulty. The study included 50 replications of the procedure for each test. Each replication generated 2 REFs. The equivalence of those 2…
Descriptors: Equated Scores, Item Analysis, Test Items, Difficulty Level
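The assembly procedure in the abstract — stratify on item content and predicted difficulty, then randomly split each stratum across two forms — can be sketched directly. The item pool below is hypothetical and the dealing rule is one simple way to realize the design:

```python
import random
from collections import defaultdict

# Build two randomly equivalent forms (REFs) by stratified random
# assignment: within each (content, difficulty) stratum, shuffle the
# items and deal them alternately to the two forms.

def build_refs(items, rng):
    """items: list of (item_id, content, difficulty_band) tuples."""
    strata = defaultdict(list)
    for item in items:
        strata[(item[1], item[2])].append(item)
    form_a, form_b = [], []
    for stratum in sorted(strata):
        pool = strata[stratum][:]
        rng.shuffle(pool)
        for i, item in enumerate(pool):
            (form_a if i % 2 == 0 else form_b).append(item)
    return form_a, form_b

# Hypothetical pool: 4 items in each of 6 strata.
pool = []
i = 0
for content in ("listening", "reading"):
    for band in ("easy", "medium", "hard"):
        for _ in range(4):
            pool.append((f"item{i}", content, band))
            i += 1

a, b = build_refs(pool, random.Random(1))
print(len(a), len(b))   # each form receives half of every stratum
```

Repeating this assignment many times (the study's 50 replications) and comparing the score distributions of each pair is what quantifies how equivalent the forms actually are.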
Liu, Jinghua; Low, Albert C. – ETS Research Report Series, 2007
This study applied kernel equating (KE) in two scenarios: equating to a very similar population and equating to a very different population, referred to as a distant population, using SAT® data. The KE results were compared to the results obtained from analogous classical equating methods in both scenarios. The results indicate that KE results are…
Descriptors: College Entrance Examinations, Equated Scores, Comparative Analysis, Evaluation Methods
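The distinctive step in kernel equating is continuization: each discrete score distribution is smoothed with a Gaussian kernel so its CDF can be inverted for an equipercentile mapping. A minimal sketch of that step, with invented score probabilities and an arbitrary bandwidth (operational KE chooses the bandwidth by an optimization criterion not shown here):

```python
import math

# Kernel equating sketch: smooth two discrete score distributions with a
# Gaussian kernel, then map a form-X score to the form-Y scale by equal
# percentiles, inverting the smoothed form-Y CDF with bisection.

def kernel_cdf(x, scores, probs, h):
    """Gaussian-kernel-smoothed CDF of a discrete score distribution."""
    def phi(z):  # standard normal CDF
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sum(p * phi((x - s) / h) for s, p in zip(scores, probs))

def equate(x, scores_x, probs_x, scores_y, probs_y, h=0.6):
    """Equipercentile link: find y with CDF_Y(y) = CDF_X(x)."""
    target = kernel_cdf(x, scores_x, probs_x, h)
    lo, hi = min(scores_y) - 4 * h, max(scores_y) + 4 * h
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if kernel_cdf(mid, scores_y, probs_y, h) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

scores = [0, 1, 2, 3, 4]
probs_x = [0.10, 0.20, 0.40, 0.20, 0.10]   # form X: symmetric about 2
probs_y = [0.05, 0.15, 0.30, 0.30, 0.20]   # form Y: mass shifted upward
y = equate(2.0, scores, probs_x, scores, probs_y)
print(f"form-X score 2 maps to form-Y score {y:.2f}")
```

Because form Y's distribution sits higher, a middling X score maps to a somewhat higher Y score; comparing such KE functions against classical equipercentile results is the kind of contrast the study draws across similar and distant populations.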
Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007
Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…
Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models
Puhan, Gautam; Boughton, Keith A.; Kim, Sooyeon – ETS Research Report Series, 2005
The study evaluated the comparability of two versions of a teacher certification test: a paper-and-pencil test (PPT) and computer-based test (CBT). Standardized mean difference (SMD) and differential item functioning (DIF) analyses were used as measures of comparability at the test and item levels, respectively. Results indicated that effect sizes…
Descriptors: Comparative Analysis, Test Items, Statistical Analysis, Teacher Certification