ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	3
Since 2006 (last 20 years)	19

Descriptor

Comparative Analysis	23
Models	20
Item Response Theory	12
Test Items	9
Scores	8
Correlation	6
Scoring	6
Simulation	6
Statistical Analysis	5
Test Reliability	5
Computation	4
Factor Analysis	4
Goodness of Fit	4
Item Analysis	4
National Competency Tests	4
Regression (Statistics)	4
Second Language Learning	4
Bayesian Statistics	3
Computer Assisted Testing	3
English (Second Language)	3
Equated Scores	3
Mathematics Tests	3
Multiple Choice Tests	3
Psychometrics	3
Reading Tests	3

Source

ETS Research Report Series

Publication Type

Journal Articles	23
Reports - Research	23
Numerical/Quantitative Data	1
Tests/Questionnaires	1

Education Level

Elementary Education	3
Secondary Education	3
Grade 4	2
Higher Education	2
Intermediate Grades	2
Junior High Schools	2
Middle Schools	2
Postsecondary Education	2
Early Childhood Education	1
Grade 12	1
Grade 8	1
High Schools	1
Primary Education	1

Assessments and Surveys

National Assessment of…	4
Early Childhood Longitudinal…	1
Graduate Record Examinations	1
Major Field Achievement Test…	1
Praxis Series	1
Test of English as a Foreign…	1

Showing 1 to 15 of 23 results

A Comparison of Score Aggregation Methods for Unidimensional Tests on Different Dimensions. Research Report. ETS RR-18-01

Peer reviewed
PDF on ERIC

Fu, Jianbin; Feng, Yuling – ETS Research Report Series, 2018

In this study, we propose aggregating test scores with unidimensional within-test structure and multidimensional across-test structure based on a 2-level, 1-factor model. In particular, we compare 6 score aggregation methods: average of standardized test raw scores (M1), regression factor score estimate of the 1-factor model based on the…

Descriptors: Comparative Analysis, Scores, Correlation, Standardized Tests
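The first aggregation method named in the abstract, M1 (average of standardized test raw scores), can be sketched in Python as below. The snippet is truncated before the details, so standardizing each test with its own sample mean and population SD and weighting all tests equally are assumptions; the function names and data are hypothetical.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert one test's raw scores to z-scores (mean 0, population SD 1).
    Assumes the scores are not all identical (SD > 0)."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(x - mu) / sd for x in scores]

def m1_aggregate(raw_by_test):
    """Equal-weight average of standardized raw scores across tests.

    raw_by_test: one list of raw scores per test, each aligned by examinee.
    Returns one aggregate score per examinee.
    """
    z_by_test = [standardize(test) for test in raw_by_test]
    # zip(*...) regroups the per-test z-scores by examinee before averaging
    return [mean(z) for z in zip(*z_by_test)]

# Hypothetical data: two tests taken by the same four examinees
reading = [10, 20, 30, 40]
numeracy = [1, 2, 3, 4]
print(m1_aggregate([reading, numeracy]))
```

Standardizing first keeps a test with a wider raw-score range (here, reading) from dominating the average.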

Performance of Automated Speech Scoring on Different Low- to Medium-Entropy Item Types for Low-Proficiency English Learners. Research Report. ETS RR-17-12

Peer reviewed
PDF on ERIC

Loukina, Anastassia; Zechner, Klaus; Yoon, Su-Youn; Zhang, Mo; Tao, Jidong; Wang, Xinhao; Lee, Chong Min; Mulholland, Matthew – ETS Research Report Series, 2017

This report presents an overview of the "SpeechRater"® automated scoring engine model building and evaluation process for several item types with a focus on a low-English-proficiency test-taker population. We discuss each stage of speech scoring, including automatic speech recognition, filtering models for nonscorable responses, and…

Descriptors: Automation, Scoring, Speech Tests, Test Items

Distractor Analysis for Multiple-Choice Tests: An Empirical Study with International Language Assessment Data. Research Report. ETS RR-19-39

Peer reviewed
PDF on ERIC

Haberman, Shelby J.; Liu, Yang; Lee, Yi-Hsuan – ETS Research Report Series, 2019

Distractor analyses are routinely conducted in educational assessments with multiple-choice items. In this research report, we focus on three item response models for distractors: (a) the traditional nominal response (NR) model, (b) a combination of a two-parameter logistic model for item scores and an NR model for selections of incorrect…

Descriptors: Multiple Choice Tests, Scores, Test Reliability, High Stakes Tests
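The first distractor model listed in the abstract, the nominal response (NR) model, gives each answer option a probability equal to a softmax of option-specific linear functions of ability. A minimal Python sketch of those category probabilities follows; the function name and parameter values are hypothetical, identification constraints (such as fixing one option's parameters) are omitted, and this is not the report's estimation code.

```python
import math

def nominal_response_probs(theta, slopes, intercepts):
    """Option probabilities under a nominal response model:
    P(k | theta) = exp(a_k*theta + c_k) / sum_h exp(a_h*theta + c_h)."""
    logits = [a * theta + c for a, c in zip(slopes, intercepts)]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-option item: option 0 is the key, options 1-3 are distractors.
slopes = [1.0, -0.2, -0.4, -0.4]
intercepts = [0.5, 0.2, 0.0, -0.3]
for theta in (-2.0, 0.0, 2.0):
    probs = nominal_response_probs(theta, slopes, intercepts)
    print(theta, [round(p, 3) for p in probs])
```

Because the key has the largest slope in this made-up item, its selection probability rises with ability while the distractors' probabilities fall.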

Effect of Item Response Theory (IRT) Model Selection on Testlet-Based Test Equating. Research Report. ETS RR-14-19

Peer reviewed
PDF on ERIC

Cao, Yi; Lu, Ru; Tao, Wei – ETS Research Report Series, 2014

The local item independence assumption underlying traditional item response theory (IRT) models is often not met for tests composed of testlets. There are 3 major approaches to addressing this issue: (a) ignore the violation and use a dichotomous IRT model (e.g., the 2-parameter logistic [2PL] model), (b) combine the interdependent items to form a…

Descriptors: Item Response Theory, Equated Scores, Test Items, Simulation
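Approach (a) in the abstract applies a dichotomous model such as the 2PL and relies on local item independence, which testlets can violate. The Python sketch below shows the 2PL item response function and the response-pattern likelihood it implies when local independence holds (a product over items). The function names, item parameters, and the default scaling constant D = 1 are illustrative assumptions.

```python
import math

def p_2pl(theta, a, b, D=1.0):
    """2PL probability of a correct response.
    a: discrimination, b: difficulty. D = 1.0 keeps the logistic metric;
    D = 1.7 is a common choice to approximate the normal ogive."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def pattern_likelihood(theta, responses, items, D=1.0):
    """Likelihood of a 0/1 response pattern under local item independence:
    the product of per-item probabilities given theta."""
    like = 1.0
    for u, (a, b) in zip(responses, items):
        p = p_2pl(theta, a, b, D)
        like *= p if u == 1 else 1.0 - p
    return like

items = [(1.2, -0.5), (0.8, 0.0), (1.5, 1.0)]  # hypothetical (a, b) pairs
print(p_2pl(0.0, 1.0, 0.0))  # 0.5: theta equals the difficulty b
print(pattern_likelihood(0.5, [1, 1, 0], items))
```

Because the likelihood is a plain product, any dependence among items sharing a testlet stimulus is ignored, which is the violation the report addresses.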

Evaluation of "e-rater"® for the "Praxis I"® Writing Test. Research Report. ETS RR-15-03

Peer reviewed
PDF on ERIC

Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M. – ETS Research Report Series, 2015

Automated scoring models were trained and evaluated for the essay task in the "Praxis I"® writing test. Prompt-specific and generic "e-rater"® scoring models were built, and evaluation statistics, such as quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores, were examined to evaluate the…

Descriptors: Writing Tests, Licensing Examinations (Professions), Teacher Competency Testing, Scoring
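The evaluation statistics named in this abstract (quadratic weighted kappa, Pearson correlation, and the standardized difference in mean scores) can be sketched in Python for human and machine scores on an integer rubric. The function names, the hypothetical 1 to 6 score range, and the pooled-SD denominator for the standardized difference are assumptions; operational e-rater evaluations may define these statistics differently.

```python
from math import sqrt
from statistics import mean, stdev

def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Quadratic weighted kappa between two aligned integer score vectors."""
    k = max_score - min_score + 1
    observed = [[0] * k for _ in range(k)]
    for h, m in zip(human, machine):
        observed[h - min_score][m - min_score] += 1
    n = len(human)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_obs = 0.0
    weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            weighted_obs += w * observed[i][j]
            # chance-expected count from the two marginal distributions
            weighted_exp += w * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_obs / weighted_exp

def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def standardized_mean_difference(human, machine):
    """(machine mean - human mean) / pooled SD, with the pooled SD
    assumed to be the square root of the average sample variance."""
    pooled_sd = sqrt((stdev(human) ** 2 + stdev(machine) ** 2) / 2)
    return (mean(machine) - mean(human)) / pooled_sd

# Hypothetical scores for eight essays on a 1 to 6 rubric
human = [2, 3, 4, 4, 5, 3, 1, 2]
machine = [2, 3, 3, 4, 5, 4, 2, 2]
print(round(quadratic_weighted_kappa(human, machine, 1, 6), 3))
print(round(pearson(human, machine), 3))
print(round(standardized_mean_difference(human, machine), 3))
```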

Why the Major Field Test in Business Does Not Report Subscores: Reliability and Construct Validity Evidence. Research Report. ETS RR-12-11

Peer reviewed
PDF on ERIC

Ling, Guangming – ETS Research Report Series, 2012

To assess the value of individual students' subscores on the Major Field Test in Business (MFT Business), I examined the test's internal structure with factor analysis and structural equation model methods, and analyzed the subscore reliabilities using the augmented scores method. Analyses of the internal structure suggested that the MFT Business…

Descriptors: Factor Analysis, Construct Validity, Structural Equation Models, Correlation

The Effects of Rater Severity and Rater Distribution on Examinees' Ability Estimation for Constructed-Response Items. Research Report. ETS RR-13-23

Peer reviewed
PDF on ERIC

Wang, Zhen; Yao, Lihua – ETS Research Report Series, 2013

The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…

Descriptors: Test Format, Test Items, Responses, Computation

Investigating the Suitability of Implementing the "e-rater"® Scoring Engine in a Large-Scale English Language Testing Program. Research Report. ETS RR-13-36

Peer reviewed
PDF on ERIC

Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013

In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…

Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests

Robustness of Value-Added Analysis of School Effectiveness. Research Report. ETS RR-08-22

Peer reviewed
PDF on ERIC

Braun, Henry; Qu, Yanxuan – ETS Research Report Series, 2008

This paper reports on a study conducted to investigate the consistency of the results between 2 approaches to estimating school effectiveness through value-added modeling. Estimates of school effects from the layered model employing item response theory (IRT) scaled data are compared to estimates derived from a discrete growth model based on the…

Descriptors: Value Added Models, School Effectiveness, Robustness (Statistics), Computation

Linking with Continuous Exponential Families: Single-Group Designs. Research Report. ETS RR-08-61

Peer reviewed
PDF on ERIC

Haberman, Shelby J. – ETS Research Report Series, 2008

Continuous exponential families are applied to linking forms via a single-group design. In this application, a distribution from the continuous bivariate exponential family is used that has selected moments that match those of the bivariate distribution of scores on the forms to be linked. The selected continuous bivariate distribution then yields…

Descriptors: Equated Scores, Probability, Statistical Distributions, Models

Comparison of Multidimensional Item Response Models: Multivariate Normal Ability Distributions versus Multivariate Polytomous Ability Distributions. Research Report. ETS RR-08-45

Peer reviewed
PDF on ERIC

Haberman, Shelby J.; von Davier, Matthias; Lee, Yi-Hsuan – ETS Research Report Series, 2008

Multidimensional item response models can be based on multivariate normal ability distributions or on multivariate polytomous ability distributions. For the case of simple structure in which each item corresponds to a unique dimension of the ability vector, some applications of the two-parameter logistic model to empirical data are employed to…

Descriptors: Item Response Theory, Comparative Analysis, Ability, Models

Studies of a Latent-Class Signal-Detection Model for Constructed-Response Scoring. Research Report. ETS RR-08-63

Peer reviewed
PDF on ERIC

DeCarlo, Lawrence T. – ETS Research Report Series, 2008

Rater behavior in essay grading can be viewed as a signal-detection task, in that raters attempt to discriminate between latent classes of essays, with the latent classes being defined by a scoring rubric. The present report examines basic aspects of an approach to constructed-response (CR) scoring via a latent-class signal-detection model. The…

Descriptors: Scoring, Responses, Test Format, Bias

Comparing Multiple-Group Multinomial Log-Linear Models for Multidimensional Skill Distributions in the General Diagnostic Model. Research Report. ETS RR-08-35

Peer reviewed
PDF on ERIC

Xu, Xueli; von Davier, Matthias – ETS Research Report Series, 2008

The general diagnostic model (GDM) utilizes located latent classes for modeling a multidimensional proficiency variable. In this paper, the GDM is extended by employing a log-linear model for multiple populations that assumes constraints on parameters across multiple groups. This constrained model is compared to log-linear models that assume…

Descriptors: Comparative Analysis, Models, Computation, National Competency Tests

Linking Competencies in Educational Settings and Measuring Growth. Research Report. ETS RR-06-12

Peer reviewed
PDF on ERIC

von Davier, Alina A.; Carstensen, Claus H.; von Davier, Matthias – ETS Research Report Series, 2006

Measuring and linking competencies require special instruments, special data collection designs, and special statistical models. The measurement instruments are tests or test forms, which can be used in the following situations: The same test can be given repeatedly; two or more parallel test forms (i.e., forms intended to be similar in…

Descriptors: Scores, Measurement Techniques, Competence, Comparative Analysis

Linking for the General Diagnostic Model. Research Report. ETS RR-08-08

Peer reviewed
PDF on ERIC

Xu, Xueli; von Davier, Matthias – ETS Research Report Series, 2008

Three strategies for linking two consecutive assessments are investigated and compared by analyzing reading data for the National Assessment of Educational Progress (NAEP) using the general diagnostic model. These strategies are compared in terms of marginal and joint expectations of skills, joint probabilities of skill patterns, and item…

Descriptors: National Competency Tests, Probability, Reading Achievement, Test Items


Author

von Davier, Matthias	5
Haberman, Shelby J.	3
Xu, Xueli	3
Lee, Yi-Hsuan	2
Sinharay, Sandip	2
Zhang, Mo	2
von Davier, Alina A.	2
Almond, Russell	1
Attali, Yigal	1
Braun, Henry	1
Breyer, F. Jay	1
Cao, Yi	1
Carstensen, Claus H.	1
Casabianca, Jodi	1
DeCarlo, Lawrence T.	1
Feng, Yuling	1
Fu, Jianbin	1
Johnson, Matthew	1
Lee, Chong Min	1
Li, Deping	1
Ling, Guangming	1
Liu, Yang	1
Lorenz, Florian	1
Loukina, Anastassia	1
Lu, Ru	1