Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 3 |
Since 2006 (last 20 years) | 19 |
Descriptor
Comparative Analysis | 23 |
Models | 20 |
Item Response Theory | 12 |
Test Items | 9 |
Scores | 8 |
Correlation | 6 |
Scoring | 6 |
Simulation | 6 |
Statistical Analysis | 5 |
Test Reliability | 5 |
Computation | 4 |
Source
ETS Research Report Series | 23 |
Author
von Davier, Matthias | 5 |
Haberman, Shelby J. | 3 |
Xu, Xueli | 3 |
Lee, Yi-Hsuan | 2 |
Sinharay, Sandip | 2 |
Zhang, Mo | 2 |
von Davier, Alina A. | 2 |
Almond, Russell | 1 |
Attali, Yigal | 1 |
Braun, Henry | 1 |
Breyer, F. Jay | 1 |
Publication Type
Journal Articles | 23 |
Reports - Research | 23 |
Numerical/Quantitative Data | 1 |
Tests/Questionnaires | 1 |
Education Level
Elementary Education | 3 |
Secondary Education | 3 |
Grade 4 | 2 |
Higher Education | 2 |
Intermediate Grades | 2 |
Junior High Schools | 2 |
Middle Schools | 2 |
Postsecondary Education | 2 |
Early Childhood Education | 1 |
Grade 12 | 1 |
Grade 8 | 1 |
Assessments and Surveys
National Assessment of… | 4 |
Early Childhood Longitudinal… | 1 |
Graduate Record Examinations | 1 |
Major Field Achievement Test… | 1 |
Praxis Series | 1 |
Test of English as a Foreign… | 1 |
Fu, Jianbin; Feng, Yuling – ETS Research Report Series, 2018
In this study, we propose aggregating test scores with unidimensional within-test structure and multidimensional across-test structure based on a 2-level, 1-factor model. In particular, we compare 6 score aggregation methods: average of standardized test raw scores (M1), regression factor score estimate of the 1-factor model based on the…
Descriptors: Comparative Analysis, Scores, Correlation, Standardized Tests
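The first two aggregation methods named in the abstract, M1 (the average of standardized test raw scores) and a regression factor score estimate for a one-factor model, can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. Each test is standardized with its sample mean and standard deviation (ddof=1), and tests are weighted equally. The factor score uses Thurstone's regression method with a unit-variance factor and known loadings and uniquenesses. The abstract is truncated, so the report's exact factor model may differ, and the function names are hypothetical.

```python
import numpy as np


def m1_average_standardized(raw_scores):
    """Average of standardized test raw scores (M1), equal weights assumed.

    raw_scores: array of shape (n_examinees, n_tests); one column per test.
    """
    raw = np.asarray(raw_scores, dtype=float)
    # Standardize each test (column) using its own sample mean and SD.
    z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
    # Aggregate by averaging the z-scores across tests for each examinee.
    return z.mean(axis=1)


def regression_factor_scores(x, mu, loadings, uniquenesses):
    """Regression (Thurstone) factor scores for a 1-factor model, factor variance 1.

    Model-implied covariance: Sigma = lambda lambda^T + diag(psi).
    Score for each row of x: f_hat = lambda^T Sigma^{-1} (x - mu).
    """
    lam = np.asarray(loadings, dtype=float)
    sigma = np.outer(lam, lam) + np.diag(np.asarray(uniquenesses, dtype=float))
    weights = np.linalg.solve(sigma, lam)  # Sigma^{-1} lambda
    centered = np.atleast_2d(np.asarray(x, dtype=float)) - np.asarray(mu, dtype=float)
    return centered @ weights
```

For example, three examinees with raw scores (10, 20), (20, 40), and (30, 60) on two tests receive M1 scores of -1, 0, and 1, because each column standardizes to the same z-scores.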
Loukina, Anastassia; Zechner, Klaus; Yoon, Su-Youn; Zhang, Mo; Tao, Jidong; Wang, Xinhao; Lee, Chong Min; Mulholland, Matthew – ETS Research Report Series, 2017
This report presents an overview of the "SpeechRater" automated scoring engine model building and evaluation process for several item types with a focus on a low-English-proficiency test-taker population. We discuss each stage of speech scoring, including automatic speech recognition, filtering models for nonscorable responses, and…
Descriptors: Automation, Scoring, Speech Tests, Test Items
Haberman, Shelby J.; Liu, Yang; Lee, Yi-Hsuan – ETS Research Report Series, 2019
Distractor analyses are routinely conducted in educational assessments with multiple-choice items. In this research report, we focus on three item response models for distractors: (a) the traditional nominal response (NR) model, (b) a combination of a two-parameter logistic model for item scores and an NR model for selections of incorrect…
Descriptors: Multiple Choice Tests, Scores, Test Reliability, High Stakes Tests
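Models (a) and (b) can be illustrated with a short Python sketch. It assumes Bock's usual multinomial-logit form for the NR model, P(k | θ) = exp(a_k θ + c_k) / Σ_j exp(a_j θ + c_j). For (b), it assumes that the probability of choosing a given distractor is the probability of an incorrect response times an NR probability computed over the distractors only. The abstract is truncated, so the report's exact parameterization may differ. Parameter and function names are hypothetical, and identification constraints (for example, fixing one category's parameters) are left to the caller.

```python
import numpy as np


def nominal_response_probs(theta, slopes, intercepts):
    """Bock-style NR model: softmax of a_k * theta + c_k over the categories."""
    z = np.asarray(slopes, dtype=float) * theta + np.asarray(intercepts, dtype=float)
    z = z - z.max()  # subtract the max for numerical stability; probabilities are unchanged
    e = np.exp(z)
    return e / e.sum()


def two_pl(theta, a, b):
    """Two-parameter logistic probability of a correct (keyed) response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def two_pl_with_nr_distractors(theta, a, b, distractor_slopes, distractor_intercepts):
    """Return [P(key), P(distractor 1), ..., P(distractor m)] at ability theta."""
    p_key = two_pl(theta, a, b)
    # Distribute the probability of an incorrect response across the distractors.
    p_distractors = (1.0 - p_key) * nominal_response_probs(
        theta, distractor_slopes, distractor_intercepts
    )
    return np.concatenate(([p_key], p_distractors))
```

With equal distractor parameters and θ equal to the item difficulty b, the key has probability 0.5 and the remaining 0.5 is split evenly among the distractors.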
Cao, Yi; Lu, Ru; Tao, Wei – ETS Research Report Series, 2014
The local item independence assumption underlying traditional item response theory (IRT) models is often not met for tests composed of testlets. There are 3 major approaches to addressing this issue: (a) ignore the violation and use a dichotomous IRT model (e.g., the 2-parameter logistic [2PL] model), (b) combine the interdependent items to form a…
Descriptors: Item Response Theory, Equated Scores, Test Items, Simulation
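Approach (a) relies on the local item independence assumption: given ability, the probability of a response pattern is the product of the item probabilities. A minimal Python sketch for the 2PL model follows. It omits the optional scaling constant D ≈ 1.7 that some presentations include, and the function names are hypothetical. When items within a testlet are locally dependent, this product no longer gives the correct pattern probability, which is the problem the three approaches address.

```python
import numpy as np


def p_2pl(theta, a, b):
    """2PL probability of a correct response (no D scaling constant)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def pattern_likelihood(theta, responses, a, b):
    """Likelihood of a 0/1 response pattern under local item independence."""
    p = p_2pl(theta, a, b)
    x = np.asarray(responses, dtype=float)
    # Multiply P_i for correct responses and (1 - P_i) for incorrect ones.
    return float(np.prod(np.where(x == 1, p, 1.0 - p)))
```

For two items with a = 1 and b = 0, an examinee at θ = 0 has probability 0.5 on each item, so the pattern (1, 0) has likelihood 0.25.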
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M. – ETS Research Report Series, 2015
Automated scoring models were trained and evaluated for the essay task in the "Praxis I"® writing test. Prompt-specific and generic "e-rater"® scoring models were built, and evaluation statistics, such as quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores, were examined to evaluate the…
Descriptors: Writing Tests, Licensing Examinations (Professions), Teacher Competency Testing, Scoring
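The three evaluation statistics named in the abstract can be sketched in Python as follows. Quadratic weighted kappa uses weights w_ij = (i − j)² / (K − 1)² on a K-point integer scale, and the Pearson correlation is computed with NumPy. For the standardized mean difference, this sketch assumes the machine-minus-human mean difference divided by the pooled standard deviation of the two score sets. Operational definitions vary, so the report's formula may differ. Function names are hypothetical, and machine scores are assumed to have been rounded to the integer scale before kappa is computed.

```python
import numpy as np


def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """QWK for integer scores on the scale min_score..max_score."""
    h = np.asarray(human, dtype=int) - min_score
    m = np.asarray(machine, dtype=int) - min_score
    k = max_score - min_score + 1
    # Observed joint frequency table (rows: human score, columns: machine score).
    observed = np.zeros((k, k))
    np.add.at(observed, (h, m), 1)
    # Expected table under independence, built from the marginal totals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    idx = np.arange(k)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (k - 1) ** 2
    # Division by zero occurs if the weighted expected disagreement is 0
    # (e.g., both raters always give the same single score).
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


def pearson_r(human, machine):
    """Pearson product-moment correlation between two score vectors."""
    return float(np.corrcoef(human, machine)[0, 1])


def standardized_mean_difference(human, machine):
    """(mean machine - mean human) / pooled SD; the pooling rule is an assumption."""
    h = np.asarray(human, dtype=float)
    m = np.asarray(machine, dtype=float)
    pooled_sd = np.sqrt((h.var(ddof=1) + m.var(ddof=1)) / 2.0)
    return float((m.mean() - h.mean()) / pooled_sd)
```

For human scores (1, 2, 3, 4) and machine scores (2, 1, 4, 3) on a 1 to 4 scale, the weighted observed disagreement is 4/9 and the weighted expected disagreement is 10/9, so QWK = 1 − 0.4 = 0.6.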
Ling, Guangming – ETS Research Report Series, 2012
To assess the value of individual students' subscores on the Major Field Test in Business (MFT Business), I examined the test's internal structure with factor analysis and structural equation model methods, and analyzed the subscore reliabilities using the augmented scores method. Analyses of the internal structure suggested that the MFT Business…
Descriptors: Factor Analysis, Construct Validity, Structural Equation Models, Correlation
Wang, Zhen; Yao, Lihua – ETS Research Report Series, 2013
The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…
Descriptors: Test Format, Test Items, Responses, Computation
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
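The abstract is cut off after "agreement," so the report's specific agreement statistics are not shown here. As one common illustration, exact and adjacent agreement rates between human and automated scores can be computed as below. This is a hypothetical Python sketch that assumes both score sets are already on the same integer scale (automated scores rounded). It counts "adjacent" as differing by at most one point, so exact matches are included.

```python
import numpy as np


def agreement_rates(human, machine):
    """Return (exact, exact_or_adjacent) agreement proportions for integer scores."""
    diff = np.abs(np.asarray(human, dtype=int) - np.asarray(machine, dtype=int))
    exact = float(np.mean(diff == 0))
    # "Adjacent" here counts score pairs that differ by at most one point.
    exact_or_adjacent = float(np.mean(diff <= 1))
    return exact, exact_or_adjacent
```

For human scores (1, 2, 3, 4) and machine scores (1, 3, 3, 6), the absolute differences are 0, 1, 0, 2, giving 0.5 exact and 0.75 exact-or-adjacent agreement.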
Braun, Henry; Qu, Yanxuan – ETS Research Report Series, 2008
This paper reports on a study conducted to investigate the consistency of the results between 2 approaches to estimating school effectiveness through value-added modeling. Estimates of school effects from the layered model employing item response theory (IRT) scaled data are compared to estimates derived from a discrete growth model based on the…
Descriptors: Value Added Models, School Effectiveness, Robustness (Statistics), Computation
Haberman, Shelby J. – ETS Research Report Series, 2008
Continuous exponential families are applied to linking forms via a single-group design. In this application, a distribution from the continuous bivariate exponential family is used that has selected moments that match those of the bivariate distribution of scores on the forms to be linked. The selected continuous bivariate distribution then yields…
Descriptors: Equated Scores, Probability, Statistical Distributions, Models
Haberman, Shelby J.; von Davier, Matthias; Lee, Yi-Hsuan – ETS Research Report Series, 2008
Multidimensional item response models can be based on multivariate normal ability distributions or on multivariate polytomous ability distributions. For the case of simple structure in which each item corresponds to a unique dimension of the ability vector, some applications of the two-parameter logistic model to empirical data are employed to…
Descriptors: Item Response Theory, Comparative Analysis, Ability, Models
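Under simple structure, each item's response probability depends only on the one ability dimension it measures. A single item's marginal probability therefore depends only on that dimension's marginal distribution. The Python sketch below illustrates this for a 2PL item. It computes the marginal probability of a correct response once under a standard normal ability distribution (Gauss-Hermite quadrature via NumPy's `hermegauss`). It then computes it under a discrete distribution on a few ability points, as a simple stand-in for a polytomous ability distribution. The points, probabilities, and function names are hypothetical and are not taken from the report.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def simple_structure_prob(theta_vec, a, b, dim):
    """Simple structure: the item depends only on ability dimension `dim`."""
    return float(p_2pl(theta_vec[dim], a, b))


def marginal_p_normal(a, b, n_nodes=41):
    """Marginal P(correct) under a standard normal ability (Gauss-Hermite quadrature).

    hermegauss uses the weight exp(-x**2 / 2); dividing by sqrt(2*pi) turns the
    quadrature weights into standard normal probabilities.
    """
    nodes, weights = hermegauss(n_nodes)
    return float(np.sum(weights * p_2pl(nodes, a, b)) / np.sqrt(2.0 * np.pi))


def marginal_p_discrete(a, b, points, probs):
    """Marginal P(correct) under a discrete ability distribution on `points`."""
    theta = np.asarray(points, dtype=float)
    return float(np.sum(np.asarray(probs, dtype=float) * p_2pl(theta, a, b)))
```

With b = 0, both marginals equal 0.5 for any symmetric ability distribution, because p(θ) + p(−θ) = 1 for the logistic curve.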
DeCarlo, Lawrence T. – ETS Research Report Series, 2008
Rater behavior in essay grading can be viewed as a signal-detection task, in that raters attempt to discriminate between latent classes of essays, with the latent classes being defined by a scoring rubric. The present report examines basic aspects of an approach to constructed-response (CR) scoring via a latent-class signal-detection model. The…
Descriptors: Scoring, Responses, Test Format, Bias
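In latent class signal-detection models of rating, a rater's probability of using score category k or lower for an essay in latent class c is often written as P(Y ≤ k | c) = F(τ_k − d·c). Here d is the rater's discrimination and τ_1 < τ_2 < ... < τ_{K−1} are the rater's response criteria. The Python sketch below assumes this form with a logistic F and latent classes coded 0, 1, 2, and so on. The report's exact parameterization is not shown in the abstract, and the parameter values used in the example are hypothetical.

```python
import numpy as np


def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))


def sdt_category_probs(latent_class, d, criteria):
    """P(Y = k | c) for categories k = 1..K, given K-1 increasing criteria.

    Cumulative form: P(Y <= k | c) = F(tau_k - d * c), with F logistic.
    """
    tau = np.asarray(criteria, dtype=float)
    cumulative = logistic(tau - d * latent_class)
    # Pad with P(Y <= 0) = 0 and P(Y <= K) = 1, then difference to get category probabilities.
    cumulative = np.concatenate(([0.0], cumulative, [1.0]))
    return np.diff(cumulative)
```

With d = 2 and criteria (1, 3, 5), an essay in latent class 0 is most likely to receive the lowest category and one in class 3 the highest, since larger d·c shifts every cumulative probability down.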
Xu, Xueli; von Davier, Matthias – ETS Research Report Series, 2008
The general diagnostic model (GDM) utilizes located latent classes for modeling a multidimensional proficiency variable. In this paper, the GDM is extended by employing a log-linear model for multiple populations that assumes constraints on parameters across multiple groups. This constrained model is compared to log-linear models that assume…
Descriptors: Comparative Analysis, Models, Computation, National Competency Tests
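For dichotomous items, the general diagnostic model is commonly written as P(X_i = 1 | a) = exp(β_i + Σ_k γ_ik q_ik a_k) / (1 + exp(β_i + Σ_k γ_ik q_ik a_k)). Here a is a vector of discrete skill levels (the located latent classes), q_ik is the Q-matrix entry, and γ_ik is a slope. The Python sketch below implements this response function and the marginal probability obtained by summing over a skill-pattern distribution. It does not implement the report's log-linear, multiple-group constraints, and the names and values are hypothetical.

```python
import itertools

import numpy as np


def gdm_prob(beta, gamma, q, skills):
    """GDM probability of a correct response for one item given a skill pattern."""
    z = beta + np.sum(
        np.asarray(gamma, dtype=float) * np.asarray(q, dtype=float) * np.asarray(skills, dtype=float)
    )
    return 1.0 / (1.0 + np.exp(-z))


def marginal_prob(beta, gamma, q, pattern_probs):
    """Sum the item probability over a distribution of skill patterns.

    pattern_probs: dict mapping a skill-pattern tuple to its probability.
    """
    return float(sum(p * gdm_prob(beta, gamma, q, pattern) for pattern, p in pattern_probs.items()))


# Example distribution: uniform over all patterns of two binary skills (hypothetical).
uniform_two_skills = {pattern: 0.25 for pattern in itertools.product([0, 1], repeat=2)}
```

Because q_ik = 0 removes skill k from the sum, an item that requires only the first skill ignores the level of the second one.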
von Davier, Alina A.; Carstensen, Claus H.; von Davier, Matthias – ETS Research Report Series, 2006
Measuring and linking competencies require special instruments, special data collection designs, and special statistical models. The measurement instruments are tests or test forms, which can be used in the following situations: The same test can be given repeatedly; two or more parallel test forms (i.e., forms intended to be similar in…
Descriptors: Scores, Measurement Techniques, Competence, Comparative Analysis
Xu, Xueli; von Davier, Matthias – ETS Research Report Series, 2008
Three strategies for linking two consecutive assessments are investigated and compared by analyzing reading data for the National Assessment of Educational Progress (NAEP) using the general diagnostic model. These strategies are compared in terms of marginal and joint expectations of skills, joint probabilities of skill patterns, and item…
Descriptors: National Competency Tests, Probability, Reading Achievement, Test Items
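The comparison criteria named here (marginal and joint expectations of skills, and joint probabilities of skill patterns) are simple functionals of an estimated skill-pattern distribution. The following is a hypothetical Python sketch that assumes each linking strategy yields a mapping from skill-pattern tuples to probabilities. The maximum absolute difference is one illustrative way to compare two sets of pattern probabilities, not necessarily the report's criterion.

```python
import numpy as np


def skill_moments(pattern_probs):
    """Return (E[a_k] for each skill, matrix of E[a_k * a_l]) from pattern probabilities."""
    patterns = np.array(list(pattern_probs.keys()), dtype=float)
    probs = np.array(list(pattern_probs.values()), dtype=float)
    marginal = probs @ patterns                       # E[a_k]
    joint = (patterns * probs[:, None]).T @ patterns  # E[a_k * a_l]
    return marginal, joint


def max_pattern_difference(probs_a, probs_b):
    """Largest absolute difference in skill-pattern probabilities between two solutions."""
    keys = set(probs_a) | set(probs_b)
    return max(abs(probs_a.get(k, 0.0) - probs_b.get(k, 0.0)) for k in keys)
```

For two binary skills with pattern probabilities 0.4, 0.1, 0.1, 0.4 on (0, 0), (1, 0), (0, 1), (1, 1), each skill has expectation 0.5 and the joint expectation E[a_1 a_2] is 0.4.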