Showing 1 to 15 of 63 results
Peer reviewed
Qu, Yanxuan; Sinharay, Sandip – ETS Research Report Series, 2023
Though a substantial amount of research exists on imputing missing scores in educational assessments, there is little research on cases where responses or scores to an item are missing for all test takers. In this paper, we tackled the problem of imputing missing scores for tests for which the responses to an item are missing for all test takers.…
Descriptors: Scores, Test Items, Accuracy, Psychometrics
Peer reviewed
Ling, Guangming; Williams, Jean; O'Brien, Sue; Cavalie, Carlos F. – ETS Research Report Series, 2022
Recognizing the appealing features of a tablet (e.g., an iPad), including size, mobility, touch screen display, and virtual keyboard, more educational professionals are moving away from larger laptop and desktop computers and turning to the iPad for their daily work, such as reading and writing. Following the results of a recent survey of…
Descriptors: Tablet Computers, Computers, Essays, Scoring
Peer reviewed
Attali, Yigal – ETS Research Report Series, 2020
Principles of skill acquisition dictate that raters should be provided with frequent feedback about their ratings. However, in current operational practice, raters rarely receive immediate feedback about their scores owing to the prohibitive effort required to generate such feedback. An approach for generating and administering feedback responses…
Descriptors: Feedback (Response), Evaluators, Accuracy, Scores
Peer reviewed
Wang, Wei; Dorans, Neil J. – ETS Research Report Series, 2021
Agreement statistics and measures of prediction accuracy are often used to assess the quality of two measures of a construct. Agreement statistics are appropriate for measures that are supposed to be interchangeable, whereas prediction accuracy statistics are appropriate for situations where one variable is the target and the other variables are…
Descriptors: Classification, Scaling, Prediction, Accuracy
Peer reviewed
Lu, Ru; Kim, Sooyeon – ETS Research Report Series, 2021
This study evaluated the impact of subgroup weighting for equating through a common-item anchor. We used data from a single test form to create two research forms for which the equating relationship was known. The results showed that equating was most accurate when the new form and reference form samples were weighted to be similar to the target…
Descriptors: Equated Scores, Weighted Scores, Raw Scores, Test Items
Peer reviewed
Jones, Nathan; Bell, Courtney; Qi, Yi; Lewis, Jennifer; Kirui, David; Stickler, Leslie; Redash, Amanda – ETS Research Report Series, 2021
The observation systems being used in all 50 states require administrators to learn to accurately and reliably score their teachers' instruction using standardized observation systems. Although the literature on observation systems is growing, relatively few studies have examined the outcomes of trainings focused on developing administrators'…
Descriptors: Observation, Standardized Tests, Teacher Evaluation, Test Reliability
Peer reviewed
Haberman, Shelby J. – ETS Research Report Series, 2019
Measures of agreement are compared to measures of prediction accuracy within a general context. Differences in appropriate use are emphasized, and approaches are examined for both numerical and nominal variables. General estimation methods are developed, and their large-sample properties are compared.
Descriptors: Measurement Techniques, Classification, Prediction, Accuracy
Peer reviewed
Fu, Jianbin – ETS Research Report Series, 2019
A maximum marginal likelihood estimation with an expectation-maximization algorithm has been developed for estimating multigroup or mixture multidimensional item response theory models using the generalized partial credit function, graded response function, and 3-parameter logistic function. The procedure includes the estimation of item…
Descriptors: Maximum Likelihood Statistics, Mathematics, Item Response Theory, Expectation
Peer reviewed
Lu, Ru; Guo, Hongwen – ETS Research Report Series, 2018
In this paper we compare the newly developed pseudo-equivalent groups (PEG) linking method with the linking methods based on the traditional nonequivalent groups with anchor test (NEAT) design and illustrate how to use the PEG methods under imperfect equating conditions. To do this, we proposed a new method that combines the features of PEG…
Descriptors: Equated Scores, Comparative Analysis, Test Items, Background
Peer reviewed
Dorans, Neil J. – ETS Research Report Series, 2018
A distinction is made between scores as measures of a construct and predictions of a criterion or outcome variable. The interpretation attached to predictions of criteria, such as job performance or college grade point average (GPA), differs from that attached to scores that are measures of a construct, such as reading proficiency or knowledge…
Descriptors: Job Performance, Scores, Data Interpretation, Statistical Distributions
Peer reviewed
Fu, Jianbin; Qu, Yanxuan – ETS Research Report Series, 2018
Various subscore estimation methods that use auxiliary information to improve subscore accuracy and stability have been developed. This report provides a review of various subscore estimation methods described in the literature. The methodology of each method is described, then research studies on these subscore estimation methods are summarized.…
Descriptors: Scores, Evaluation Methods, Item Response Theory, Test Items
Peer reviewed
Wendler, Cathy; Glazer, Nancy; Cline, Frederick – ETS Research Report Series, 2019
One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift reflects changes in how raters interpret and use established scoring criteria to assign essay scores. Calibration is a process used to help control rater drift and, as such, serves as…
Descriptors: College Entrance Examinations, Graduate Study, Accuracy, Test Reliability
Peer reviewed
Kim, Sooyeon; Moses, Tim – ETS Research Report Series, 2018
The purpose of this study is to assess the impact of aberrant responses on the estimation accuracy in forced-choice format assessments. To that end, a wide range of aberrant response behaviors (e.g., fake, random, or mechanical responses) affecting upward of 20% to 30% of the responses was manipulated under the multi-unidimensional pairwise…
Descriptors: Measurement Techniques, Response Style (Tests), Accuracy, Computation
Peer reviewed
Finn, Bridgid; Wendler, Cathy; Ricker-Pedley, Kathryn L.; Arslan, Burcu – ETS Research Report Series, 2018
This report investigates whether the time between scoring sessions has an influence on operational and nonoperational scoring accuracy. The study evaluates raters' scoring accuracy on constructed-response essay responses for the GRE® General Test. Binomial linear mixed-effect models are presented that evaluate how the effect of various…
Descriptors: Intervals, Scoring, Accuracy, Essay Tests
Peer reviewed
Choi, Ikkyu; Hao, Jiangang; Deane, Paul; Zhang, Mo – ETS Research Report Series, 2021
"Biometrics" are physical or behavioral human characteristics that can be used to identify a person. It is widely known that keystroke or typing dynamics for short, fixed texts (e.g., passwords) could serve as a behavioral biometric. In this study, we investigate whether keystroke data from essay responses can lead to a reliable…
Descriptors: Accuracy, High Stakes Tests, Writing Tests, Benchmarking