ERIC - Search Results

Showing 1 to 15 of 63 results

Methods for Imputing Scores When All Responses Are Missing for One or More Polytomous Items: Accuracy and Impact on Psychometric Property. Research Report. ETS RR-23-07

Peer reviewed

Qu, Yanxuan; Sinharay, Sandip – ETS Research Report Series, 2023

Though a substantial amount of research exists on imputing missing scores in educational assessments, there is little research on cases where responses or scores to an item are missing for all test takers. In this paper, we tackled the problem of imputing missing scores for tests for which the responses to an item are missing for all test takers.…

Descriptors: Scores, Test Items, Accuracy, Psychometrics

Scoring Essays on an iPad Versus a Desktop Computer: An Exploratory Study. Research Report. ETS RR-22-08

Peer reviewed

Ling, Guangming; Williams, Jean; O'Brien, Sue; Cavalie, Carlos F. – ETS Research Report Series, 2022

Recognizing the appealing features of a tablet (e.g., an iPad), including size, mobility, touch screen display, and virtual keyboard, more educational professionals are moving away from larger laptop and desktop computers and turning to the iPad for their daily work, such as reading and writing. Following the results of a recent survey of…

Descriptors: Tablet Computers, Computers, Essays, Scoring

Effect of Immediate Elaborated Feedback on Rater Accuracy. Research Report. ETS RR-20-09

Peer reviewed

Attali, Yigal – ETS Research Report Series, 2020

Principles of skill acquisition dictate that raters should be provided with frequent feedback about their ratings. However, in current operational practice, raters rarely receive immediate feedback about their scores owing to the prohibitive effort required to generate such feedback. An approach for generating and administering feedback responses…

Descriptors: Feedback (Response), Evaluators, Accuracy, Scores

Impact of Categorization and Scaling on Classification Agreement and Prediction Accuracy Statistics. Research Report. ETS RR-21-26

Peer reviewed

Wang, Wei; Dorans, Neil J. – ETS Research Report Series, 2021

Agreement statistics and measures of prediction accuracy are often used to assess the quality of two measures of a construct. Agreement statistics are appropriate for measures that are supposed to be interchangeable, whereas prediction accuracy statistics are appropriate for situations where one variable is the target and the other variables are…

Descriptors: Classification, Scaling, Prediction, Accuracy

Effect of Statistically Matching Equating Samples for Common-Item Equating. Research Report. ETS RR-21-02

Peer reviewed

Lu, Ru; Kim, Sooyeon – ETS Research Report Series, 2021

This study evaluated the impact of subgroup weighting for equating through a common-item anchor. We used data from a single test form to create two research forms for which the equating relationship was known. The results showed that equating was most accurate when the new form and reference form samples were weighted to be similar to the target…

Descriptors: Equated Scores, Weighted Scores, Raw Scores, Test Items

Certified to Evaluate: Exploring Administrator Accuracy and Beliefs in Teacher Observation. Research Report. ETS RR-21-05

Peer reviewed

Jones, Nathan; Bell, Courtney; Qi, Yi; Lewis, Jennifer; Kirui, David; Stickler, Leslie; Redash, Amanda – ETS Research Report Series, 2021

The observation systems being used in all 50 states require administrators to learn to accurately and reliably score their teachers' instruction using standardized observation systems. Although the literature on observation systems is growing, relatively few studies have examined the outcomes of trainings focused on developing administrators'…

Descriptors: Observation, Standardized Tests, Teacher Evaluation, Test Reliability

Measures of Agreement versus Measures of Prediction Accuracy. Research Report. ETS RR-19-20

Peer reviewed

Haberman, Shelby J. – ETS Research Report Series, 2019

Measures of agreement are compared to measures of prediction accuracy within a general context. Differences in appropriate use are emphasized, and approaches are examined for both numerical and nominal variables. General estimation methods are developed, and their large-sample properties are compared.

Descriptors: Measurement Techniques, Classification, Prediction, Accuracy

Maximum Marginal Likelihood Estimation with an Expectation-Maximization Algorithm for Multigroup/Mixture Multidimensional Item Response Theory Models. Research Report. ETS RR-19-35

Peer reviewed

Fu, Jianbin – ETS Research Report Series, 2019

A maximum marginal likelihood estimation with an expectation-maximization algorithm has been developed for estimating multigroup or mixture multidimensional item response theory models using the generalized partial credit function, graded response function, and 3-parameter logistic function. The procedure includes the estimation of item…

Descriptors: Maximum Likelihood Statistics, Mathematics, Item Response Theory, Expectation

A Simulation Study to Compare Nonequivalent Groups with Anchor Test Equating and Pseudo-Equivalent Group Linking. Research Report. ETS RR-18-08

Peer reviewed

Lu, Ru; Guo, Hongwen – ETS Research Report Series, 2018

In this paper we compare the newly developed pseudo-equivalent groups (PEG) linking method with the linking methods based on the traditional nonequivalent groups with anchor test (NEAT) design and illustrate how to use the PEG methods under imperfect equating conditions. To do this, we proposed a new method that combines the features of PEG…

Descriptors: Equated Scores, Comparative Analysis, Test Items, Background

Providing a Context for Interpreting Predictions of Job Performance. Research Report. ETS RR-18-38

Peer reviewed

Dorans, Neil J. – ETS Research Report Series, 2018

A distinction is made between scores as measures of a construct and predictions of a criterion or outcome variable. The interpretation attached to predictions of criteria, such as job performance or college grade point average (GPA), differs from that attached to scores that are measures of a construct, such as reading proficiency or knowledge…

Descriptors: Job Performance, Scores, Data Interpretation, Statistical Distributions

A Review of Subscore Estimation Methods. ETS RR-18-17

Peer reviewed

Fu, Jianbin; Qu, Yanxuan – ETS Research Report Series, 2018

Various subscore estimation methods that use auxiliary information to improve subscore accuracy and stability have been developed. This report provides a review of various subscore estimation methods described in the literature. The methodology of each method is described, then research studies on these subscore estimation methods are summarized.…

Descriptors: Scores, Evaluation Methods, Item Response Theory, Test Items

Examining the Calibration Process for Raters of the "GRE"® General Test. ETS GRE® Board Research Report. GRE®-19-01. Research Report Series. ETS RR-19-09

Peer reviewed

Wendler, Cathy; Glazer, Nancy; Cline, Frederick – ETS Research Report Series, 2019

One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift reflects changes in how raters interpret and use established scoring criteria to assign essay scores. Calibration is a process used to help control rater drift and, as such, serves as…

Descriptors: College Entrance Examinations, Graduate Study, Accuracy, Test Reliability

The Impact of Aberrant Responses and Detection in Forced-Choice Noncognitive Assessment. Research Report. ETS RR-18-32

Peer reviewed

Kim, Sooyeon; Moses, Tim – ETS Research Report Series, 2018

The purpose of this study is to assess the impact of aberrant responses on the estimation accuracy in forced-choice format assessments. To that end, a wide range of aberrant response behaviors (e.g., fake, random, or mechanical responses) affecting upward of 20%–30% of the responses was manipulated under the multi-unidimensional pairwise…

Descriptors: Measurement Techniques, Response Style (Tests), Accuracy, Computation

Does the Time between Scoring Sessions Impact Scoring Accuracy? An Evaluation of Constructed-Response Essay Responses on the "GRE"® General Test. Research Report. ETS RR-18-31

Peer reviewed

Finn, Bridgid; Wendler, Cathy; Ricker-Pedley, Kathryn L.; Arslan, Burcu – ETS Research Report Series, 2018

This report investigates whether the time between scoring sessions has an influence on operational and nonoperational scoring accuracy. The study evaluates raters' scoring accuracy on constructed-response essay responses for the "GRE"® General Test. Binomial linear mixed-effect models are presented that evaluate how the effect of various…

Descriptors: Intervals, Scoring, Accuracy, Essay Tests

Benchmark Keystroke Biometrics Accuracy from High-Stakes Writing Tasks. Research Report. ETS RR-21-15

Peer reviewed

Choi, Ikkyu; Hao, Jiangang; Deane, Paul; Zhang, Mo – ETS Research Report Series, 2021

"Biometrics" are physical or behavioral human characteristics that can be used to identify a person. It is widely known that keystroke or typing dynamics for short, fixed texts (e.g., passwords) could serve as a behavioral biometric. In this study, we investigate whether keystroke data from essay responses can lead to a reliable…

Descriptors: Accuracy, High Stakes Tests, Writing Tests, Benchmarking

