ERIC - Search Results

Publication Date

In 2025	1
Since 2024	1
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	3
Since 2006 (last 20 years)	8

Descriptor

Computer Assisted Testing	10
Error of Measurement	10
Reliability	10
Comparative Analysis	4
Item Response Theory	4
Scoring	3
Accuracy	2
Adaptive Testing	2
College Students	2
Correlation	2
Elementary School Students	2
Evaluation Methods	2
Foreign Countries	2
Measurement Techniques	2
Psychometrics	2
Sample Size	2
Simulation	2
Test Items	2
Validity	2
Ability	1
Alternative Assessment	1
Artificial Intelligence	1
Automation	1
Benchmarking	1
Bias	1

Source

Assessment	1
Assessment & Evaluation in…	1
British Educational Research…	1
CALICO Journal	1
ETS Research Report Series	1
Educational and Psychological…	1
International Journal of…	1
Journal of Psychoeducational…	1

Publication Type

Journal Articles	8
Reports - Research	6
Reports - Evaluative	4
Reports - Descriptive	1
Speeches/Meeting Papers	1
Tests/Questionnaires	1

Education Level

Higher Education	4
Postsecondary Education	3
Adult Education	1
Elementary Education	1
High School Equivalency…	1
High Schools	1
Secondary Education	1

Location

China	1
Portugal	1

Assessments and Surveys

National Household Education…

Showing all 10 results

Grading Exams Using Large Language Models: A Comparison between Human and AI Grading of Exams in Higher Education Using ChatGPT

Peer reviewed

Jonas Flodén – British Educational Research Journal, 2025

This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…

Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring

Benchmark Keystroke Biometrics Accuracy from High-Stakes Writing Tasks. Research Report. ETS RR-21-15

Peer reviewed

Choi, Ikkyu; Hao, Jiangang; Deane, Paul; Zhang, Mo – ETS Research Report Series, 2021

"Biometrics" are physical or behavioral human characteristics that can be used to identify a person. It is widely known that keystroke or typing dynamics for short, fixed texts (e.g., passwords) could serve as a behavioral biometric. In this study, we investigate whether keystroke data from essay responses can lead to a reliable…

Descriptors: Accuracy, High Stakes Tests, Writing Tests, Benchmarking

Internet Administration of the Paper-and-Pencil Gifted Rating Scale: Assessing Psychometric Equivalence

Peer reviewed

Yarnell, Jordy B.; Pfeiffer, Steven I. – Journal of Psychoeducational Assessment, 2015

The present study examined the psychometric equivalence of administering a computer-based version of the Gifted Rating Scale (GRS) compared with the traditional paper-and-pencil GRS-School Form (GRS-S). The GRS-S is a teacher-completed rating scale used in gifted assessment. The GRS-Electronic Form provides an alternative method of administering…

Descriptors: Gifted, Psychometrics, Rating Scales, Computer Assisted Testing

Investigating the Application of Automated Writing Evaluation to Chinese Undergraduate English Majors: A Case Study of "WriteToLearn"

Peer reviewed

Liu, Sha; Kunnan, Antony John – CALICO Journal, 2016

This study investigated the application of "WriteToLearn" on Chinese undergraduate English majors' essays in terms of its scoring ability and the accuracy of its error feedback. Participants were 163 second-year English majors from a university located in Sichuan province who wrote 326 essays from two writing prompts. Each paper was…

Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning

The Value of Item Response Theory in Clinical Assessment: A Review

Peer reviewed

Thomas, Michael L. – Assessment, 2011

Item response theory (IRT) and related latent variable models represent modern psychometric theory, the successor to classical test theory in psychological assessment. Although IRT has become prevalent in the measurement of ability and achievement, its contributions to clinical domains have been less extensive. Applications of IRT to clinical…

Descriptors: Item Response Theory, Psychological Evaluation, Reliability, Error of Measurement

A Monte Carlo Simulation Investigating the Validity and Reliability of Ability Estimation in Item Response Theory with Speeded Computer Adaptive Tests

Peer reviewed

Schmitt, T. A.; Sass, D. A.; Sullivan, J. R.; Walker, C. M. – International Journal of Testing, 2010

Imposed time limits on computer adaptive tests (CATs) can result in examinees having difficulty completing all items, thus compromising the validity and reliability of ability estimates. In this study, the effects of speededness were explored in a simulated CAT environment by varying examinee response patterns to end-of-test items. Expectedly,…

Descriptors: Monte Carlo Methods, Simulation, Computer Assisted Testing, Adaptive Testing

E-Assessment within the Bologna Paradigm: Evidence from Portugal

Peer reviewed

Ferrao, Maria – Assessment & Evaluation in Higher Education, 2010

The Bologna Declaration brought reforms into higher education that imply changes in teaching methods, didactic materials and textbooks, infrastructures and laboratories, etc. Statistics and mathematics are disciplines that traditionally have the worst success rates, particularly in non-mathematics core curricula courses. This research project,…

Descriptors: Foreign Countries, Computer Assisted Testing, Educational Technology, Educational Assessment

Measuring General Self-Efficacy: A Comparison of Three Measures Using Item Response Theory

Peer reviewed

Scherbaum, Charles A.; Cohen-Charash, Yochi; Kern, Michael J. – Educational and Psychological Measurement, 2006

General self-efficacy (GSE), individuals' belief in their ability to perform well in a variety of situations, has been the subject of increasing research attention. However, the psychometric properties (e.g., reliability, validity) associated with the scores on GSE measures have been criticized, which has hindered efforts to further establish the…

Descriptors: Self Efficacy, Measures (Individuals), Psychometrics, Reliability

Item Characteristic Curve Parameters: Effects of Sample Size on Linear Equating.

Ree, Malcom James; Jensen, Harald E. – 1980

By means of computer simulation of test responses, the reliability of item analysis data and the accuracy of equating were examined for hypothetical samples of 250, 500, 1000, and 2000 subjects for two tests with 20 equating items plus 60 additional items on the same scale. Birnbaum's three-parameter logistic model was used for the simulation. The…

Descriptors: Computer Assisted Testing, Equated Scores, Error of Measurement, Item Analysis

Reinterview Program for the 1991 National Household Education Survey.

Brick, J. Michael; West, Jerry – 1992

In the spring of 1991 the first full-scale National Household Education Survey (NHES:91) was conducted for the National Center for Education Statistics (NCES). The NHES:91 was a national random digit dial telephone survey of about 14,000 parents of 3- to 8-year-old children concerning the educational experiences of young children. A reinterview…

Descriptors: Computer Assisted Testing, Early Childhood Education, Educational Attitudes, Educational Experience

Author

Brick, J. Michael	1
Choi, Ikkyu	1
Cohen-Charash, Yochi	1
Deane, Paul	1
Ferrao, Maria	1
Hao, Jiangang	1
Jensen, Harald E.	1
Jonas Flodén	1
Kern, Michael J.	1
Kunnan, Antony John	1
Liu, Sha	1
Pfeiffer, Steven I.	1
Ree, Malcom James	1
Sass, D. A.	1
Scherbaum, Charles A.	1
Schmitt, T. A.	1
Sullivan, J. R.	1
Thomas, Michael L.	1
Walker, C. M.	1
West, Jerry	1
Yarnell, Jordy B.	1
Zhang, Mo	1