ERIC - Search Results

Publication Date

In 2026	0
Since 2025	1
Since 2022 (last 5 years)	3
Since 2017 (last 10 years)	5
Since 2007 (last 20 years)	6

Descriptor

Computer Assisted Testing	8
Error of Measurement	8
Evaluation Methods	8
Adaptive Testing	4
Test Items	4
Item Banks	3
Comparative Testing	2
Equated Scores	2
Evaluation Criteria	2
Foreign Countries	2
Item Response Theory	2
Reliability	2
Scores	2
Scoring	2
Student Evaluation	2
Test Construction	2
Alternative Assessment	1
Artificial Intelligence	1
Bias	1
College Faculty	1
College Instruction	1
College Mathematics	1
College Students	1
Comparative Analysis	1
Computation	1

Source

Applied Psychological…	1
Assessment & Evaluation in…	1
British Educational Research…	1
ETS Research Institute	1
ETS Research Report Series	1
Educational Testing Service	1
Educational and Psychological…	1
International Journal of…	1

Publication Type

Journal Articles	6
Reports - Research	5
Reports - Evaluative	3
Tests/Questionnaires	1

Education Level

Higher Education	2
Postsecondary Education	2
Junior High Schools	1
Middle Schools	1
Secondary Education	1

Location

Portugal	1
Turkey	1

Showing all 8 results

Grading Exams Using Large Language Models: A Comparison between Human and AI Grading of Exams in Higher Education Using ChatGPT

Peer reviewed

Jonas Flodén – British Educational Research Journal, 2025

This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…

Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring

Comparison of Kernel Equating Methods under NEAT and NEC Designs

Peer reviewed

Ozsoy, Seyma Nur; Kilmen, Sevilay – International Journal of Assessment Tools in Education, 2023

In this study, Kernel test equating methods were compared under NEAT and NEC designs. In NEAT design, Kernel post-stratification and chain equating methods taking into account optimal and large bandwidths were compared. In the NEC design, gender and/or computer/tablet use was considered as a covariate, and Kernel test equating methods were…

Descriptors: Equated Scores, Testing, Test Items, Statistical Analysis

Quality of Item Pool (QIP) Index: A Novel Approach to Evaluating CAT Item Pool Adequacy

Peer reviewed

Gönülates, Emre – Educational and Psychological Measurement, 2019

This article introduces the Quality of Item Pool (QIP) Index, a novel approach to quantifying the adequacy of an item pool of a computerized adaptive test for a given set of test specifications and examinee population. This index ranges from 0 to 1, with values close to 1 indicating the item pool presents optimum items to examinees throughout the…

Descriptors: Item Banks, Adaptive Testing, Computer Assisted Testing, Error of Measurement

Charting the Future of Assessments. Full Report

Patrick C. Kyllonen; Amit Sevak; Teresa Ober; Ikkyu Choi; Jesse Sparks; Daniel Fishtein – ETS Research Institute, 2024

Assessment refers to a broad array of approaches for measuring or evaluating a person's (or group of persons') skills, behaviors, dispositions, or other attributes. Assessments range from standardized tests used in admissions, employee selection, licensure examinations, and domestic and international large-scale assessments of cognitive and…

Descriptors: Performance Based Assessment, Evaluation Criteria, Evaluation Methods, Test Bias

A Modified "a"-Stratified Method for Computerized Adaptive Testing. Research Report. ETS RR-19-10

Peer reviewed

Gu, Lixiong; Ling, Guangming; Qu, Yanxuan – ETS Research Report Series, 2019

Research has found that the "a"-stratified item selection strategy (STR) for computerized adaptive tests (CATs) may lead to insufficient use of high-"a" items at later stages of the tests and thus to reduced measurement precision. A refined approach, unequal item selection across strata (USTR), effectively improves test precision over the…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Use, Test Items

E-Assessment within the Bologna Paradigm: Evidence from Portugal

Peer reviewed

Ferrao, Maria – Assessment & Evaluation in Higher Education, 2010

The Bologna Declaration brought reforms into higher education that imply changes in teaching methods, didactic materials and textbooks, infrastructures and laboratories, etc. Statistics and mathematics are disciplines that traditionally have the worst success rates, particularly in non-mathematics core curricula courses. This research project,…

Descriptors: Foreign Countries, Computer Assisted Testing, Educational Technology, Educational Assessment

Equating Scores from Adaptive to Linear Tests

Peer reviewed

van der Linden, Wim J. – Applied Psychological Measurement, 2006

Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test…

Descriptors: Adaptive Testing, Computer Assisted Testing, Test Format, Equated Scores

Tolerable Variation in Item Parameter Estimates for Linear and Adaptive Computer-Based Testing. Research Report No. 04-28

Rizavi, Saba; Way, Walter D.; Davey, Tim; Herbert, Erin – Educational Testing Service, 2004

Item parameter estimates vary for a variety of reasons, including estimation error, characteristics of the examinee samples, and context effects (e.g., item location effects and section location effects). Although we expect variation based on theory, there is reason to believe that observed variation in item parameter estimates exceeds what…

Descriptors: Adaptive Testing, Test Items, Computation, Context Effect

Author

Amit Sevak	1
Daniel Fishtein	1
Davey, Tim	1
Ferrao, Maria	1
Gu, Lixiong	1
Gönülates, Emre	1
Herbert, Erin	1
Ikkyu Choi	1
Jesse Sparks	1
Jonas Flodén	1
Kilmen, Sevilay	1
Ling, Guangming	1
Ozsoy, Seyma Nur	1
Patrick C. Kyllonen	1
Qu, Yanxuan	1
Rizavi, Saba	1
Teresa Ober	1
Way, Walter D.	1
van der Linden, Wim J.	1