Showing all 15 results
Peer reviewed
Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025
In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…
Descriptors: Automation, Grading, Computer Assisted Testing, Scoring
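The study's actual techniques and data are not shown in the abstract, but one ingredient of any reliability check for explanations is an agreement measure between attribution vectors. A minimal sketch, assuming token-level attributions from two runs of a hypothetical explanation method over the same scored response:

```python
import numpy as np
from scipy.stats import spearmanr

def attribution_consistency(attr_a, attr_b):
    """Spearman rank correlation between two attribution vectors
    over the same tokens; 1.0 means identical token rankings."""
    rho, _ = spearmanr(attr_a, attr_b)
    return rho

# Hypothetical attributions for a 6-token response from two explanation runs.
run1 = np.array([0.42, 0.11, 0.30, 0.05, 0.08, 0.04])
run2 = np.array([0.40, 0.15, 0.25, 0.06, 0.09, 0.05])
print(f"rank agreement: {attribution_consistency(run1, run2):.3f}")
```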
Peer reviewed
Casabianca, Jodi M.; Donoghue, John R.; Shin, Hyo Jeong; Chao, Szu-Fu; Choi, Ikkyu – Journal of Educational Measurement, 2023
Using item-response theory to model rater effects provides an alternative to standard performance metrics for rater monitoring and diagnosis. In order to fit such models, the ratings data must be sufficiently connected to estimate rater effects. Due to popular rating designs used in large-scale testing scenarios,…
Descriptors: Item Response Theory, Alternative Assessment, Evaluators, Research Problems
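The "connectedness" requirement mentioned in the abstract can be made concrete: rater effects are only estimable on a common scale if shared ratings link all raters together. A sketch with a hypothetical rating design, using union-find to count components of the bipartite rater-by-examinee graph:

```python
def connected_components(ratings):
    """ratings: iterable of (rater, examinee) pairs."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    for rater, examinee in ratings:
        union(("r", rater), ("e", examinee))
    return len({find(node) for node in list(parent)})

# Two disconnected rating groups: effects for r1/r2 cannot be placed
# on the same scale as r3/r4 without a linking design.
design = [("r1", "e1"), ("r2", "e1"), ("r1", "e2"),
          ("r3", "e3"), ("r4", "e3"), ("r4", "e4")]
print("components:", connected_components(design))  # -> 2
```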
Peer reviewed
Jones, Paul; Tong, Ye; Liu, Jinghua; Borglum, Joshua; Primoli, Vince – Journal of Educational Measurement, 2022
This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a "modal scale comparison approach," where the same pool of items was calibrated separately, without transformation, within two TC cohorts (TC1 and TC2) and one OP cohort (OP1) matched on their pool-based scale score distributions. The…
Descriptors: Scores, Credentials, Licensing Examinations (Professions), Computer Assisted Testing
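A rough sketch of the "modal scale comparison" idea as the abstract describes it: calibrate the same item pool separately in each cohort, without transformation, and compare the resulting item parameter estimates directly. The difficulty values below are made up; real use would take them from two IRT calibrations.

```python
import numpy as np

b_tc = np.array([-1.20, -0.45, 0.10, 0.62, 1.35])  # difficulties, TC cohort
b_op = np.array([-1.05, -0.30, 0.28, 0.80, 1.52])  # difficulties, OP cohort

diff = b_op - b_tc
print(f"mean shift: {diff.mean():+.3f}  (positive = items look harder in OP)")
print(f"per-item shifts: {np.round(diff, 2)}")
```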
Peer reviewed
Yao, Lihua – Journal of Educational Measurement, 2014
The intent of this research was to find an item selection procedure in the multidimensional computer adaptive testing (CAT) framework that yielded higher precision for both the domain and composite abilities, had a higher usage of the item pool, and controlled the exposure rate. Five multidimensional CAT item selection procedures (minimum angle;…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
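The multidimensional selection rules the paper compares (minimum angle and others) are not reproduced here. As a simpler illustration of the shared ingredient, the sketch below selects the unadministered 2PL item with maximum Fisher information at the current ability estimate, with a basic "randomesque" exposure control; all parameter values are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, size=200)   # hypothetical discriminations
b = rng.normal(0.0, 1.0, size=200)    # hypothetical difficulties

def fisher_info(theta, a, b):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def select_item(theta, administered, k=5):
    info = fisher_info(theta, a, b)
    info[list(administered)] = -np.inf   # mask already-used items
    top_k = np.argsort(info)[-k:]        # k most informative items
    return rng.choice(top_k)             # randomesque exposure control

print("next item:", select_item(theta=0.3, administered={5, 17}))
```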
Peer reviewed
Sinharay, Sandip; Wan, Ping; Choi, Seung W.; Kim, Dong-In – Journal of Educational Measurement, 2015
With an increase in the number of online tests, the number of interruptions during testing due to unexpected technical issues seems to be on the rise. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. Researchers such as…
Descriptors: Computer Assisted Testing, Testing Problems, Scores, Statistical Analysis
Peer reviewed
Sinharay, Sandip; Wan, Ping; Whitaker, Mike; Kim, Dong-In; Zhang, Litong; Choi, Seung W. – Journal of Educational Measurement, 2014
With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. There is a lack of research on this…
Descriptors: Computer Assisted Testing, Testing Problems, Scores, Regression (Statistics)
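Both Sinharay et al. abstracts above concern estimating the score impact of an interruption. One regression-based version of that idea (a sketch, not the authors' exact procedure): regress post-interruption section scores on pre-interruption section scores among uninterrupted examinees, then compare interrupted examinees' observed post scores with their predicted ones.

```python
import numpy as np

rng = np.random.default_rng(1)
pre = rng.normal(50, 10, size=500)               # uninterrupted group
post = 0.8 * pre + rng.normal(10, 5, size=500)

slope, intercept = np.polyfit(pre, post, deg=1)  # prediction line

pre_int = np.array([48.0, 61.0, 55.0])           # interrupted examinees
post_int = np.array([42.0, 55.0, 51.0])
residual = post_int - (slope * pre_int + intercept)
print(f"mean residual: {residual.mean():+.2f} (negative suggests a score drop)")
```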
Peer reviewed
Pommerich, Mary; Segall, Daniel O. – Journal of Educational Measurement, 2008
The accuracy of CAT scores can be negatively affected by local dependence if the CAT utilizes parameters that are misspecified due to the presence of local dependence and/or fails to control for local dependence in responses during the administration stage. This article evaluates the existence and effect of local dependence in a test of…
Descriptors: Simulation, Computer Assisted Testing, Mathematics Tests, Scores
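A sketch of one common local-dependence diagnostic (a Q3-type statistic, which may differ from the article's exact method): correlate item residuals after removing what ability explains. Parameters and responses here are simulated.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.normal(size=1000)
a = np.array([1.2, 0.9, 1.5]); b = np.array([-0.5, 0.0, 0.4])

p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))  # 2PL probabilities
u = (rng.random(p.shape) < p).astype(float)          # simulated responses
resid = u - p                                        # person-by-item residuals

q3 = np.corrcoef(resid, rowvar=False)                # item-pair residual correlations
print("Q3(item1, item2):", round(q3[0, 1], 3))       # near 0 => little local dependence
```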
Peer reviewed
Stocking, Martha L.; Ward, William C.; Potenza, Maria T. – Journal of Educational Measurement, 1998
Explored, using simulations, the use of disclosed items under continuous testing conditions in a worst-case scenario that assumes disclosed items are always answered correctly. Some item pool and test designs were identified in which the use of disclosed items produces effects on test scores that may be viewed as negligible. (Author/MAK)
Descriptors: Adaptive Testing, Cheating, Computer Assisted Testing, Item Banks
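A toy version of the worst-case assumption in the abstract: disclosed items are always answered correctly, while everything else follows the 2PL model. The sketch compares number-correct scores with and without disclosure for one simulated examinee; the pool and disclosure rate are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_items, theta = 40, 0.0
a = rng.uniform(0.8, 1.8, n_items)
b = rng.normal(0.0, 1.0, n_items)
disclosed = rng.random(n_items) < 0.15       # 15% of the pool is disclosed

p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
honest = rng.random(n_items) < p
worst_case = honest | disclosed              # disclosed items forced correct

print("score without disclosure:", honest.sum())
print("score, worst-case disclosure:", worst_case.sum())
```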
Peer reviewed
Stocking, Martha L.; Jirele, Thomas; Lewis, Charles; Swanson, Len – Journal of Educational Measurement, 1998
Constructed a pool of items from operational mathematics tests to investigate the feasibility of using automated test assembly (ATA) methods to simultaneously moderate possibly irrelevant performance differences between women and men and between African-American and White test takers. Discusses the usefulness of ATA. (SLD)
Descriptors: Automation, Computer Assisted Testing, Item Banks, Mathematics Tests
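A tiny illustration of the ATA idea in the abstract: choose a fixed-length test from a pool so that an aggregate group-difference index is as small as possible while holding average difficulty near a target. Real ATA uses integer programming on large pools; this brute-force search over a hypothetical 10-item pool is only a sketch.

```python
from itertools import combinations

# (difficulty, group_difference) per item: hypothetical values
pool = [(-1.0, 0.12), (-0.5, -0.08), (-0.2, 0.30), (0.0, 0.02), (0.1, -0.25),
        (0.3, 0.18), (0.5, -0.15), (0.8, 0.05), (1.1, -0.02), (1.4, 0.22)]

best = min(
    (c for c in combinations(range(10), 5)
     if abs(sum(pool[i][0] for i in c) / 5) < 0.2),  # mean difficulty near 0
    key=lambda c: abs(sum(pool[i][1] for i in c)),   # minimize net group difference
)
print("selected items:", best)
```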
Peer reviewed
Luecht, Richard M. – Journal of Educational Measurement, 1998
Comments on the application of a proposed automated test assembly (ATA) method to the problem of reducing potential performance differentials among population subgroups and points out some pitfalls. Presents a rejoinder by M. Stocking and others. (SLD)
Descriptors: Automation, Computer Assisted Testing, Item Banks, Mathematics Tests
Peer reviewed
Bridgeman, Brent; Cline, Frederick – Journal of Educational Measurement, 2004
Time limits on some computer-adaptive tests (CATs) are such that many examinees have difficulty finishing, and some examinees may be administered tests with more time-consuming items than others. Results from over 100,000 examinees suggested that about half of the examinees must guess on the final six questions of the analytical section of the…
Descriptors: Guessing (Tests), Timed Tests, Adaptive Testing, Computer Assisted Testing
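The abstract's finding invites a quick back-of-the-envelope check (a sketch with assumed values, not the article's analysis): if an examinee blindly guesses the final six items and each is a five-option multiple-choice question, the expected gain from guessing alone is small.

```python
n_guessed, n_options = 6, 5            # assumed item format
expected_correct = n_guessed * (1 / n_options)
print(f"expected correct from guessing: {expected_correct:.1f} of {n_guessed}")
```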
Peer reviewed
Meijer, Rob R. – Journal of Educational Measurement, 2004
Two new methods have been proposed to determine unexpected sum scores on sub-tests (testlets) for both paper-and-pencil tests and computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted p, was compared with a method where the probability for each score combination was calculated using a…
Descriptors: Probability, Adaptive Testing, Item Response Theory, Scores
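A sketch in the spirit of the hypergeometric bound the abstract describes (not necessarily the author's exact statistic): given N items answered with K correct overall, treat an m-item testlet score as a hypergeometric draw and flag testlet scores that are unexpectedly low. All counts below are hypothetical.

```python
from scipy.stats import hypergeom

N, K = 60, 45      # test length and total number-correct
m, k = 10, 3       # testlet length and observed testlet score

p_low = hypergeom.cdf(k, N, K, m)  # P(testlet score <= k) under random allocation
print(f"P(score <= {k}) = {p_low:.4f}")
```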
Peer reviewed
Bennett, Randy Elliot; Morley, Mary; Quardt, Dennis; Rock, Donald A.; Singley, Mark K.; Katz, Irvin R.; Nhouyvanisvong, Adisack – Journal of Educational Measurement, 1999
Evaluated a computer-delivered response type for measuring quantitative skill: the "Generating Examples" (GE) response type, which presents under-determined problems that can have many right answers. Results from 257 graduate students and applicants indicate that GE scores are reasonably reliable, but only moderately related to Graduate…
Descriptors: College Applicants, Computer Assisted Testing, Graduate Students, Graduate Study
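Because under-determined items accept many correct answers, scoring them is a matter of verifying constraints rather than matching a key. A hypothetical item and checker (the article's actual items are not reproduced):

```python
def check_example(x: float, y: float) -> bool:
    """Item: give two numbers whose sum is 10 and whose product is negative."""
    return (x + y == 10) and (x * y < 0)

print(check_example(12, -2))  # True:  one of infinitely many correct answers
print(check_example(4, 6))    # False: sum is right, product is not negative
```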
Peer reviewed
Bridgeman, Brent – Journal of Educational Measurement, 1992
Examinees in a regular administration of the quantitative portion of the Graduate Record Examination responded to particular items in a machine-scannable multiple-choice format. Volunteers (n=364) used a computer to answer open-ended counterparts of these items. Scores for both formats demonstrated similar correlational patterns. (SLD)
Descriptors: Answer Sheets, College Entrance Examinations, College Students, Comparative Testing
Peer reviewed
Wise, Steven L.; And Others – Journal of Educational Measurement, 1992
Performance of 156 undergraduate and 48 graduate students on a self-adapted test (SFAT), in which students choose the difficulty level of their test items, was compared with performance on a computer-adapted test (CAT). Those taking the SFAT obtained higher ability scores and reported lower posttest state anxiety than did CAT takers. (SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Difficulty Level
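A minimal contrast of the two designs in the abstract, using a simulated Rasch-style pool: the CAT picks the item nearest the current ability estimate (where information peaks under the Rasch model), while in the self-adapted test the examinee picks a difficulty bin and receives an item from it. Pool values and binning are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
b = rng.normal(0.0, 1.0, 100)             # item difficulties
bins = np.digitize(b, [-1.0, 0.0, 1.0])   # 4 difficulty bins, easy -> hard

def cat_pick(theta):
    return int(np.argmin(np.abs(b - theta)))  # max info at b ~= theta (Rasch)

def self_adapted_pick(chosen_bin):
    return int(rng.choice(np.flatnonzero(bins == chosen_bin)))

print("CAT item difficulty:", round(b[cat_pick(0.5)], 2))
print("self-adapted (easy bin):", round(b[self_adapted_pick(0)], 2))
```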