Publication Date
In 2025 (1)
Since 2024 (1)
Since 2021, last 5 years (3)
Since 2016, last 10 years (3)
Since 2006, last 20 years (7)
Descriptor
Computer Assisted Testing (18)
Scores (15)
Adaptive Testing (6)
Item Response Theory (6)
Test Items (6)
Automation (5)
Mathematics Tests (5)
Test Construction (5)
Comparative Analysis (4)
Higher Education (4)
Item Banks (4)
Source
Journal of Educational Measurement (18)
Author
Bridgeman, Brent (2)
Choi, Seung W. (2)
Kim, Dong-In (2)
Sinharay, Sandip (2)
Stocking, Martha L. (2)
Wan, Ping (2)
Bennett, Randy Elliot (1)
Borglum, Joshua (1)
Casabianca, Jodi M. (1)
Chao, Szu-Fu (1)
Choi, Ikkyu (1)
Publication Type
Journal Articles (18)
Reports - Research (12)
Reports - Evaluative (4)
Book/Product Reviews (1)
Reports - Descriptive (1)
Speeches/Meeting Papers (1)
Education Level
Higher Education (1)
Assessments and Surveys
Graduate Record Examinations (2)
Indiana Statewide Testing for… (2)
Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025
In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…
Descriptors: Automation, Grading, Computer Assisted Testing, Scoring
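One common way to probe the consistency the abstract refers to is to compare feature attributions across repeated runs of the same explainability technique. A minimal sketch, assuming simulated attribution vectors stand in for the outputs of a real method such as SHAP or attention weights (the data and the rank-correlation criterion are illustrative, not the study's methodology):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical example: two attribution vectors for the same essay,
# produced by two runs of the same explainability technique.
rng = np.random.default_rng(0)
run_a = rng.normal(size=50)                      # token importance, run 1
run_b = run_a + rng.normal(scale=0.3, size=50)   # run 2, perturbed

# Rank correlation of importance scores: values near 1 indicate the
# technique ranks tokens consistently across runs.
rho, _ = spearmanr(run_a, run_b)
print(f"consistency (Spearman rho): {rho:.3f}")

# Overlap of the top-k most important tokens, another common check.
k = 10
top_a = set(np.argsort(run_a)[-k:])
top_b = set(np.argsort(run_b)[-k:])
print(f"top-{k} overlap: {len(top_a & top_b) / k:.2f}")
```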
Casabianca, Jodi M.; Donoghue, John R.; Shin, Hyo Jeong; Chao, Szu-Fu; Choi, Ikkyu – Journal of Educational Measurement, 2023
Using item response theory to model rater effects provides an alternative to standard performance metrics for rater monitoring and diagnosis. To fit such models, however, the ratings data must be sufficiently connected for rater effects to be estimable. Due to popular rating designs used in large-scale testing scenarios,…
Descriptors: Item Response Theory, Alternative Assessment, Evaluators, Research Problems
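The connectivity requirement can be checked directly from the rating design: examinees and raters form a bipartite graph, and rater effects are only comparable within a connected component. A minimal sketch, assuming a simple (examinee, rater) edge list (the data are hypothetical):

```python
# Union-find over a bipartite examinee-rater graph: the design is
# "connected" when all examinees and raters fall in one component.
def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def components(edges):
    parent = {}
    for e, r in edges:
        for node in (("E", e), ("R", r)):
            parent.setdefault(node, node)
    for e, r in edges:
        ra, rb = find(parent, ("E", e)), find(parent, ("R", r))
        if ra != rb:
            parent[ra] = rb
    return len({find(parent, n) for n in parent})

# Hypothetical design: raters 1-2 and rater 3 never score a common
# examinee, so the design splits into two disconnected blocks.
edges = [(101, 1), (101, 2), (102, 1), (103, 3), (104, 3)]
print("connected components:", components(edges))  # -> 2
```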
Jones, Paul; Tong, Ye; Liu, Jinghua; Borglum, Joshua; Primoli, Vince – Journal of Educational Measurement, 2022
This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a "modal scale comparison approach," where the same pool of items was calibrated separately, without transformation, within two TC cohorts (TC1 and TC2) and one OP cohort (OP1) matched on their pool-based scale score distributions. The…
Descriptors: Scores, Credentials, Licensing Examinations (Professions), Computer Assisted Testing
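A minimal sketch of the underlying idea of a modal comparison, flagging items whose separately calibrated difficulties drift between two administration modes (the Rasch difficulties below are simulated, and the robust-z flagging rule is a common screen, not necessarily the article's procedure):

```python
import numpy as np

rng = np.random.default_rng(1)
n_items = 60
b_tc = rng.normal(size=n_items)                   # difficulties, mode 1
b_op = b_tc + rng.normal(scale=0.1, size=n_items) # same items, mode 2
b_op[5] += 0.8                                    # one item drifts

# Robust-z flag for mode effects: standardize difficulty differences
# with median/IQR so a few drifting items do not distort the scale.
d = b_op - b_tc
iqr = np.subtract(*np.percentile(d, [75, 25]))
z = (d - np.median(d)) / (iqr / 1.349)
flagged = np.where(np.abs(z) > 2.5)[0]
print("items flagged for mode effects:", flagged)
```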
Yao, Lihua – Journal of Educational Measurement, 2014
The intent of this research was to find an item selection procedure in the multidimensional computer adaptive testing (CAT) framework that yielded higher precision for both the domain and composite abilities, had a higher usage of the item pool, and controlled the exposure rate. Five multidimensional CAT item selection procedures (minimum angle;…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
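Yao compared five multidimensional selection rules; as a simplified baseline, here is a minimal unidimensional maximum-information selection loop with randomesque exposure control (all item parameters are simulated, and the ability update is omitted; this is an illustration, not one of the article's procedures):

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """Fisher information of each 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a**2 * p * (1.0 - p)

rng = np.random.default_rng(2)
a, b = rng.uniform(0.5, 2.0, 200), rng.normal(size=200)
available = np.ones(200, bool)

theta, administered = 0.0, []
for _ in range(20):
    info = fisher_info(theta, a, b)
    info[~available] = -np.inf          # never reuse an item
    candidates = np.argsort(info)[-5:]  # randomesque exposure control:
    item = int(rng.choice(candidates))  # pick randomly among the top 5
    available[item] = False
    administered.append(item)
    # ... administer the item, score it, and update theta (omitted) ...

print("items administered:", administered)
```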
Sinharay, Sandip; Wan, Ping; Choi, Seung W.; Kim, Dong-In – Journal of Educational Measurement, 2015
With an increase in the number of online tests, the number of interruptions during testing due to unexpected technical issues seems to be on the rise. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. Researchers such as…
Descriptors: Computer Assisted Testing, Testing Problems, Scores, Statistical Analysis
Sinharay, Sandip; Wan, Ping; Whitaker, Mike; Kim, Dong-In; Zhang, Litong; Choi, Seung W. – Journal of Educational Measurement, 2014
With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. There is a lack of research on this…
Descriptors: Computer Assisted Testing, Testing Problems, Scores, Regression (Statistics)
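A minimal sketch of the regression idea behind both of the Sinharay et al. studies above: fit a regression of post-interruption scores on pre-interruption scores in the uninterrupted group, then ask whether an interrupted examinee's observed score falls unusually far from the prediction (all data are simulated; this illustrates the logic, not the authors' exact procedure):

```python
import numpy as np

rng = np.random.default_rng(3)
# Uninterrupted examinees: scores on the pre- and post-interruption blocks.
pre = rng.normal(50, 10, 2000)
post = 0.8 * pre + rng.normal(0, 5, 2000)

# Fit post ~ pre by least squares on the uninterrupted group.
X = np.column_stack([np.ones_like(pre), pre])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
resid_sd = np.std(post - X @ beta)

# An interrupted examinee: was the observed post-block score unusually low?
pre_i, post_i = 55.0, 30.0
predicted = beta[0] + beta[1] * pre_i
z = (post_i - predicted) / resid_sd
print(f"predicted={predicted:.1f}, observed={post_i}, z={z:.2f}")
```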
Pommerich, Mary; Segall, Daniel O. – Journal of Educational Measurement, 2008
The accuracy of CAT scores can be negatively affected by local dependence if the CAT uses item parameters that are misspecified because of local dependence and/or fails to control for local dependence in responses during the administration stage. This article evaluates the existence and effect of local dependence in a test of…
Descriptors: Simulation, Computer Assisted Testing, Mathematics Tests, Scores
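Local dependence is commonly screened with Yen's Q3 statistic: the correlation between item residuals after the model-implied probabilities are removed. A minimal sketch with simulated 2PL data and known parameters (in practice the residuals use estimated abilities; this is illustrative, not the article's analysis):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 5000, 10
theta = rng.normal(size=(n, 1))
a = rng.uniform(0.8, 1.8, k)
b = rng.normal(size=k)
p = 1 / (1 + np.exp(-a * (theta - b)))     # 2PL response probabilities
x = (rng.random((n, k)) < p).astype(float)
x[:, 1] = x[:, 0]   # inject a locally dependent item pair (items 0 and 1)

resid = x - p                       # residuals given the (known) parameters
q3 = np.corrcoef(resid, rowvar=False)
print(f"Q3(0,1) = {q3[0, 1]:.2f}")  # large positive -> local dependence
print(f"Q3(0,2) = {q3[0, 2]:.2f}")  # near zero for independent items
```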

Hirsch, Thomas M. – Journal of Educational Measurement, 1989
Equatings were performed on both simulated and real data sets using a common-examinee design and two abilities for each examinee. Results indicate that effective equating, as measured by the comparability of true scores, is possible with the techniques used in this study. However, the stability of the ability estimates proved unsatisfactory. (TJH)
Descriptors: Academic Ability, College Students, Comparative Analysis, Computer Assisted Testing
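A minimal sketch of linking under a common-examinee design: the same examinees receive ability estimates on two forms, and a mean-sigma transformation places form Y's scale onto form X's (the estimates are simulated; mean-sigma is one standard linking method, not necessarily Hirsch's):

```python
import numpy as np

rng = np.random.default_rng(5)
true_theta = rng.normal(size=500)
theta_x = true_theta + rng.normal(scale=0.3, size=500)   # form X estimates
theta_y = 1.2 * true_theta - 0.5 + rng.normal(scale=0.3, size=500)  # form Y

# Mean-sigma linking: choose A, B so A*theta_y + B matches form X's scale.
A = theta_x.std() / theta_y.std()
B = theta_x.mean() - A * theta_y.mean()
theta_y_linked = A * theta_y + B
print(f"A={A:.2f}, B={B:.2f}")
print(f"linked mean={theta_y_linked.mean():.2f}, sd={theta_y_linked.std():.2f}")
```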

Weiner, John A.; Gibson, Wade M. – Journal of Educational Measurement, 1998
Describes a procedure for automated test-form assembly based on Classical Test Theory (CTT). The procedure uses stratified random-content sampling and test-form preequating to ensure both content and psychometric equivalence in generating virtually unlimited parallel forms. Extends the usefulness of CTT in automated test construction. (Author/SLD)
Descriptors: Automation, Computer Assisted Testing, Equated Scores, Psychometrics
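A minimal sketch of stratified random-content sampling: items are grouped by content stratum, each form draws the same number per stratum, and preequated classical statistics travel with the items (the pool and blueprint are hypothetical; this illustrates the sampling idea, not the authors' full procedure):

```python
import random

random.seed(6)
# Hypothetical pool: (item_id, content_stratum, preequated p-value).
pool = [(i, ["algebra", "geometry", "data"][i % 3],
         round(random.uniform(0.3, 0.9), 2)) for i in range(90)]

blueprint = {"algebra": 4, "geometry": 4, "data": 2}  # items per stratum

def build_form(pool, blueprint):
    form = []
    for stratum, n_needed in blueprint.items():
        candidates = [it for it in pool if it[1] == stratum]
        form.extend(random.sample(candidates, n_needed))
    return form

# (For disjoint forms, remove sampled items from the pool between draws.)
form_a = build_form(pool, blueprint)
form_b = build_form(pool, blueprint)
mean_p = lambda form: sum(it[2] for it in form) / len(form)
print(f"form A mean p = {mean_p(form_a):.2f}, "
      f"form B mean p = {mean_p(form_b):.2f}")
```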

Stocking, Martha L.; Ward, William C.; Potenza, Maria T. – Journal of Educational Measurement, 1998
Explored, using simulations, the use of disclosed items under continuous testing conditions in a worst-case scenario that assumes disclosed items are always answered correctly. Some item pool and test designs were identified in which the use of disclosed items produces effects on test scores that may be viewed as negligible. (Author/MAK)
Descriptors: Adaptive Testing, Cheating, Computer Assisted Testing, Item Banks
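A minimal sketch of the worst-case logic: simulate honest 2PL responses, force the disclosed subset to be answered correctly, and measure the resulting score inflation (the simulation below uses fixed-form number-correct scores; the study's CAT designs are more involved):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k, n_disclosed = 10000, 40, 5
theta = rng.normal(size=(n, 1))
a, b = rng.uniform(0.8, 1.6, k), rng.normal(size=k)
p = 1 / (1 + np.exp(-a * (theta - b)))
honest = (rng.random((n, k)) < p).astype(int)

compromised = honest.copy()
compromised[:, :n_disclosed] = 1  # worst case: disclosed items always correct

inflation = compromised.sum(1).mean() - honest.sum(1).mean()
print(f"mean number-correct inflation: {inflation:.2f} points")
```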

Zwick, Rebecca; And Others – Journal of Educational Measurement, 1995
In a simulation study of ability estimation and differential item functioning (DIF) estimation in computerized adaptive tests, Rasch-based DIF statistics were highly correlated with the generating DIF, but the DIF statistics tended to be slightly smaller than in the three-parameter logistic model analyses. (SLD)
Descriptors: Ability, Adaptive Testing, Computer Assisted Testing, Computer Simulation
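DIF statistics of the kind studied here are often built on the Mantel-Haenszel common odds ratio, computed within matched ability strata. A minimal sketch on simulated data, using ability as the matching proxy (observed total score is the usual matching variable; this illustrates the statistic, not the study's exact method):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4000
group = rng.integers(0, 2, n)          # 0 = reference, 1 = focal
theta = rng.normal(size=n)
b_item = np.where(group == 1, 0.5, 0.0)  # studied item: harder for focal
correct = (rng.random(n) < 1 / (1 + np.exp(-(theta - b_item)))).astype(int)
strata = np.digitize(theta, [-1.5, -0.5, 0.5, 1.5])

num = den = 0.0
for s in np.unique(strata):
    m = strata == s
    A = np.sum(m & (group == 0) & (correct == 1))  # reference, correct
    B = np.sum(m & (group == 0) & (correct == 0))  # reference, incorrect
    C = np.sum(m & (group == 1) & (correct == 1))  # focal, correct
    D = np.sum(m & (group == 1) & (correct == 0))  # focal, incorrect
    N = A + B + C + D
    num += A * D / N
    den += B * C / N

alpha = num / den   # MH common odds ratio; delta = -2.35 * ln(alpha)
print(f"MH odds ratio = {alpha:.2f}, MH delta = {-2.35 * np.log(alpha):.2f}")
```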

Stocking, Martha L.; Jirele, Thomas; Lewis, Charles; Swanson, Len – Journal of Educational Measurement, 1998
Constructed a pool of items from operational tests of mathematics to investigate the feasibility of using automated test assembly (ATA) methods to simultaneously moderate possibly construct-irrelevant performance differences between women and men and between African American and White test takers. Discusses the usefulness of ATA. (SLD)
Descriptors: Automation, Computer Assisted Testing, Item Banks, Mathematics Tests
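A minimal greedy sketch of the assembly idea: select items sequentially to hit a difficulty target while penalizing subgroup p-value gaps (the pool, gap measure, and weight are hypothetical, and production ATA typically uses 0-1 programming rather than this greedy heuristic):

```python
import numpy as np

rng = np.random.default_rng(9)
pool = 200
p_all = rng.uniform(0.3, 0.9, pool)   # overall item difficulty (p-value)
gap = rng.normal(0.0, 0.05, pool)     # subgroup p-value difference per item

target_p, form_len, w_gap = 0.60, 30, 2.0
selected = []
for _ in range(form_len):
    remaining = [i for i in range(pool) if i not in selected]
    best, best_cost = None, np.inf
    for i in remaining:
        trial = selected + [i]
        # Weighted deviations: difficulty-target miss plus penalized gap.
        cost = (abs(p_all[trial].mean() - target_p)
                + w_gap * abs(gap[trial].mean()))
        if cost < best_cost:
            best, best_cost = i, cost
    selected.append(best)

print(f"form mean p = {p_all[selected].mean():.3f}, "
      f"mean subgroup gap = {gap[selected].mean():.3f}")
```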

Luecht, Richard M. – Journal of Educational Measurement, 1998
Comments on the application of a proposed automated test assembly (ATA) method to the problem of reducing potential performance differentials among population subgroups and points out some pitfalls. Presents a rejoinder by M. Stocking and others. (SLD)
Descriptors: Automation, Computer Assisted Testing, Item Banks, Mathematics Tests
Bridgeman, Brent; Cline, Frederick – Journal of Educational Measurement, 2004
Time limits on some computer-adaptive tests (CATs) are such that many examinees have difficulty finishing, and some examinees may be administered tests with more time-consuming items than others. Results from over 100,000 examinees suggested that about half of the examinees must guess on the final six questions of the analytical section of the…
Descriptors: Guessing (Tests), Timed Tests, Adaptive Testing, Computer Assisted Testing
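The score impact of blind guessing on the final items follows directly from the binomial distribution; a minimal worked example, assuming five-option items (the number of options is our assumption, not the article's):

```python
from scipy.stats import binom

n_guessed, p_chance = 6, 0.20   # six unfinished items, five options each
dist = binom(n_guessed, p_chance)
print(f"expected items correct by guessing: {dist.mean():.2f}")
for k in range(n_guessed + 1):
    print(f"P(exactly {k} correct) = {dist.pmf(k):.3f}")
```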
Meijer, Rob R. – Journal of Educational Measurement, 2004
Two new methods have been proposed to determine unexpected sum scores on subtests (testlets) for both paper-and-pencil tests and computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted p, was compared with a method where the probability for each score combination was calculated using a…
Descriptors: Probability, Adaptive Testing, Item Response Theory, Scores
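One way to read the hypergeometric idea: given r correct answers in total on an n-item test, if correct answers were exchangeable across items, the number correct within an m-item testlet would follow a hypergeometric distribution, and a small tail probability flags an unexpectedly low testlet score. A minimal sketch (a simplified reading, not Meijer's exact statistic):

```python
from scipy.stats import hypergeom

n_items, r_correct, m_testlet = 60, 45, 10  # test length, total correct, testlet size
observed = 3                                # correct within the testlet

# hypergeom(M, n, N): population M, success states n, draws N.
dist = hypergeom(M=n_items, n=r_correct, N=m_testlet)
p_low = dist.cdf(observed)                  # chance of a score this low or lower
print(f"P(testlet score <= {observed}) = {p_low:.4f}")
```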