Showing 1 to 15 of 47 results
Peer reviewed
Jones, Paul; Tong, Ye; Liu, Jinghua; Borglum, Joshua; Primoli, Vince – Journal of Educational Measurement, 2022
This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a "modal scale comparison approach," where the same pool of items was calibrated separately, without transformation, within two test-center (TC) cohorts (TC1 and TC2) and one online-proctored (OP) cohort (OP1) matched on their pool-based scale score distributions. The…
Descriptors: Scores, Credentials, Licensing Examinations (Professions), Computer Assisted Testing
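The matched-cohort comparison this abstract describes can be illustrated with a toy sketch. Note the heavy simplification: the logit of classical proportion-correct values stands in for the separate IRT calibrations the authors actually performed, and all data below are simulated.

```python
# Illustrative sketch only: hypothetical response matrices for a test-center (TC)
# and an online-proctored (OP) cohort, assumed already matched on score distribution.
import numpy as np

rng = np.random.default_rng(0)
n_items = 60
tc = rng.binomial(1, 0.7, size=(1000, n_items))
op = rng.binomial(1, 0.7, size=(1000, n_items))

def logit_difficulty(responses):
    # Crude difficulty proxy per mode: logit of the proportion incorrect.
    p = responses.mean(axis=0).clip(0.01, 0.99)
    return np.log((1 - p) / p)

b_tc, b_op = logit_difficulty(tc), logit_difficulty(op)
# Large per-item differences after matching would suggest a mode effect.
drift = b_op - b_tc
print("max |difficulty shift|:", np.abs(drift).max())
```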
Peer reviewed
Baldwin, Peter; Clauser, Brian E. – Journal of Educational Measurement, 2022
While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way--or may be incompatible with common examinee…
Descriptors: Scoring, Testing, Test Items, Test Format
Peer reviewed
Liu, Shuchang; Cai, Yan; Tu, Dongbo – Journal of Educational Measurement, 2018
This study applied the mode of on-the-fly assembled multistage adaptive testing to cognitive diagnosis (CD-OMST). Several module assembly methods for CD-OMST were proposed and compared in terms of measurement precision, test security, and constraint management. The module assembly methods in the study included the maximum priority index…
Descriptors: Adaptive Testing, Monte Carlo Methods, Computer Security, Clinical Diagnosis
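The maximum priority index named in the abstract is, in general form, an information measure down-weighted by how much of each constraint's quota remains. A minimal sketch with invented quotas and a 0/1 item-constraint relevance matrix (the authors' exact formulation for CD-OMST may differ):

```python
import numpy as np

def priority_index(info, quota_left, quota_total, relevance):
    """info: item information; quota_left/quota_total: per-constraint bookkeeping;
    relevance: boolean items-x-constraints matrix."""
    # Each relevant constraint scales the item's information by the fraction of
    # its quota still available, so nearly exhausted constraints suppress items.
    f = quota_left / quota_total                      # shape: (n_constraints,)
    weights = np.prod(np.where(relevance, f, 1.0), axis=1)
    return info * weights

info = np.array([0.9, 1.2, 0.8, 1.1])
relevance = np.array([[1, 0], [1, 1], [0, 1], [0, 0]], dtype=bool)
pi = priority_index(info, quota_left=np.array([2, 1]),
                    quota_total=np.array([5, 4]), relevance=relevance)
print("select item", int(np.argmax(pi)))
```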
Peer reviewed
Puhan, Gautam; Kim, Sooyeon – Journal of Educational Measurement, 2022
As a result of the COVID-19 pandemic, at-home testing has become a popular delivery mode in many testing programs. When programs offer at-home testing to expand their service, the score comparability between test takers testing remotely and those testing in a test center is critical. This article summarizes statistical procedures that could be…
Descriptors: Scores, Scoring, Comparative Analysis, Testing
Peer reviewed
Sinharay, Sandip – Journal of Educational Measurement, 2017
Person-fit assessment (PFA) is concerned with uncovering atypical test performance as reflected in the pattern of scores on individual items on a test. Existing person-fit statistics (PFSs) include both parametric and nonparametric statistics. Comparison of PFSs has been a popular research topic in PFA, but almost all comparisons have employed…
Descriptors: Goodness of Fit, Testing, Test Items, Scores
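Among the parametric person-fit statistics the abstract refers to, the standardized log-likelihood statistic l_z (Drasgow, Levine, and Williams) is the classic example. A self-contained sketch under a 2PL model, with hypothetical item parameters and response pattern:

```python
import numpy as np

def lz_statistic(x, theta, a, b):
    """Standardized log-likelihood person-fit statistic under a 2PL model."""
    p = 1 / (1 + np.exp(-a * (theta - b)))                 # success probabilities
    l0 = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))   # observed log-likelihood
    e = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))    # its expected value
    v = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)     # its variance
    return (l0 - e) / np.sqrt(v)

a = np.array([1.0, 1.2, 0.8, 1.5, 1.1])
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
x = np.array([1, 1, 0, 1, 0])                        # hypothetical responses
print("lz =", lz_statistic(x, theta=0.2, a=a, b=b))  # large negative => misfit
```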
Peer reviewed
Zhang, Jinming; Li, Jie – Journal of Educational Measurement, 2016
An IRT-based sequential procedure is developed to monitor items for enhancing test security. The procedure uses a series of statistical hypothesis tests to examine whether the statistical characteristics of each item under inspection have changed significantly during CAT administration. This procedure is compared with a previously developed…
Descriptors: Computer Assisted Testing, Test Items, Difficulty Level, Item Response Theory
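The flavor of sequential monitoring described can be illustrated with a much simpler stand-in than the authors' IRT-based procedure: after each batch of CAT administrations, test whether an item's observed correct rate departs from its model-predicted rate. The batch sizes, rates, and flagging threshold below are invented.

```python
import numpy as np

def batch_z(observed_correct, n, expected_p):
    """z statistic for one batch; repeated large values suggest item compromise."""
    se = np.sqrt(expected_p * (1 - expected_p) / n)
    return (observed_correct / n - expected_p) / se

# Hypothetical stream of batches for one item (counts correct out of 200 each).
for t, correct in enumerate([142, 139, 155, 161, 168], start=1):
    z = batch_z(correct, n=200, expected_p=0.70)
    flag = "  <- investigate" if abs(z) > 2.58 else ""
    print(f"batch {t}: z = {z:+.2f}{flag}")
```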
Peer reviewed
Veldkamp, Bernard P. – Journal of Educational Measurement, 2016
Many standardized tests are now administered via computer rather than paper-and-pencil format. The computer-based delivery mode brings with it certain advantages. One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed computerized adaptive testing (CAT). A second…
Descriptors: Computer Assisted Testing, Reaction Time, Standardized Tests, Difficulty Level
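The adaptation the abstract describes is conventionally implemented by administering the unused item with maximum Fisher information at the provisional ability estimate. A minimal sketch under an assumed 2PL pool; the paper's response-time ideas are not modeled here.

```python
import numpy as np

def fisher_info_2pl(theta, a, b):
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)

a = np.array([0.8, 1.4, 1.0, 1.7])
b = np.array([-0.6, 0.1, 0.9, 0.3])
used = {1}                                  # items already administered
theta_hat = 0.25                            # provisional ability estimate
info = fisher_info_2pl(theta_hat, a, b)
info[list(used)] = -np.inf                  # exclude administered items
print("next item:", int(np.argmax(info)))
```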
Peer reviewed
Zwick, Rebecca; Ye, Lei; Isham, Steven – Journal of Educational Measurement, 2018
In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the…
Descriptors: Test Bias, Testing, Test Items, Bayesian Statistics
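The idea of carrying DIF information forward across administrations can be sketched with a conjugate normal-normal update; this is an illustration in the spirit of the Bayesian approach described, not the authors' exact model, and all numbers are invented.

```python
def update_dif(prior_mean, prior_var, est, est_var):
    """Combine a prior DIF estimate with this administration's estimate."""
    w = prior_var / (prior_var + est_var)                 # shrinkage weight
    post_mean = prior_mean + w * (est - prior_mean)
    post_var = prior_var * est_var / (prior_var + est_var)
    return post_mean, post_var

mean, var = 0.0, 1.0                        # diffuse prior before any data
for est, est_var in [(0.35, 0.04), (0.28, 0.05), (0.40, 0.04)]:
    mean, var = update_dif(mean, var, est, est_var)
print(f"pooled DIF effect: {mean:.3f} (var {var:.4f})")
```

An item flagged at several administrations accumulates a tight posterior away from zero, unlike an item flagged once, which is the asymmetry the abstract highlights.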
Peer reviewed
Luo, Xiao; Kim, Doyoung – Journal of Educational Measurement, 2018
The top-down approach to designing a multistage test is relatively understudied in the literature and underused in research and practice. This study introduced a route-based top-down design approach that directly sets design parameters at the test level and utilizes an advanced automated test assembly algorithm that seeks global optimality. The…
Descriptors: Computer Assisted Testing, Test Construction, Decision Making, Simulation
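What "setting design parameters at the test level" means can be made concrete: in a route-based design, targets such as test information are evaluated for whole routes (sequences of modules), not module by module. A toy check of two routes against a test-level information point, with an invented panel and 2PL parameters:

```python
import numpy as np

def tif(theta, a, b):
    """Test information of an item set at ability theta (2PL)."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return np.sum(a ** 2 * p * (1 - p))

modules = {
    "routing": (np.array([1.0, 1.1]), np.array([0.0, 0.2])),
    "easy":    (np.array([0.9, 1.0]), np.array([-1.0, -0.7])),
    "hard":    (np.array([1.2, 1.3]), np.array([0.8, 1.1])),
}
for route in (["routing", "easy"], ["routing", "hard"]):
    at, bt = (np.concatenate([modules[m][i] for m in route]) for i in (0, 1))
    print(route, "info at theta=0.5:", round(tif(0.5, at, bt), 2))
```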
Peer reviewed
Li, Jie; van der Linden, Wim J. – Journal of Educational Measurement, 2018
The final step of the typical process of developing educational and psychological tests is to place the selected test items in a formatted form. The step involves the grouping and ordering of the items to meet a variety of formatting constraints. As this activity tends to be time-intensive, the use of mixed-integer programming (MIP) has been…
Descriptors: Programming, Automation, Test Items, Test Format
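MIP-based form layout of the kind described can be sketched with the open-source PuLP package. The sections, page lengths, and constraints below are invented for illustration; they are not the authors' model.

```python
from pulp import LpProblem, LpVariable, LpMinimize, LpBinary, lpSum, value

items = range(6)
sections = range(2)
length = {0: 2, 1: 1, 2: 2, 3: 1, 4: 2, 5: 1}        # hypothetical page lengths

prob = LpProblem("form_layout", LpMinimize)
prob += 0                                            # feasibility only: dummy objective
x = LpVariable.dicts("x", (items, sections), cat=LpBinary)

for i in items:                                      # each item placed exactly once
    prob += lpSum(x[i][s] for s in sections) == 1
for s in sections:                                   # cap each section's page count
    prob += lpSum(length[i] * x[i][s] for i in items) <= 5

prob.solve()
for s in sections:
    print(f"section {s}:", [i for i in items if value(x[i][s]) == 1])
```

Real formatting models add ordering constraints (e.g., set-based items kept adjacent), which is exactly where manual layout becomes time-intensive.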
Peer reviewed
Sinharay, Sandip; Wan, Ping; Choi, Seung W.; Kim, Dong-In – Journal of Educational Measurement, 2015
With an increase in the number of online tests, the number of interruptions during testing due to unexpected technical issues seems to be on the rise. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. Researchers such as…
Descriptors: Computer Assisted Testing, Testing Problems, Scores, Statistical Analysis
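One common way to gauge interruption impact, loosely following the regression-based logic the abstract alludes to, is to predict post-interruption performance from pre-interruption performance using the uninterrupted group and then inspect residuals for interrupted examinees. All data below are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
pre = rng.normal(50, 10, 500)
post = 0.8 * pre + rng.normal(10, 5, 500)           # uninterrupted examinees
slope, intercept = np.polyfit(pre, post, 1)         # reference regression

pre_int = rng.normal(50, 10, 40)                    # interrupted examinees
post_int = 0.8 * pre_int + rng.normal(7, 5, 40)     # simulated score drop
residuals = post_int - (slope * pre_int + intercept)
print("mean residual for interrupted group:", residuals.mean().round(2))
```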
Peer reviewed
Heritage, Margaret; Kingston, Neal M. – Journal of Educational Measurement, 2019
Classroom assessment and large-scale assessment have, for the most part, existed in mutual isolation. Some experts have felt this is for the best and others have been concerned that the schism limits the potential contribution of both forms of assessment. Margaret Heritage has long been a champion of best practices in classroom assessment. Neal…
Descriptors: Measurement, Psychometrics, Context Effect, Classroom Environment
Peer reviewed
Yao, Lihua – Journal of Educational Measurement, 2014
The intent of this research was to find an item selection procedure in the multidimensional computer adaptive testing (CAT) framework that yielded higher precision for both the domain and composite abilities, had a higher usage of the item pool, and controlled the exposure rate. Five multidimensional CAT item selection procedures (minimum angle;…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
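Of the selection rules the abstract lists, the volume (D-optimality) criterion is easy to sketch: administer the item that maximizes the determinant of the accumulated Fisher information matrix at the current ability estimate. A two-dimensional M2PL pool is assumed, with invented parameters.

```python
import numpy as np

def item_info(a, theta, d):
    """Fisher information matrix of one M2PL item at ability vector theta."""
    p = 1 / (1 + np.exp(-(a @ theta + d)))
    return p * (1 - p) * np.outer(a, a)

theta = np.array([0.3, -0.1])
pool_a = [np.array([1.2, 0.3]), np.array([0.2, 1.1]), np.array([0.8, 0.8])]
pool_d = [0.0, -0.2, 0.1]
acc = np.eye(2) * 1e-3                      # information so far (small ridge)

dets = [np.linalg.det(acc + item_info(a, theta, d))
        for a, d in zip(pool_a, pool_d)]
print("select item:", int(np.argmax(dets)))
```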
Peer reviewed
Shermis, Mark D.; Lottridge, Sue; Mayfield, Elijah – Journal of Educational Measurement, 2015
This study investigated the impact of anonymizing text on predicted scores made by two kinds of automated scoring engines: one that incorporates elements of natural language processing (NLP) and one that does not. Eight data sets (N = 22,029) were used to form both training and test sets in which the scoring engines had access to both text and…
Descriptors: Scoring, Essays, Computer Assisted Testing, Natural Language Processing
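For readers unfamiliar with the manipulation being studied, anonymization here means masking identifying text before the scoring engine sees it. A real pipeline would use named-entity recognition; this regex stand-in with an assumed entity list is purely illustrative.

```python
import re

NAMES = {"Alice", "Brooklyn", "Chicago"}             # assumed entity list

def anonymize(text):
    pattern = r"\b(" + "|".join(map(re.escape, sorted(NAMES))) + r")\b"
    return re.sub(pattern, "[REDACTED]", text)

essay = "Alice moved from Chicago to Brooklyn before writing this essay."
print(anonymize(essay))
```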
Peer reviewed
Wang, Wenyi; Song, Lihong; Chen, Ping; Meng, Yaru; Ding, Shuliang – Journal of Educational Measurement, 2015
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern-level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet…
Descriptors: Classification, Reliability, Accuracy, Cognitive Tests
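The attribute-level indices motivating this paper can be sketched from first principles: given an examinee's posterior over attribute profiles, the expected accuracy of a mastery decision on one attribute is the posterior mass on the chosen state. The profiles and posterior below are invented, and this is a generic illustration rather than the authors' proposed indices.

```python
import numpy as np

profiles = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # 2-attribute patterns
posterior = np.array([0.05, 0.15, 0.10, 0.70])          # one examinee's posterior

for k in range(profiles.shape[1]):
    p_mastery = posterior[profiles[:, k] == 1].sum()
    decision = int(p_mastery >= 0.5)                     # MAP decision per attribute
    accuracy = p_mastery if decision else 1 - p_mastery
    print(f"attribute {k}: decide {decision}, expected accuracy {accuracy:.2f}")
```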