ERIC - Search Results

Showing 1 to 15 of 201 results

Controlling the Speededness of Assembled Test Forms: A Generalization to the Three-Parameter Lognormal Response Time Model

Peer reviewed

Becker, Benjamin; Weirich, Sebastian; Goldhammer, Frank; Debeer, Dries – Journal of Educational Measurement, 2023

When designing or modifying a test, an important challenge is controlling its speededness. To achieve this, van der Linden (2011a, 2011b) proposed using a lognormal response time model, more specifically the two-parameter lognormal model, and automated test assembly (ATA) via mixed integer linear programming. However, this approach has a severe…

Descriptors: Test Construction, Automation, Models, Test Items

Optimal Calibration of Items for Multidimensional Achievement Tests

Peer reviewed

Mahmood Ul Hassan; Frank Miller – Journal of Educational Measurement, 2024

Multidimensional achievement tests are recently gaining more importance in educational and psychological measurements. For example, multidimensional diagnostic tests can help students to determine which particular domain of knowledge they need to improve for better performance. To estimate the characteristics of candidate items (calibration) for…

Descriptors: Multidimensional Scaling, Achievement Tests, Test Items, Test Construction

A Generalized Objective Function for Computer Adaptive Item Selection

Peer reviewed

Harold Doran; Testsuhiro Yamada; Ted Diaz; Emre Gonulates; Vanessa Culver – Journal of Educational Measurement, 2025

Computer adaptive testing (CAT) is an increasingly common mode of test administration offering improved test security, better measurement precision, and the potential for shorter testing experiences. This article presents a new item selection algorithm based on a generalized objective function to support multiple types of testing conditions and…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Algorithms

Detecting Differential Item Functioning among Multiple Groups Using IRT Residual DIF Framework

Peer reviewed

Hwanggyu Lim; Danqi Zhu; Edison M. Choe; Kyung T. Han – Journal of Educational Measurement, 2024

This study presents a generalized version of the residual differential item functioning (RDIF) detection framework in item response theory, named GRDIF, to analyze differential item functioning (DIF) in multiple groups. The GRDIF framework retains the advantages of the original RDIF framework, such as computational efficiency and ease of…

Descriptors: Item Response Theory, Test Bias, Test Reliability, Test Construction

Detecting Multidimensional DIF in Polytomous Items with IRT Methods and Estimation Approaches

Peer reviewed

Güler Yavuz Temel – Journal of Educational Measurement, 2024

The purpose of this study was to investigate multidimensional DIF with a simple and nonsimple structure in the context of multidimensional Graded Response Model (MGRM). This study examined and compared the performance of the IRT-LR and Wald test using MML-EM and MHRM estimation approaches with different test factors and test structures in…

Descriptors: Computation, Multidimensional Scaling, Item Response Theory, Models

Linking and Comparability across Conditions of Measurement: Established Frameworks and Proposed Updates

Peer reviewed

Moses, Tim – Journal of Educational Measurement, 2022

One result of recent changes in testing is that previously established linking frameworks may not adequately address challenges in current linking situations. Test linking through equating, concordance, vertical scaling or battery scaling may not represent linkings for the scores of tests developed to measure constructs differently for different…

Descriptors: Measures (Individuals), Educational Assessment, Test Construction, Comparative Analysis

Automated Test Assembly with Mixed-Integer Programming: The Effects of Modeling Approaches and Solvers

Peer reviewed

Luo, Xiao – Journal of Educational Measurement, 2020

Automated test assembly (ATA) is a modern approach to test assembly that applies advanced optimization algorithms on computers to build test forms automatically. ATA greatly improves the efficiency and accuracy of the test assembly. This study investigated the effects of the modeling methods and solvers in the mixed-integer programming (MIP)…

Descriptors: Test Construction, Automation, Programming, Models

Historical Perspectives on Score Comparability Issues Raised by Innovations in Testing

Peer reviewed

Baldwin, Peter; Clauser, Brian E. – Journal of Educational Measurement, 2022

While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way--or may be incompatible with common examinee…

Descriptors: Scoring, Testing, Test Items, Test Format

Using Natural Language Processing to Predict Item Response Times and Improve Test Construction

Peer reviewed

Baldwin, Peter; Yaneva, Victoria; Mee, Janet; Clauser, Brian E.; Ha, Le An – Journal of Educational Measurement, 2021

In this article, it is shown how item text can be represented by (a) 113 features quantifying the text's linguistic characteristics, (b) 16 measures of the extent to which an information-retrieval-based automatic question-answering system finds an item challenging, and (c) through dense word representations (word embeddings). Using a random…

Descriptors: Natural Language Processing, Prediction, Item Response Theory, Reaction Time

The Automated Test Assembly and Routing Rule for Multistage Adaptive Testing with Multidimensional Item Response Theory

Peer reviewed

Xu, Lingling; Wang, Shiyu; Cai, Yan; Tu, Dongbo – Journal of Educational Measurement, 2021

Designing a multidimensional adaptive test (M-MST) based on a multidimensional item response theory (MIRT) model is critical to make full use of the advantages of both MST and MIRT in implementing multidimensional assessments. This study proposed two types of automated test assembly (ATA) algorithms and one set of routing rules that can facilitate…

Descriptors: Item Response Theory, Adaptive Testing, Automation, Test Construction

Using Multilabel Neural Network to Score High-Dimensional Assessments for Different Use Foci: An Example with College Major Preference Assessment

Peer reviewed

Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025

Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…

Descriptors: Tests, Testing, Scores, Test Construction

Efficiency of Targeted Multistage Calibration Designs under Practical Constraints: A Simulation Study

Peer reviewed

Berger, Stéphanie; Verschoor, Angela J.; Eggen, Theo J. H. M.; Moser, Urs – Journal of Educational Measurement, 2019

Calibration of an item bank for computer adaptive testing requires substantial resources. In this study, we investigated whether the efficiency of calibration under the Rasch model could be enhanced by improving the match between item difficulty and student ability. We introduced targeted multistage calibration designs, a design type that…

Descriptors: Simulation, Computer Assisted Testing, Test Items, Difficulty Level

Using Eye-Tracking Data as Part of the Validity Argument for Multiple-Choice Questions: A Demonstration

Peer reviewed

Yaneva, Victoria; Clauser, Brian E.; Morales, Amy; Paniagua, Miguel – Journal of Educational Measurement, 2021

Eye-tracking technology can create a record of the location and duration of visual fixations as a test-taker reads test questions. Although the cognitive process the test-taker is using cannot be directly observed, eye-tracking data can support inferences about these unobserved cognitive processes. This type of information has the potential to…

Descriptors: Eye Movements, Test Validity, Multiple Choice Tests, Cognitive Processes

Bias and Bias Correction Method for Nonproportional Abilities Requirement (NPAR) Tests

Peer reviewed

Ip, Edward H.; Strachan, Tyler; Fu, Yanyan; Lay, Alexandra; Willse, John T.; Chen, Shyh-Huei; Rutkowski, Leslie; Ackerman, Terry – Journal of Educational Measurement, 2019

Test items must often be broad in scope to be ecologically valid. It is therefore almost inevitable that secondary dimensions are introduced into a test during test development. A cognitive test may require one or more abilities besides the primary ability to correctly respond to an item, in which case a unidimensional test score overestimates the…

Descriptors: Test Items, Test Bias, Test Construction, Scores

Item Calibration Methods with Multiple Subscale Multistage Testing

Peer reviewed

Chun Wang; Ping Chen; Shengyu Jiang – Journal of Educational Measurement, 2020

Many large-scale educational surveys have moved from linear form design to multistage testing (MST) design. One advantage of MST is that it can provide more accurate latent trait [theta] estimates using fewer items than required by linear tests. However, MST generates incomplete response data by design; hence, questions remain as to how to…

Descriptors: Test Construction, Test Items, Adaptive Testing, Maximum Likelihood Statistics
