Yanyan Fu – Educational Measurement: Issues and Practice, 2024
The template-based automated item-generation (TAIG) approach that involves template creation, item generation, item selection, field-testing, and evaluation has more steps than the traditional item development method. Consequently, there is more margin for error in this process, and any template errors can cascade to the generated items.…
Descriptors: Error Correction, Automation, Test Items, Test Construction
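To make the template step concrete, here is a minimal sketch of template-based item generation. The arithmetic template, variable names, and selection scheme are illustrative assumptions, not the article's actual pipeline; note how any flaw in the template text or answer key would cascade to every generated item.

```python
# Minimal TAIG-style sketch (illustrative; not the article's pipeline).
import itertools
import random

# A single template: any error here cascades to all generated items.
TEMPLATE = "A train travels {speed} km/h for {hours} hours. How far does it go?"

def generate_items(speeds, hours_list, n_items, seed=0):
    """Instantiate the template over a grid of variable values."""
    rng = random.Random(seed)
    grid = list(itertools.product(speeds, hours_list))
    rng.shuffle(grid)
    items = []
    for speed, hours in grid[:n_items]:
        items.append({
            "stem": TEMPLATE.format(speed=speed, hours=hours),
            "key": speed * hours,  # answer derived from the template variables
        })
    return items

for item in generate_items(speeds=[40, 60, 80], hours_list=[2, 3], n_items=4):
    print(item["stem"], "->", item["key"])
```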
Ye Ma; Deborah J. Harris – Educational Measurement: Issues and Practice, 2025
Item position effect (IPE) refers to situations where an item performs differently when it is administered in different positions on a test. Most previous research has focused on investigating IPE under linear testing, and IPE under adaptive testing remains understudied. In addition, the existence of IPE might violate Item…
Descriptors: Computer Assisted Testing, Adaptive Testing, Item Response Theory, Test Items
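One simple way to picture an IPE, sketched below under a Rasch model: item difficulty drifts with serial position (a fatigue-style effect). The linear drift term `delta` is an assumption chosen for illustration, not a model from the article.

```python
# Rasch model with a position-dependent difficulty shift (illustrative).
import numpy as np

def p_correct(theta, b, position, delta=0.02):
    """P(correct) when effective difficulty grows by `delta` per position."""
    b_eff = b + delta * position  # later positions look harder
    return 1.0 / (1.0 + np.exp(-(theta - b_eff)))

theta, b = 0.5, 0.0
print(p_correct(theta, b, position=1))   # item administered early
print(p_correct(theta, b, position=40))  # same item late: lower probability
```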
Belzak, William C. M. – Educational Measurement: Issues and Practice, 2023
Test developers and psychometricians have historically examined measurement bias and differential item functioning (DIF) across a single categorical variable (e.g., gender), independently of other variables (e.g., race, age, etc.). This is problematic when more complex forms of measurement bias may adversely affect test responses and, ultimately,…
Descriptors: Test Bias, High Stakes Tests, Artificial Intelligence, Test Items
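A common screening device for DIF across combinations of background variables is logistic regression with interaction terms; the simulated data and variable names below are illustrative assumptions, not the article's method or data.

```python
# Logistic-regression DIF screen with an interaction term (illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
total = rng.normal(size=n)              # matching variable (e.g., total score)
gender = rng.integers(0, 2, size=n)
age = rng.integers(0, 2, size=n)
# Simulate intersectional DIF: only the gender=1 AND age=1 group is disadvantaged.
logit = 1.2 * total - 0.8 * (gender * age)
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

X = np.column_stack([total, gender, age, gender * age])
model = LogisticRegression().fit(X, y)
print(model.coef_)  # a sizable interaction coefficient flags intersectional DIF
```

A single-variable DIF analysis of the same data could miss this effect, since neither gender nor age alone carries the bias.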
Zopluoglu, Cengiz; Kasli, Murat; Toton, Sarah L. – Educational Measurement: Issues and Practice, 2021
Response time information has recently attracted significant attention in the literature as it may provide meaningful information about item preknowledge. The methods that use response time information to identify examinees with potential item preknowledge make an implicit assumption that the examinees with item preknowledge differ in their…
Descriptors: Reaction Time, Cheating, Test Items
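A minimal version of the underlying idea, with all thresholds and data simulated as illustrative assumptions: standardize log response times within items, then flag examinees with an unusually high share of very fast responses.

```python
# Flag unusually fast responders on suspected items (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
log_rt = rng.normal(loc=3.5, scale=0.5, size=(200, 30))  # examinees x items
log_rt[:5, :10] -= 1.5  # plant 5 examinees answering 10 leaked items very fast

z = (log_rt - log_rt.mean(axis=0)) / log_rt.std(axis=0)  # standardize per item
fast_share = (z < -2).mean(axis=1)    # share of very fast responses per examinee
print(np.where(fast_share > 0.2)[0])  # likely recovers examinees 0-4
```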
Xiangyi Liao; Daniel M Bolt – Educational Measurement: Issues and Practice, 2024
Traditional approaches to the modeling of multiple-choice item response data (e.g., 3PL, 4PL models) emphasize slips and guesses as random events. In this paper, an item response model is presented that characterizes both disjunctively interacting guessing and conjunctively interacting slipping processes as proficiency-related phenomena. We show…
Descriptors: Item Response Theory, Test Items, Error Correction, Guessing (Tests)
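For reference, the 4PL model named in the abstract treats guessing and slipping through fixed asymptotes: a lower asymptote c_j (guessing) and an upper asymptote d_j (slipping), with the 3PL as the special case d_j = 1:

```latex
P(X_{ij} = 1 \mid \theta_i) \;=\; c_j + (d_j - c_j)\,
\frac{1}{1 + \exp\{-a_j(\theta_i - b_j)\}}
```

The paper's contribution, by contrast, is to let these guessing and slipping processes depend on proficiency rather than remain random events.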
Fu, Yanyan; Choe, Edison M.; Lim, Hwanggyu; Choi, Jaehwa – Educational Measurement: Issues and Practice, 2022
This case study applied the "weak theory" of Automatic Item Generation (AIG) to generate isomorphic item instances (i.e., unique but psychometrically equivalent items) for a large-scale assessment. Three representative instances were selected from each item template (i.e., model) and pilot-tested. In addition, a new analytical framework,…
Descriptors: Test Items, Measurement, Psychometrics, Test Construction
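A crude check of whether pilot-tested instances from one template behave as isomorphic (psychometrically equivalent) items might compare their observed proportions correct; the tolerance below is an illustrative choice, not the study's analytical framework.

```python
# Rough equivalence check for instances of one template (illustrative).
import numpy as np

def instances_equivalent(scores_by_instance, tol=0.05):
    """scores_by_instance: one 0/1 response vector per pilot-tested instance."""
    p_values = [float(np.mean(s)) for s in scores_by_instance]
    return max(p_values) - min(p_values) <= tol, p_values

rng = np.random.default_rng(2)
pilot = [rng.integers(0, 2, 300) for _ in range(3)]  # three instances, n=300 each
print(instances_equivalent(pilot))
```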
Berenbon, Rebecca F.; McHugh, Bridget C. – Educational Measurement: Issues and Practice, 2023
To assemble a high-quality test, psychometricians rely on subject matter experts (SMEs) to write high-quality items. However, SMEs are not typically given the opportunity to provide input on which content standards are most suitable for multiple-choice questions (MCQs). In the present study, we explored the relationship between perceived MCQ…
Descriptors: Test Items, Multiple Choice Tests, Standards, Difficulty Level
Pan, Yiqin; Livne, Oren; Wollack, James A.; Sinharay, Sandip – Educational Measurement: Issues and Practice, 2023
In computerized adaptive testing, overexposure of items in the bank is a serious problem and might result in item compromise. We develop an item selection algorithm that utilizes the entire bank well and reduces the overexposure of items. The algorithm is based on collaborative filtering and selects an item in two stages. In the first stage, a set…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Algorithms
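In the collaborative-filtering spirit the abstract describes, a two-stage selection might look like the sketch below; the neighbor count, data layout, and tie-breaking by exposure are assumptions, not the published algorithm.

```python
# Two-stage, collaborative-filtering-style item selection (illustrative).
import numpy as np

def select_item(current_resp, past_resp, past_items, exposure, k=10):
    # Stage 1: find past examinees most similar on commonly answered items.
    mask = ~np.isnan(current_resp)
    sims = (past_resp[:, mask] == current_resp[mask]).sum(axis=1)
    neighbors = np.argsort(sims)[-k:]
    # Stage 2: among items those neighbors saw, pick the least exposed unseen one.
    candidates = np.unique(past_items[neighbors].ravel())
    unseen = [int(i) for i in candidates if np.isnan(current_resp[int(i)])]
    return min(unseen, key=lambda i: exposure[i])

rng = np.random.default_rng(3)
bank = 50
past_resp = rng.integers(0, 2, size=(100, bank)).astype(float)
past_items = np.tile(np.arange(20), (100, 1))  # items each past examinee saw
exposure = rng.integers(0, 30, size=bank)      # administration counts so far
current = np.full(bank, np.nan)
current[:5] = past_resp[0, :5]                 # current examinee answered 5 items
print(select_item(current, past_resp, past_items, exposure))
```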
Almehrizi, Rashid S. – Educational Measurement: Issues and Practice, 2022
Coefficient alpha persists as the most common reliability coefficient reported in research. The assumptions for its use are, however, not well understood. The current paper challenges the commonly used expressions of coefficient alpha and argues that while these expressions are correct when estimating reliability for summed scores,…
Descriptors: Reliability, Scores, Scaling, Statistical Analysis
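For reference, the commonly used expression in question, for a summed score X composed of k item scores Y_i:

```latex
\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right),
\qquad X = \sum_{i=1}^{k} Y_i
```

As the abstract notes, this expression is correct for summed scores; the paper challenges its use beyond that case.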
Zhang, Susu; Li, Anqi; Wang, Shiyu – Educational Measurement: Issues and Practice, 2023
In computer-based tests allowing revision and reviews, examinees' sequence of visits and answer changes to questions can be recorded. The variable-length revision log data introduce new complexities to the collected data but, at the same time, provide additional information on examinees' test-taking behavior, which can inform test development and…
Descriptors: Computer Assisted Testing, Test Construction, Test Wiseness, Test Items
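To make the data structure concrete, here is a minimal sketch of variable-length revision logs and one feature that can be mined from them; the event format and feature are illustrative assumptions, not the article's scheme.

```python
# Variable-length revision log: a sequence of (item, answer) visit events.
from collections import defaultdict

log = [("Q1", "A"), ("Q2", "C"), ("Q1", "B"), ("Q3", "D"), ("Q1", "B")]

def answer_changes(events):
    """Count revisits that changed the recorded answer, per item."""
    last, changes = {}, defaultdict(int)
    for item, answer in events:
        if item in last and last[item] != answer:
            changes[item] += 1
        last[item] = answer
    return dict(changes)

print(answer_changes(log))  # {'Q1': 1}: A -> B counts; the later B -> B does not
```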
Hwanggyu Lim; Kyung T. Han – Educational Measurement: Issues and Practice, 2024
Computerized adaptive testing (CAT) has gained deserved popularity in the administration of educational and professional assessments, but continues to face test security challenges. To ensure sustained quality assurance and testing integrity, it is imperative to establish and maintain multiple stable item pools that are consistent in terms of…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Item Banks
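One simple way to carve an item bank into multiple pools with matched difficulty distributions is a serpentine (snake) assignment, sketched below; this is a generic simplification, not the authors' procedure, and it balances only difficulty rather than all psychometric properties.

```python
# Serpentine split of an item bank into parallel pools (illustrative).
import numpy as np

def snake_split(difficulties, n_pools=3):
    order = np.argsort(difficulties)
    pools = [[] for _ in range(n_pools)]
    for rank, idx in enumerate(order):
        block, pos = divmod(rank, n_pools)
        pool = pos if block % 2 == 0 else n_pools - 1 - pos  # reverse each block
        pools[pool].append(float(difficulties[idx]))
    return pools

rng = np.random.default_rng(4)
pools = snake_split(rng.normal(size=300))
print([round(float(np.mean(p)), 3) for p in pools])  # near-identical pool means
```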
Pan, Yiqin; Wollack, James A. – Educational Measurement: Issues and Practice, 2023
Pan and Wollack (PW) proposed a machine learning method to detect compromised items. We extend the work of PW to an approach that detects compromised items and examinees with item preknowledge simultaneously, drawing on ideas from ensemble learning to relax several limitations of the PW method. The suggested approach also provides a confidence score,…
Descriptors: Artificial Intelligence, Prior Learning, Item Analysis, Test Content
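The ensemble idea can be pictured as follows: many weak detectors vote, and the confidence score for an examinee is the share of detectors that flag them. The detectors below are random stand-ins; the actual PW extensions are not reproduced.

```python
# Ensemble voting with a per-examinee confidence score (illustrative).
import numpy as np

def ensemble_confidence(flag_matrix):
    """flag_matrix: detectors x examinees boolean flags."""
    return flag_matrix.mean(axis=0)  # share of detectors flagging each examinee

rng = np.random.default_rng(5)
flags = rng.random((25, 100)) < 0.1  # 25 detectors, 100 examinees, noise flags
flags[:, 7] = rng.random(25) < 0.8   # examinee 7 is flagged by most detectors
conf = ensemble_confidence(flags)
print(conf.argmax(), conf.max())     # examinee 7, with a high confidence score
```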
Sinharay, Sandip – Educational Measurement: Issues and Practice, 2022
Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores, and hence to incomplete data, on credentialing tests such as the United States Medical Licensing Examination. Feinberg compared four approaches for reporting pass-fail decisions to the examinees with incomplete data on credentialing…
Descriptors: Testing Problems, High Stakes Tests, Credentials, Test Items
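Two generic scoring rules for incomplete records are sketched below to show why the choice matters near the cut score; these are illustrative options, not necessarily among the four approaches Feinberg compared.

```python
# Pass-fail under missing item scores: two simple rules (illustrative).
import numpy as np

def pass_fail(responses, cut=0.7):
    strict = np.nan_to_num(responses, nan=0.0).mean() >= cut  # missing = wrong
    observed = responses[~np.isnan(responses)]
    prorated = observed.mean() >= cut                         # prorate observed
    return strict, prorated

resp = np.array([1, 1, 0, 1, np.nan, np.nan, 1, 1, 0, 1], dtype=float)
print(pass_fail(resp))  # (False, True): the two rules disagree for this record
```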
Li, Dongmei; Kapoor, Shalini – Educational Measurement: Issues and Practice, 2022
Population invariance is a desirable property of test equating which might not hold when significant changes occur in the test population, such as those brought about by the COVID-19 pandemic. This research aims to investigate whether equating functions are reasonably invariant when the test population is impacted by the pandemic. Based on…
Descriptors: Test Items, Equated Scores, COVID-19, Pandemics
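A standard way to quantify population invariance (one common index, not necessarily the exact criterion used in this study) is the root mean square difference between subgroup equating functions e_g and the total-group function e, weighted by subgroup proportions w_g:

```latex
\mathrm{RMSD}(x) \;=\; \sqrt{\sum_{g} w_g \,\bigl[e_g(x) - e(x)\bigr]^{2}}
```

Small RMSD values across the score range x indicate that the equating function is reasonably invariant across subpopulations.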
Kim, Sooyeon; Walker, Michael E. – Educational Measurement: Issues and Practice, 2022
Test equating requires collecting data to link the scores from different forms of a test. Problems arise when equating samples are not equivalent and the test forms to be linked share no common items by which to measure or adjust for the group nonequivalence. Using data from five operational test forms, we created five pairs of research forms for…
Descriptors: Ability, Tests, Equated Scores, Testing Problems