Publication Date
In 2025 | 5
Since 2024 | 19
Since 2021 (last 5 years) | 63
Since 2016 (last 10 years) | 158
Since 2006 (last 20 years) | 422
Author
van der Linden, Wim J. | 13
Stansfield, Charles W. | 7
Tindal, Gerald | 5
Gierl, Mark J. | 4
Raykov, Tenko | 4
Sinharay, Sandip | 4
Veldkamp, Bernard P. | 4
Wainer, Howard | 4
Abedi, Jamal | 3
Alonzo, Julie | 3
Camilli, Gregory | 3
Audience
Practitioners | 59
Teachers | 55
Administrators | 23
Researchers | 20
Policymakers | 6
Community | 3
Students | 3
Counselors | 2
Parents | 2
Location
Australia | 19
Canada | 11
Florida | 6
Hong Kong | 5
Massachusetts | 5
United Kingdom (Great Britain) | 5
China | 4
Oregon | 4
United Kingdom | 4
Asia | 3
India | 3
Yanyan Fu – Educational Measurement: Issues and Practice, 2024
The template-based automated item generation (TAIG) approach, which involves template creation, item generation, item selection, field testing, and evaluation, has more steps than the traditional item development method. Consequently, there is more room for error in the process, and any template error can cascade to the generated items…
Descriptors: Error Correction, Automation, Test Items, Test Construction
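The cascading risk is easiest to see in miniature. Below is a minimal sketch of the template step, with invented slot names and fill values (nothing here is taken from the article): generated items are the cross-product of slot fillers, so one flaw in the template reaches every item.

```python
import itertools
import string

# Hypothetical toy template: slot names and fill values are invented
# for illustration, not taken from the article.
TEMPLATE = string.Template(
    "If $name buys $n pencils at $$${price} each, how much does $name spend?"
)
SLOTS = {
    "name": ["Ada", "Ben"],
    "n": [3, 7],
    "price": ["0.50", "1.25"],
}

def generate_items(template, slots):
    """Enumerate the cross-product of slot values: the core move in
    template-based generation. Any error in TEMPLATE is inherited by
    every item this yields."""
    keys = list(slots)
    for combo in itertools.product(*(slots[k] for k in keys)):
        yield template.substitute(dict(zip(keys, combo)))

for item in generate_items(TEMPLATE, SLOTS):
    print(item)
```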
Anne Traynor; Sara C. Christopherson – Applied Measurement in Education, 2024
Combining methods from earlier content validity and more contemporary content alignment studies may allow a more complete evaluation of the meaning of test scores than if either set of methods is used on its own. This article distinguishes item relevance indices in the content validity literature from test representativeness indices in the…
Descriptors: Test Validity, Test Items, Achievement Tests, Test Construction
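For orientation, one widely used item-relevance index from the content validity literature is Aiken's V (shown here as a generic example; the article's own indices are cut off by the truncation above). With $n$ raters judging an item on a $k$-point relevance scale and $s_j$ denoting rater $j$'s rating minus the lowest possible rating:

$$ V = \frac{\sum_{j=1}^{n} s_j}{n\,(k-1)}, \qquad 0 \le V \le 1, $$

where values near 1 indicate strong rater agreement that the item is relevant to the intended domain.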
Jianbin Fu; Xuan Tan; Patrick C. Kyllonen – Journal of Educational Measurement, 2024
This paper presents the item and test information functions of the Rank two-parameter logistic models (Rank-2PLM) for items with two (pair) and three (triplet) statements in forced-choice questionnaires. The Rank-2PLM model for pairs is the MUPP-2PLM (Multi-Unidimensional Pairwise Preference) and, for triplets, is the Triplet-2PLM. Fisher's…
Descriptors: Questionnaires, Test Items, Item Response Theory, Models
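The snippet truncates at "Fisher's…", evidently Fisher information. As a point of reference only (the paper derives the pair and triplet forced-choice analogues, which are not reproduced here), the familiar dichotomous 2PL versions are

$$ I_i(\theta) = a_i^{2}\,P_i(\theta)\bigl(1-P_i(\theta)\bigr), \qquad I(\theta) = \sum_i I_i(\theta), \qquad \operatorname{SE}(\hat\theta) = I(\theta)^{-1/2}, $$

where $P_i(\theta)$ is the model's probability of endorsing item $i$ at trait level $\theta$.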
Belzak, William C. M. – Educational Measurement: Issues and Practice, 2023
Test developers and psychometricians have historically examined measurement bias and differential item functioning (DIF) across a single categorical variable (e.g., gender), independently of other variables (e.g., race, age, etc.). This is problematic when more complex forms of measurement bias may adversely affect test responses and, ultimately,…
Descriptors: Test Bias, High Stakes Tests, Artificial Intelligence, Test Items
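The article examines DIF across several variables at once; as a baseline for what it moves beyond, here is a minimal sketch of the classic single-grouping-variable screen, Swaminathan-Rogers logistic-regression DIF, on simulated toy data (all names and values below are invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

def logistic_dif(item, criterion, group):
    """Swaminathan-Rogers logistic-regression DIF screen for one item.
    item: 0/1 responses; criterion: matching variable (e.g., rest score);
    group: 0/1 focal-group indicator. A significant group coefficient
    suggests uniform DIF; a significant interaction, nonuniform DIF."""
    X = np.column_stack([np.ones_like(criterion),
                         criterion, group, criterion * group])
    return sm.Logit(item, X).fit(disp=0)

# toy data with uniform DIF built in against the focal group
rng = np.random.default_rng(1)
n = 2000
group = rng.integers(0, 2, n)
theta = rng.normal(0, 1, n)
p = 1 / (1 + np.exp(-(theta - 0.4 * group)))
item = rng.binomial(1, p)
print(logistic_dif(item, theta, group).params)  # index 2 = group effect
```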
Huang, Sijia; Luo, Jinwen; Cai, Li – Educational and Psychological Measurement, 2023
Random item effects item response theory (IRT) models, which treat both person and item effects as random, have received much attention for more than a decade. The random item effects approach has several advantages in many practical settings. The present study introduced an explanatory multidimensional random item effects rating scale model. The…
Descriptors: Rating Scales, Item Response Theory, Models, Test Items
Harold Doran; Testsuhiro Yamada; Ted Diaz; Emre Gonulates; Vanessa Culver – Journal of Educational Measurement, 2025
Computer adaptive testing (CAT) is an increasingly common mode of test administration offering improved test security, better measurement precision, and the potential for shorter testing experiences. This article presents a new item selection algorithm based on a generalized objective function to support multiple types of testing conditions and…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Algorithms
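The generalized objective function itself sits behind the truncation, so here, as a hedged sketch only, is the classical baseline such algorithms extend: maximum-Fisher-information selection over a 2PL item bank (the bank below is simulated; the function names are illustrative).

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def select_next_item(theta_hat, a, b, administered):
    """Classic max-information CAT rule: administer the unused item
    that is most informative at the current ability estimate."""
    info = item_information(theta_hat, a, b)
    info[list(administered)] = -np.inf  # mask items already given
    return int(np.argmax(info))

# simulated 50-item 2PL bank
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 2.0, size=50)   # discriminations
b = rng.normal(0.0, 1.0, size=50)    # difficulties
print(select_next_item(theta_hat=0.3, a=a, b=b, administered={4, 17}))
```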
Gyamfi, Abraham; Acquaye, Rosemary – Acta Educationis Generalis, 2023
Introduction: Item response theory (IRT) has received much attention in the validation of assessment instruments because it allows students' ability to be estimated from any set of items. IRT also allows the difficulty and discrimination of each item on the test to be estimated. In the framework of IRT, item characteristics are…
Descriptors: Item Response Theory, Models, Test Items, Difficulty Level
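For reference (this is textbook IRT, not specific to the study), the dichotomous two-parameter logistic model behind those two quantities is

$$ P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}, $$

where $b_i$ is item $i$'s difficulty, the ability at which a correct response has probability .5, and $a_i$ its discrimination, the slope of the curve at that point.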
Jonathan Seiden – Annenberg Institute for School Reform at Brown University, 2025
Direct assessments of early childhood development (ECD) are a cornerstone of research in developmental psychology and are increasingly used to evaluate programs and policies in lower- and middle-income countries. Despite strong psychometric properties, these assessments are too expensive and time-consuming for use in large-scale monitoring or…
Descriptors: Young Children, Child Development, Performance Based Assessment, Developmental Psychology
Bolt, Daniel M.; Liao, Xiangyi – Journal of Educational Measurement, 2021
We revisit the empirically observed positive correlation between DIF and difficulty studied by Freedle and commonly seen in tests of verbal proficiency when comparing populations of different mean latent proficiency levels. It is shown that a positive correlation between DIF and difficulty estimates is actually an expected result (absent any true…
Descriptors: Test Bias, Difficulty Level, Correlation, Verbal Tests
Meike Akveld; George Kinnear – International Journal of Mathematical Education in Science and Technology, 2024
Many universities use diagnostic tests to assess incoming students' preparedness for mathematics courses. Diagnostic test results can help students to identify topics where they need more practice and give lecturers a summary of strengths and weaknesses in their class. We demonstrate a process that can be used to make improvements to a mathematics…
Descriptors: Mathematics Tests, Diagnostic Tests, Test Items, Item Analysis
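A minimal sketch of the item statistics such an improvement cycle typically starts from, assuming a persons-by-items 0/1 response matrix (the article's actual process is truncated above): classical difficulty (proportion correct) and rest-score discrimination.

```python
import numpy as np

def item_statistics(responses):
    """Classical item analysis for a persons-by-items 0/1 matrix:
    difficulty = proportion correct; discrimination = correlation of
    each item with the rest score (total minus that item)."""
    R = np.asarray(responses, dtype=float)
    difficulty = R.mean(axis=0)
    rest = R.sum(axis=1, keepdims=True) - R
    discrimination = np.array(
        [np.corrcoef(R[:, j], rest[:, j])[0, 1] for j in range(R.shape[1])]
    )
    return difficulty, discrimination

# toy data: 200 examinees x 5 items of increasing difficulty
rng = np.random.default_rng(2)
theta = rng.normal(size=(200, 1))
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
R = (rng.random((200, 5)) < 1 / (1 + np.exp(-(theta - b)))).astype(int)
diff, disc = item_statistics(R)
print(np.round(diff, 2), np.round(disc, 2))
```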
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022
This article discusses visual techniques for identifying, on the one hand, test items that are optimal to select for the final compilation and, on the other, items that should be screened out because they would lower the quality of the compilation. Some classic visual tools are first discussed in a practical manner for diagnosing the logical,…
Descriptors: Test Items, Item Analysis, Item Response Theory, Cutting Scores
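As a generic example of this family of visual tools (a sketch under the usual assumptions, not the article's own diagnostics): bin examinees by total score and plot each bin's proportion correct, giving an empirical item characteristic curve that should rise monotonically for a sound item.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_empirical_icc(item, total, n_bins=8):
    """Split examinees into equal-size score groups and plot the
    proportion answering the item correctly in each group; a flat or
    non-monotonic curve flags an item that may lower test quality."""
    order = np.argsort(total)
    x, y = [], []
    for chunk in np.array_split(order, n_bins):
        x.append(np.mean(total[chunk]))
        y.append(np.mean(item[chunk]))
    plt.plot(x, y, marker="o")
    plt.xlabel("total score (bin mean)")
    plt.ylabel("proportion correct")
    plt.title("Empirical item characteristic curve")
    plt.show()
```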
Maristela Petrovic-Dzerdz – Collected Essays on Learning and Teaching, 2024
Large introductory classes, with their expansive curriculum, demand assessment strategies that blend efficiency with reliability, prompting the consideration of multiple-choice (MC) tests as a viable option. Crafting a high-quality MC test, however, necessitates a meticulous process involving reflection on assessment format appropriateness, test…
Descriptors: Multiple Choice Tests, Test Construction, Test Items, Alignment (Education)
Deng, Jacky M.; Streja, Nicholas; Flynn, Alison B. – Journal of Chemical Education, 2021
Response process validity evidence can provide researchers with insight into how and why participants interpret items on instruments such as tests and questionnaires. In chemistry education research literature and the social sciences more broadly, response process validity evidence has been used and reported in a variety of ways. This paper's…
Descriptors: Chemistry, Science Education, Educational Research, Validity
Jessica M. Kramer; Evan E. Dean; Micah Peace Urquilla; Joan B. Beasley; Brad Linnenkamp – Inclusion, 2024
Researchers have implemented inclusive research for over 30 years. This article describes how two research projects collaborated with researchers with disabilities and aligns the description with four attributes of inclusive research developed by a consensus of international experts with and without disabilities. The first project, the Person…
Descriptors: Researchers, Cooperation, Intellectual Disability, Developmental Disabilities
Mark Wilson – Journal of Educational and Behavioral Statistics, 2024
This article introduces a new framework for relating educational assessments to teachers' uses of them in the classroom. It articulates three levels of assessment: macro (use of standardized tests), meso (externally developed items), and micro (on-the-fly in the classroom). The first level is the usual context for educational…
Descriptors: Educational Assessment, Measurement, Standardized Tests, Test Items