Publication Date
In 2025: 2
Since 2024: 4
Since 2021 (last 5 years): 20
Since 2016 (last 10 years): 32
Since 2006 (last 20 years): 55
Descriptor
Test Items: 56
Item Response Theory: 24
Difficulty Level: 16
Scores: 13
Test Format: 13
Multiple Choice Tests: 12
Foreign Countries: 10
Item Analysis: 10
Statistical Analysis: 10
Test Construction: 10
Test Validity: 9
Source
Practical Assessment,…: 56
Author
Han, Kyung T.: 3
Metsämuuronen, Jari: 3
Baghaei, Purya: 2
Buckendahl, Chad W.: 2
Russell, Michael: 2
Agus Santoso: 1
Ahmadi, Alireza: 1
Anthony Sparks: 1
Asmundson, Gordon J. G.: 1
Babcock, Ben: 1
Bao, Han: 1
Publication Type
Journal Articles: 56
Reports - Research: 38
Reports - Descriptive: 10
Reports - Evaluative: 8
Tests/Questionnaires: 2
Education Level
Higher Education: 8
Postsecondary Education: 8
Elementary Education: 6
Middle Schools: 6
Junior High Schools: 5
Elementary Secondary Education: 4
Intermediate Grades: 4
Secondary Education: 4
Grade 5: 3
Grade 6: 2
Grade 7: 2
Assessments and Surveys
Trends in International…: 2
Massachusetts Comprehensive…: 1
United States Medical…: 1
Tom Benton – Practical Assessment, Research & Evaluation, 2025
This paper proposes an extension of linear equating that may be useful in one of two fairly common assessment scenarios. One is where different students have taken different combinations of test forms. This might occur, for example, where students have some free choice over the exam papers they take within a particular qualification. In this…
Descriptors: Equated Scores, Test Format, Test Items, Computation
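Linear equating places a score from form X on the scale of form Y by matching the two forms' means and standard deviations. As a point of reference for the extension Benton proposes (which is not reproduced here), a minimal sketch of ordinary linear equating; the function name is illustrative, not from the paper:

```python
# Linear equating: match the first two moments of the two score
# distributions. A score one SD above the form-X mean maps to the
# score one SD above the form-Y mean.

def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Return the form-Y equivalent of score x on form X."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)

# Example: form X has mean 50, SD 10; form Y has mean 55, SD 12.
# A form-X score of 60 (one SD above the mean) maps to 55 + 12 = 67.
equated = linear_equate(60.0, 50.0, 10.0, 55.0, 12.0)
```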
Jianbin Fu; TsungHan Ho; Xuan Tan – Practical Assessment, Research & Evaluation, 2025
Item parameter estimation using an item response theory (IRT) model with fixed ability estimates is useful in equating with small samples on anchor items. The current study explores the impact of three ability estimation methods (weighted likelihood estimation [WLE], maximum a posteriori [MAP], and posterior ability distribution estimation [PST])…
Descriptors: Item Response Theory, Test Items, Computation, Equated Scores
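The setup the study describes, holding person ability estimates fixed while estimating item parameters, can be sketched for the simple Rasch (1PL) case. This illustrates the general idea only; the paper's WLE, MAP, and PST ability estimators and its anchor-item equating design are not implemented here:

```python
import math

# With abilities treated as known constants, each item's difficulty b
# can be estimated by Newton-Raphson on the item's log-likelihood.

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_difficulty(thetas, responses, b=0.0, iters=25):
    """ML estimate of one item's difficulty given fixed abilities
    `thetas` and 0/1 `responses` to that item."""
    for _ in range(iters):
        ps = [rasch_p(t, b) for t in thetas]
        grad = sum(p - u for p, u in zip(ps, responses))  # dlogL/db
        info = sum(p * (1 - p) for p in ps)               # Fisher information
        b += grad / info                                  # Newton step
    return b
```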
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2023
Traditional estimators of reliability such as coefficients alpha, theta, omega, and rho (maximal reliability) are prone to radically underestimate the reliability of the kinds of tests common in educational achievement testing. These tests are often composed of items with widely deviating difficulties. This is a typical pattern where the traditional…
Descriptors: Test Reliability, Achievement Tests, Computation, Test Items
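For contrast with the article's argument, here is the textbook computation of coefficient alpha, one of the traditional estimators it says is deflated on tests with widely varying item difficulties. A minimal sketch of the standard formula, not the paper's method:

```python
# Coefficient (Cronbach's) alpha from a persons-by-items score matrix:
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).

def cronbach_alpha(scores):
    """`scores`: one row per person, one column per item."""
    n_items = len(scores[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in scores]) for i in range(n_items)]
    total_var = var([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)
```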
Deschênes, Marie-France; Dionne, Éric; Dorion, Michelle; Grondin, Julie – Practical Assessment, Research & Evaluation, 2023
The use of the aggregate scoring method for scoring concordance tests requires the weighting of test items to be derived from the performance of a group of experts who take the test under the same conditions as the examinees. However, the average score of experts constituting the reference panel remains a critical issue in the use of these tests.…
Descriptors: Scoring, Tests, Evaluation Methods, Test Items
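The aggregate scoring method described here gives each answer option credit in proportion to how many panel experts chose it, with the modal expert answer worth full credit. A minimal sketch under that common normalization; the paper's specific panel-composition concerns are not modeled:

```python
# Aggregate scoring weights for one concordance-test item: each option's
# weight is its expert count divided by the modal option's count.

def option_weights(expert_choices):
    """`expert_choices`: the option each panel expert picked, e.g. ['a', 'a', 'b']."""
    counts = {}
    for choice in expert_choices:
        counts[choice] = counts.get(choice, 0) + 1
    modal = max(counts.values())
    return {opt: n / modal for opt, n in counts.items()}
```

An examinee who picks an option no expert chose earns zero credit; one who picks the modal expert answer earns full credit.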
Pentecost, Thomas C.; Raker, Jeffery R.; Murphy, Kristen L. – Practical Assessment, Research & Evaluation, 2023
Using multiple versions of an assessment has the potential to introduce item environment effects. These effects produce version-dependent item characteristics (i.e., difficulty and discrimination). Methods to detect such effects, and their resulting implications, are important for all levels of assessment where multiple forms of an assessment…
Descriptors: Item Response Theory, Test Items, Test Format, Science Tests
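Version-dependent drift in difficulty and discrimination can be inspected by computing classical item statistics separately for each test version and comparing them across versions. A sketch using proportion correct and the item-rest point-biserial correlation; the paper's own IRT-based detection methods are not implemented here:

```python
import math

# Classical item statistics for one item in a 0/1 response matrix:
# difficulty = proportion correct; discrimination = point-biserial
# correlation between the item and the rest score (total minus the item).

def item_stats(responses, item):
    """`responses`: one row of 0/1 scores per person; `item`: column index."""
    xs = [row[item] for row in responses]
    ys = [sum(row) - row[item] for row in responses]  # rest score
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    difficulty = mx
    discrimination = cov / (sx * sy)
    return difficulty, discrimination
```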
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022
The reliability of a test score is usually underestimated, and the deflation may be profound: 0.40 to 0.60 units of reliability, or 46 to 71%. Eight root sources of the deflation are discussed and quantified by a simulation with 1,440 real-world datasets: (1) errors in the measurement modelling, (2) inefficiency in the estimator of reliability within…
Descriptors: Test Reliability, Scores, Test Items, Correlation
Svihla, Vanessa; Gallup, Amber – Practical Assessment, Research & Evaluation, 2021
In making validity arguments, a central consideration is whether the instrument fairly and adequately covers intended content, and this is often evaluated by experts. While common procedures exist for quantitatively assessing this, the effect of loss aversion--a cognitive bias that would predict a tendency to retain items--on these procedures has…
Descriptors: Content Validity, Anxiety, Bias, Test Items
Stemler, Steven E.; Naples, Adam – Practical Assessment, Research & Evaluation, 2021
When students receive the same score on a test, does that mean they know the same amount about the topic? The answer to this question is more complex than it may first appear. This paper compares classical and modern test theories in terms of how they estimate student ability. Crucial distinctions between the aims of Rasch Measurement and IRT are…
Descriptors: Item Response Theory, Test Theory, Ability, Computation
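One of the crucial distinctions the paper draws can be made concrete: under the Rasch model the raw score is a sufficient statistic for ability, so examinees with the same total score receive the same estimate regardless of which items they answered correctly, whereas under 2PL-style IRT models the particular response pattern matters. A minimal Newton-Raphson sketch for the Rasch case, illustrative rather than taken from the paper:

```python
import math

# ML ability estimate under the Rasch model: solve
# raw_score = sum over items of P(correct | theta, b).
# Only the raw score enters the estimating equation.

def rasch_theta(raw_score, difficulties, theta=0.0, iters=50):
    """Ability estimate for a raw score with 0 < raw_score < len(difficulties)."""
    for _ in range(iters):
        ps = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        grad = raw_score - sum(ps)           # dlogL/dtheta
        info = sum(p * (1 - p) for p in ps)  # Fisher information
        theta += grad / info                 # Newton step
    return theta
```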
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022
This article discusses visual techniques for identifying test items that are optimal to select into the final compilation and, conversely, for screening out items that would lower the quality of the compilation. Some classic visual tools are discussed, first, in a practical manner in diagnosing the logical,…
Descriptors: Test Items, Item Analysis, Item Response Theory, Cutting Scores
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine the effects of differences in group ability and of features of the anchor test form on equating bias and the standard error of equating (SEE), using both real and simulated data. Chained kernel equating, poststratification kernel equating, and circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
Wan, Siyu; Keller, Lisa A. – Practical Assessment, Research & Evaluation, 2023
Statistical process control (SPC) charts have been widely used in the field of educational measurement. The cumulative sum (CUSUM) is an established SPC method for detecting aberrant responses in educational assessments. Many studies have investigated the performance of CUSUM in different test settings. This paper describes the CUSUM…
Descriptors: Visual Aids, Educational Assessment, Evaluation Methods, Item Response Theory
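The upper one-sided CUSUM recursion behind such charts accumulates departures above a reference value k and signals when the running sum crosses a decision limit h. A generic sketch with illustrative thresholds, not the settings studied in the paper:

```python
# Upper CUSUM over a sequence of standardized residuals (e.g. person-fit
# residuals across items): s_i = max(0, s_{i-1} + r_i - k); flag when s > h.

def cusum_upper(residuals, k=0.5, h=4.0):
    """Return (list of cumulative sums, index of first crossing of h, or None)."""
    s, sums, signal = 0.0, [], None
    for i, r in enumerate(residuals):
        s = max(0.0, s + r - k)
        sums.append(s)
        if signal is None and s > h:
            signal = i
    return sums, signal
```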
Agus Santoso; Heri Retnawati; Timbul Pardede; Ibnu Rafi; Munaya Nikma Rosyada; Gulzhaina K. Kassymova; Xu Wenxin – Practical Assessment, Research & Evaluation, 2024
The test blueprint is important in test development: it guides the item writer in creating test items according to the desired objectives and specifications (so-called a priori item characteristics), such as the intended difficulty level of each item and the distribution of items across difficulty levels.…
Descriptors: Foreign Countries, Undergraduate Students, Business English, Test Construction
Wiberg, Marie – Practical Assessment, Research & Evaluation, 2021
The overall aim was to examine the equated values when using different linkage plans and different observed-score equipercentile equating methods with the equivalent groups (EG) design and the nonequivalent groups with anchor test (NEAT) design. Both real data from a college admissions test and simulated data were used with frequency estimation,…
Descriptors: Equated Scores, Test Items, Methods, College Entrance Examinations
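Equipercentile equating maps a score on form X to the form-Y score holding the same percentile rank. A bare-bones sketch on empirical score distributions; operational methods such as the frequency estimation approach mentioned above add presmoothing and continuization, which are omitted here:

```python
# Equipercentile equating on raw empirical distributions: find the
# percentile rank of x on form X, then the form-Y score whose
# percentile rank is closest to it.

def percentile_rank(scores, x):
    """Fraction of the distribution at or below x (midpoint convention)."""
    below = sum(1 for s in scores if s < x)
    at = sum(1 for s in scores if s == x)
    return (below + 0.5 * at) / len(scores)

def equipercentile(x, form_x_scores, form_y_scores):
    """Form-Y score whose percentile rank best matches that of x on form X."""
    target = percentile_rank(form_x_scores, x)
    return min(sorted(set(form_y_scores)),
               key=lambda y: abs(percentile_rank(form_y_scores, y) - target))
```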
Sarah Wellberg; Anthony Sparks; Leanne Ketterlin-Geller – Practical Assessment, Research & Evaluation, 2023
The early development of spatial reasoning skills has been linked to future success in mathematics (Wai, Lubinski, & Benbow, 2009), but research to date has mainly focused on the development of these skills within classroom settings rather than at home. The home environment is often the first place students are exposed to, and develop, early…
Descriptors: Test Construction, Test Validity, Measures (Individuals), Surveys
An Intersectional Approach to Differential Item Functioning: Reflecting Configurations of Inequality
Russell, Michael; Kaplan, Larry – Practical Assessment, Research & Evaluation, 2021
Differential Item Functioning (DIF) is commonly employed to examine measurement bias of test scores. Current approaches to DIF compare item functioning separately for select demographic identities such as gender, racial stratification, and economic status. Examining potential item bias fails to recognize and capture the intersecting configurations…
Descriptors: Test Bias, Test Items, Demography, Identification
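The Mantel-Haenszel procedure, a standard DIF method of the kind the authors critique when applied to one demographic attribute at a time, compares reference- and focal-group performance on an item within total-score strata. An intersectional analysis in the paper's spirit would instead define the focal group by a configuration of identities; the statistic itself is unchanged. A minimal sketch:

```python
import math

# Mantel-Haenszel common odds ratio across score strata. Each stratum is
# a 2x2 table: (ref correct, ref wrong, focal correct, focal wrong).

def mh_odds_ratio(strata):
    """Pooled odds ratio; values > 1 mean the item favors the reference group."""
    num = den = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        num += a * d / t
        den += b * c / t
    return num / den

def mh_delta(strata):
    """ETS delta scale: negative values favor the reference group."""
    return -2.35 * math.log(mh_odds_ratio(strata))
```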