Showing 1 to 15 of 56 results
Peer reviewed
Tom Benton – Practical Assessment, Research & Evaluation, 2025
This paper proposes an extension of linear equating that may be useful in one of two fairly common assessment scenarios. One is where different students have taken different combinations of test forms. This might occur, for example, where students have some free choice over the exam papers they take within a particular qualification. In this…
Descriptors: Equated Scores, Test Format, Test Items, Computation
Peer reviewed
Jianbin Fu; TsungHan Ho; Xuan Tan – Practical Assessment, Research & Evaluation, 2025
Item parameter estimation using an item response theory (IRT) model with fixed ability estimates is useful in equating with small samples on anchor items. The current study explores the impact of three ability estimation methods (weighted likelihood estimation [WLE], maximum a posteriori [MAP], and posterior ability distribution estimation [PST])…
Descriptors: Item Response Theory, Test Items, Computation, Equated Scores
Peer reviewed
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2023
Traditional estimators of reliability such as coefficients alpha, theta, omega, and rho (maximal reliability) are prone to give radical underestimates of reliability for the tests common when testing educational achievement. These tests are often structured by widely deviating item difficulties. This is a typical pattern where the traditional…
Descriptors: Test Reliability, Achievement Tests, Computation, Test Items
Peer reviewed
Deschênes, Marie-France; Dionne, Éric; Dorion, Michelle; Grondin, Julie – Practical Assessment, Research & Evaluation, 2023
The use of the aggregate scoring method for scoring concordance tests requires the weighting of test items to be derived from the performance of a group of experts who take the test under the same conditions as the examinees. However, the average score of experts constituting the reference panel remains a critical issue in the use of these tests.…
Descriptors: Scoring, Tests, Evaluation Methods, Test Items
Peer reviewed
Pentecost, Thomas C.; Raker, Jeffery R.; Murphy, Kristen L. – Practical Assessment, Research & Evaluation, 2023
Using multiple versions of an assessment has the potential to introduce item environment effects. These types of effects result in version dependent item characteristics (i.e., difficulty and discrimination). Methods to detect such effects and resulting implications are important for all levels of assessment where multiple forms of an assessment…
Descriptors: Item Response Theory, Test Items, Test Format, Science Tests
Peer reviewed
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022
The reliability of a test score is usually underestimated and the deflation may be profound, 0.40 - 0.60 units of reliability or 46 - 71%. Eight root sources of the deflation are discussed and quantified by a simulation with 1,440 real-world datasets: (1) errors in the measurement modelling, (2) inefficiency in the estimator of reliability within…
Descriptors: Test Reliability, Scores, Test Items, Correlation
Peer reviewed
Svihla, Vanessa; Gallup, Amber – Practical Assessment, Research & Evaluation, 2021
In making validity arguments, a central consideration is whether the instrument fairly and adequately covers intended content, and this is often evaluated by experts. While common procedures exist for quantitatively assessing this, the effect of loss aversion--a cognitive bias that would predict a tendency to retain items--on these procedures has…
Descriptors: Content Validity, Anxiety, Bias, Test Items
Peer reviewed
Stemler, Steven E.; Naples, Adam – Practical Assessment, Research & Evaluation, 2021
When students receive the same score on a test, does that mean they know the same amount about the topic? The answer to this question is more complex than it may first appear. This paper compares classical and modern test theories in terms of how they estimate student ability. Crucial distinctions between the aims of Rasch Measurement and IRT are…
Descriptors: Item Response Theory, Test Theory, Ability, Computation
Peer reviewed
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022
This article discusses visual techniques for detecting, on the one hand, the test items that would be optimal to select for the final compilation and, on the other hand, the items that would lower the quality of the compilation and should be excluded. Some classic visual tools are discussed, first, in a practical manner in diagnosing the logical,…
Descriptors: Test Items, Item Analysis, Item Response Theory, Cutting Scores
Peer reviewed
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE) using both real and simulated data. Chained kernel equating, Poststratification kernel equating, and Circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
Peer reviewed
Wan, Siyu; Keller, Lisa A. – Practical Assessment, Research & Evaluation, 2023
Statistical process control (SPC) charts have been widely used in the field of educational measurement. The cumulative sum (CUSUM) is an established SPC method to detect aberrant responses for educational assessments. There are many studies that investigated the performance of CUSUM in different test settings. This paper describes the CUSUM…
Descriptors: Visual Aids, Educational Assessment, Evaluation Methods, Item Response Theory
Peer reviewed
Agus Santoso; Heri Retnawati; Timbul Pardede; Ibnu Rafi; Munaya Nikma Rosyada; Gulzhaina K. Kassymova; Xu Wenxin – Practical Assessment, Research & Evaluation, 2024
The test blueprint is important in test development, where it guides the test item writer in creating test items according to the desired objectives and specifications or characteristics (so-called a priori item characteristics), such as the level of item difficulty in the category and the distribution of items based on their difficulty level.…
Descriptors: Foreign Countries, Undergraduate Students, Business English, Test Construction
Peer reviewed
Wiberg, Marie – Practical Assessment, Research & Evaluation, 2021
The overall aim was to examine the equated values when using different linkage plans and different observed-score equipercentile equating methods with the equivalent groups (EG) design and the nonequivalent groups with anchor test (NEAT) design. Both real data from a college admissions test and simulated data were used with frequency estimation,…
Descriptors: Equated Scores, Test Items, Methods, College Entrance Examinations
Peer reviewed
Sarah Wellberg; Anthony Sparks; Leanne Ketterlin-Geller – Practical Assessment, Research & Evaluation, 2023
The early development of spatial reasoning skills has been linked to future success in mathematics (Wai, Lubinski, & Benbow, 2009), but research to date has mainly focused on the development of these skills within classroom settings rather than at home. The home environment is often the first place students are exposed to, and develop, early…
Descriptors: Test Construction, Test Validity, Measures (Individuals), Surveys
Peer reviewed
Russell, Michael; Kaplan, Larry – Practical Assessment, Research & Evaluation, 2021
Differential Item Functioning (DIF) is commonly employed to examine measurement bias of test scores. Current approaches to DIF compare item functioning separately for select demographic identities such as gender, racial stratification, and economic status. Examining potential item bias fails to recognize and capture the intersecting configurations…
Descriptors: Test Bias, Test Items, Demography, Identification