Publication Date
In 2025: 1
Since 2024: 2
Since 2021 (last 5 years): 5
Since 2016 (last 10 years): 15
Since 2006 (last 20 years): 53
Descriptor
Comparative Analysis: 73
Evaluation Methods: 73
Test Items: 73
Item Response Theory: 26
Simulation: 17
Test Bias: 17
Item Analysis: 16
Foreign Countries: 15
Scores: 14
Test Construction: 14
Achievement Tests: 11
…
Author
Nandakumar, Ratna: 3
Wu, Margaret: 3
Finch, Holmes: 2
He, Yong: 2
Kim, Do-Hong: 2
Stark, Stephen: 2
Wyse, Adam E.: 2
Abedi, Jamal: 1
Akbari, Alireza: 1
Petrosino, Anthony: 1
Arora, Alka, Ed.: 1
…
Education Level
Elementary Secondary Education: 9
Higher Education: 8
Grade 4: 4
Grade 8: 4
Postsecondary Education: 4
Elementary Education: 3
Secondary Education: 3
Early Childhood Education: 2
Grade 5: 2
Grade 6: 2
Intermediate Grades: 2
…
Audience
Practitioners: 1
Researchers: 1
Location
Canada: 3
United States: 2
Asia: 1
Australia: 1
Germany: 1
Hong Kong: 1
India: 1
Ohio: 1
Portugal: 1
Slovakia: 1
Texas: 1
…
Laws, Policies, & Programs
No Child Left Behind Act 2001: 1
Sohee Kim; Ki Lynn Cole – International Journal of Testing, 2025
This study conducted a comprehensive comparison of Item Response Theory (IRT) linking methods applied to a bifactor model, examining their performance on both multiple-choice (MC) and mixed-format tests within the common-item nonequivalent groups (CINEG) design framework. Four distinct multidimensional IRT linking approaches were explored, consisting of…
Descriptors: Item Response Theory, Comparative Analysis, Models, Item Analysis
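As background for this entry, IRT linking places parameter estimates from two separate calibrations onto a common scale. The sketch below shows the simple unidimensional mean/sigma transformation, not the four multidimensional bifactor methods compared in the study; all item parameters are hypothetical.

```python
import numpy as np

# Mean/sigma linking: put new-form estimates on the reference-form scale via
# theta_ref = A * theta_new + B, estimated from common-item difficulties.
def mean_sigma(b_ref, b_new):
    A = np.std(b_ref, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_ref) - A * np.mean(b_new)
    return A, B

b_ref = np.array([-1.2, -0.4, 0.3, 1.1, 1.8])  # hypothetical common-item difficulties
b_new = np.array([-1.0, -0.2, 0.5, 1.3, 2.0])
A, B = mean_sigma(b_ref, b_new)
b_linked = A * b_new + B                        # new-form difficulties, rescaled
```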
Joakim Wallmark; James O. Ramsay; Juan Li; Marie Wiberg – Journal of Educational and Behavioral Statistics, 2024
Item response theory (IRT) models the relationship between the possible scores on a test item and a test taker's level of the latent trait that the item is intended to measure. In this study, we compare two models for tests with polytomously scored items: the optimal scoring (OS) model, a nonparametric IRT model based on the principles of…
Descriptors: Item Response Theory, Test Items, Models, Scoring
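For readers new to polytomous IRT, the usual parametric baseline is Samejima's graded response model. A minimal sketch of its category probabilities with hypothetical parameters (this is not the OS model, whose truncated description cannot be reconstructed here):

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Samejima's graded response model: P(X = k | theta) for ordered categories.
    a is the discrimination; b is an increasing array of K-1 thresholds."""
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(b))))  # P(X >= k)
    cum = np.concatenate(([1.0], p_star, [0.0]))
    return -np.diff(cum)  # K category probabilities, summing to 1

# hypothetical item: discrimination 1.3, thresholds -1.0, 0.2, 1.4
probs = grm_category_probs(theta=0.5, a=1.3, b=[-1.0, 0.2, 1.4])
```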
Gill, Tim – Research Matters, 2022
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them according to which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…
Descriptors: Comparative Analysis, Decision Making, Scripts, Standards
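CJ judgment data of this kind are commonly analysed with a Bradley-Terry type model, which recovers a quality scale from the pairwise "better/worse" decisions. A minimal sketch, assuming a wins matrix in which every script wins at least one comparison (not necessarily the estimation used in this study):

```python
import numpy as np

def bradley_terry(wins, n_iter=200):
    """Zermelo/MM fixed-point estimation of Bradley-Terry strengths.
    wins[i, j] = number of times script i was judged better than script j."""
    n = wins.shape[0]
    p = np.ones(n)
    for _ in range(n_iter):
        W = wins.sum(axis=1)                     # total wins per script
        games = wins + wins.T                    # comparisons per pair
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        p = W / denom
        p /= p.sum()                             # fix the scale
    return np.log(p)                             # script quality on a log scale
```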
Walter M. Stroup; Anthony Petrosino; Corey Brady; Karen Duseau – North American Chapter of the International Group for the Psychology of Mathematics Education, 2023
Tests of statistical significance often play a decisive role in establishing the empirical warrant of evidence-based research in education. The results from pattern-based assessment items, as introduced in this paper, are categorical and multimodal and do not immediately support the use of measures of central tendency as typically related to…
Descriptors: Statistical Significance, Comparative Analysis, Research Methodology, Evaluation Methods
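Where item results are categorical rather than numeric, comparisons across conditions can rest on frequency-based tests instead of means. An illustrative chi-square test of homogeneity on hypothetical response-pattern counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts of three response patterns (columns) in two conditions (rows)
table = np.array([[34, 51, 15],
                  [20, 48, 32]])
chi2, p, dof, expected = chi2_contingency(table)
```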
Russell, Michael; Szendey, Olivia; Li, Zhushan – Educational Assessment, 2022
Recent research provides evidence that an intersectional approach to defining reference and focal groups results in a higher percentage of comparisons flagged for potential DIF. The study presented here examined the generalizability of this pattern across methods for examining DIF. While the level of DIF detection differed among the four methods…
Descriptors: Comparative Analysis, Item Analysis, Test Items, Test Construction
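The four DIF methods examined are cut off in the abstract; the Mantel-Haenszel procedure is one widely used option and illustrates the shared logic of conditioning on total score. A sketch with hypothetical counts:

```python
def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio over 2x2 tables, one per score stratum.
    Each table is ((ref_correct, ref_incorrect), (focal_correct, focal_incorrect)).
    A ratio near 1 suggests no uniform DIF on the studied item."""
    num = den = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

strata = [((40, 10), (35, 15)),   # hypothetical counts by total-score band
          ((30, 20), (22, 28)),
          ((15, 35), (10, 40))]
alpha_mh = mantel_haenszel_dif(strata)
```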
Yesiltas, Gonca; Paek, Insu – Educational and Psychological Measurement, 2020
A log-linear model (LLM) is a well-known statistical method to examine the relationship among categorical variables. This study investigated the performance of LLM in detecting differential item functioning (DIF) for polytomously scored items via simulations where various sample sizes, ability mean differences (impact), and DIF types were…
Descriptors: Simulation, Sample Size, Item Analysis, Scores
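A log-linear DIF analysis of this kind can be fit as a Poisson GLM on the group x ability-stratum x item-score contingency table; uniform DIF shows up as a group-by-score association after conditioning on the stratum. A minimal sketch with hypothetical counts (not the study's simulation design):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical counts for one polytomous item: group x ability stratum x score
counts = pd.DataFrame({
    "group":   ["ref"] * 6 + ["focal"] * 6,
    "stratum": ["low", "low", "low", "high", "high", "high"] * 2,
    "score":   ["0", "1", "2"] * 4,
    "n": [30, 25, 10, 8, 20, 35, 28, 27, 10, 12, 22, 29],
})
full = smf.glm("n ~ group*stratum + stratum*score + group*score",
               data=counts, family=sm.families.Poisson()).fit()
reduced = smf.glm("n ~ group*stratum + stratum*score",
                  data=counts, family=sm.families.Poisson()).fit()
lr_stat = 2 * (full.llf - reduced.llf)  # LR test of the group:score (DIF) term
```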
Akbari, Alireza; Shahnazari, Mohammadtaghi – Language Testing in Asia, 2019
The present research paper introduces a translation evaluation method called Calibrated Parsing Items Evaluation (CPIE hereafter). This evaluation method maximizes translators' performance through identifying the parsing items with an optimal p-docimology and d-index (item discrimination). This method checks all the possible parses (annotations)…
Descriptors: Test Items, Translation, Computer Software, Evaluators
Malec, Wojciech; Krzeminska-Adamek, Malgorzata – Practical Assessment, Research & Evaluation, 2020
The main objective of the article is to compare several methods of evaluating multiple-choice options through classical item analysis. The methods subjected to examination include the tabulation of choice distribution, the interpretation of trace lines, the point-biserial correlation, the categorical analysis of trace lines, and the investigation…
Descriptors: Comparative Analysis, Evaluation Methods, Multiple Choice Tests, Item Analysis
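The point-biserial correlation mentioned here relates choosing a given option to total score; the key should correlate positively and distractors negatively. A minimal sketch on simulated data (all values hypothetical):

```python
import numpy as np

def point_biserial(option_chosen, total_score):
    """Point-biserial correlation between choosing an option (0/1) and total score."""
    x = np.asarray(option_chosen, dtype=float)
    y = np.asarray(total_score, dtype=float)
    p = x.mean()
    return (y[x == 1].mean() - y[x == 0].mean()) * np.sqrt(p * (1 - p)) / y.std()

rng = np.random.default_rng(0)                  # hypothetical response data
scores = rng.integers(10, 40, size=200)
chose_a = (scores + rng.normal(0, 8, 200) > 28).astype(int)
r_pb = point_biserial(chose_a, scores)
```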
Wyse, Adam E. – Educational Measurement: Issues and Practice, 2017
This article illustrates five different methods for estimating Angoff cut scores using item response theory (IRT) models. These include maximum likelihood (ML), expected a posteriori (EAP), modal a posteriori (MAP), and weighted maximum likelihood (WML) estimators, as well as the most commonly used approach based on translating ratings through the test…
Descriptors: Cutting Scores, Item Response Theory, Bayesian Statistics, Maximum Likelihood Statistics
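The "most commonly used approach" the abstract refers to translates the panel's summed Angoff ratings through the test characteristic curve (TCC). A minimal sketch under a 2PL model with hypothetical parameters and ratings; the ML/EAP/MAP/WML variants differ in how theta is estimated from the ratings:

```python
import numpy as np
from scipy.optimize import brentq

a = np.array([1.1, 0.8, 1.4, 0.9, 1.2])        # hypothetical 2PL discriminations
b = np.array([-0.5, 0.0, 0.3, 0.8, 1.2])       # hypothetical difficulties
ratings = np.array([0.7, 0.6, 0.55, 0.5, 0.4])  # Angoff judgments per item

def tcc(theta):
    """Test characteristic curve: expected raw score at theta under the 2PL."""
    return (1.0 / (1.0 + np.exp(-a * (theta - b)))).sum()

# Find the theta whose expected score equals the summed Angoff ratings:
theta_cut = brentq(lambda t: tcc(t) - ratings.sum(), -4, 4)
```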
Wolkowitz, Amanda A.; Davis-Becker, Susan L.; Gerrow, Jack D. – Journal of Applied Testing Technology, 2016
The purpose of this study was to investigate the impact of a cheating prevention strategy employed for a professional credentialing exam that involved releasing over 7,000 active and retired exam items. This study evaluated: (1) whether any significant differences existed between examinee performance on released versus non-released items; (2) whether item…
Descriptors: Cheating, Test Content, Test Items, Foreign Countries
Moothedath, Shana; Chaporkar, Prasanna; Belur, Madhu N. – Perspectives in Education, 2016
In recent years, the computerised adaptive test (CAT) has gained popularity over conventional exams in evaluating student capabilities with desired accuracy. However, the key limitation of CAT is that it requires a large pool of pre-calibrated questions. In the absence of such a pre-calibrated question bank, offline exams with uncalibrated…
Descriptors: Guessing (Tests), Computer Assisted Testing, Adaptive Testing, Maximum Likelihood Statistics
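As background, the maximum likelihood machinery referenced here can be shown with ability estimation under a 3PL model, whose lower asymptote captures guessing. A sketch with hypothetical item parameters and responses:

```python
import numpy as np
from scipy.optimize import minimize_scalar

a = np.array([1.0, 1.3, 0.8, 1.1])   # hypothetical 3PL discriminations
b = np.array([-0.8, 0.0, 0.5, 1.2])  # difficulties
c = np.array([0.2, 0.25, 0.2, 0.2])  # pseudo-guessing lower asymptotes
x = np.array([1, 1, 0, 1])           # one examinee's observed responses

def neg_log_lik(theta):
    p = c + (1 - c) / (1 + np.exp(-a * (theta - b)))
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

theta_hat = minimize_scalar(neg_log_lik, bounds=(-4, 4), method="bounded").x
```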
Dadey, Nathan; Lyons, Susan; DePascale, Charles – Applied Measurement in Education, 2018
Evidence of comparability is generally needed whenever there are variations in the conditions of an assessment administration, including variations introduced by the administration of an assessment on multiple digital devices (e.g., tablet, laptop, desktop). This article is meant to provide a comprehensive examination of issues relevant to the…
Descriptors: Evaluation Methods, Computer Assisted Testing, Educational Technology, Technology Uses in Education
Wolf, Raffaela – ProQuest LLC, 2013
Preservation of equity properties was examined using four equating methods (IRT True Score, IRT Observed Score, Frequency Estimation, and Chained Equipercentile) in a mixed-format test under a common-item nonequivalent groups (CINEG) design. Equating of mixed-format tests under a CINEG design can be influenced by factors such as attributes of the…
Descriptors: Testing, Item Response Theory, Equated Scores, Test Items
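Of the four methods, the equipercentile family is easiest to show compactly: each form-X score maps to the form-Y score with the same percentile rank. A bare-bones single-group sketch on simulated scores; the chained method in the study applies this idea through the common-item link:

```python
import numpy as np

def equipercentile(scores_x, scores_y):
    """Map each possible form-X score to the form-Y score at the same percentile
    rank (simplified: no smoothing, no half-point percentile-rank convention)."""
    xs = np.arange(scores_x.min(), scores_x.max() + 1)
    px = np.array([np.mean(scores_x <= s) for s in xs])  # percentile ranks on X
    return xs, np.quantile(np.sort(scores_y), px)        # Y scores at those ranks

rng = np.random.default_rng(1)                           # hypothetical score data
form_x = rng.binomial(40, 0.60, 500)
form_y = rng.binomial(40, 0.55, 500)
x_points, y_equivalents = equipercentile(form_x, form_y)
```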
Hou, Likun; de la Torre, Jimmy; Nandakumar, Ratna – Journal of Educational Measurement, 2014
Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study…
Descriptors: Test Bias, Models, Simulation, Error Patterns
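The Wald test proposed here compares an item's group-specific parameter estimates against their joint sampling variability. A generic sketch (hypothetical estimates and covariance; in the CDM context the parameters would be, e.g., the item's guess and slip):

```python
import numpy as np
from scipy.stats import chi2

def wald_test(beta_ref, beta_focal, cov_diff):
    """Wald statistic for equality of item parameters across groups:
    W = d' V^{-1} d, chi-square with len(d) df under the no-DIF null."""
    d = np.asarray(beta_ref) - np.asarray(beta_focal)
    W = d @ np.linalg.solve(np.asarray(cov_diff), d)
    return W, chi2.sf(W, df=d.size)

# hypothetical guess/slip estimates and covariance of their difference
W, p = wald_test([0.15, 0.10], [0.25, 0.18],
                 [[0.002, 0.0], [0.0, 0.003]])
```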
Liu, Yan; Zumbo, Bruno D.; Gustafson, Paul; Huang, Yi; Kroc, Edward; Wu, Amery D. – Practical Assessment, Research & Evaluation, 2016
A variety of differential item functioning (DIF) methods have been proposed and used to ensure that a test is fair to all test takers in a target population in situations such as when a test is translated into other languages. However, once a method flags an item as showing DIF, it is difficult to conclude that the grouping variable (e.g.,…
Descriptors: Test Items, Test Bias, Probability, Scores