Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 8 |
Since 2016 (last 10 years) | 26 |
Since 2006 (last 20 years) | 58 |
Descriptor
Difficulty Level | 58 |
Error of Measurement | 58 |
Test Items | 46 |
Item Response Theory | 36 |
Comparative Analysis | 14 |
Computation | 12 |
Statistical Analysis | 12 |
Equated Scores | 11 |
Foreign Countries | 11 |
Sample Size | 10 |
Simulation | 10 |
Author
Alonzo, Julie | 6 |
Tindal, Gerald | 6 |
Finch, Holmes | 3 |
Paek, Insu | 3 |
Schoen, Robert C. | 3 |
Sinharay, Sandip | 3 |
Yang, Xiaotong | 3 |
Curley, Edward | 2 |
Feigenbaum, Miriam | 2 |
Holland, Paul | 2 |
Liu, Jinghua | 2 |
Publication Type
Reports - Research | 45 |
Journal Articles | 44 |
Reports - Evaluative | 9 |
Numerical/Quantitative Data | 7 |
Reports - Descriptive | 3 |
Tests/Questionnaires | 2 |
Dissertations/Theses -… | 1 |
Speeches/Meeting Papers | 1 |
Location
Austria | 1 |
Belgium | 1 |
Canada | 1 |
Chile | 1 |
Cyprus | 1 |
Florida | 1 |
Germany | 1 |
Indonesia | 1 |
Japan | 1 |
Luxembourg | 1 |
New Jersey | 1 |
Assessments and Surveys
SAT (College Admission Test) | 2 |
Cognitive Assessment System | 1 |
National Assessment of… | 1 |
Program for International… | 1 |
Progress in International… | 1 |
Trends in International… | 1 |
Wechsler Intelligence Scale… | 1 |
Sample Size and Item Parameter Estimation Precision When Utilizing the Masters' Partial Credit Model
Custer, Michael; Kim, Jongpil – Online Submission, 2023
This study uses an analysis of diminishing returns to examine the relationship between sample size and item parameter estimation precision under the Masters' Partial Credit Model for polytomous items. Item data from the standardization of the Battelle Developmental Inventory, 3rd Edition were used. Each item was scored with a…
Descriptors: Sample Size, Item Response Theory, Test Items, Computation
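For context, the Masters' partial credit model named in this entry is usually written (generic notation, not taken from the study) as
P(X_i = x \mid \theta) = \frac{\exp\left[\sum_{k=0}^{x}(\theta - \delta_{ik})\right]}{\sum_{r=0}^{m_i}\exp\left[\sum_{k=0}^{r}(\theta - \delta_{ik})\right]}, \qquad x = 0,\dots,m_i,
with the convention that the sum for k = 0 is zero; the step parameters \delta_{ik} are what the sample size must support estimating.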
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine the effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE), using both real and simulated data. Chained kernel equating, poststratification kernel equating, and circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
Akin-Arikan, Çigdem; Gelbal, Selahattin – Eurasian Journal of Educational Research, 2021
Purpose: This study aims to compare the performances of Item Response Theory (IRT) equating and kernel equating (KE) methods based on equating errors (RMSD) and standard error of equating (SEE) using the anchor item nonequivalent groups design. Method: Within this scope, a set of conditions, including ability distribution, type of anchor items…
Descriptors: Equated Scores, Item Response Theory, Test Items, Statistical Analysis
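As a point of reference, simulation studies of equating commonly define these criteria at each score point x over R replications (a generic formulation, not the study's own notation):
\text{SEE}(x) = \sqrt{\tfrac{1}{R}\sum_{r=1}^{R}\left(\hat e_r(x) - \bar e(x)\right)^2}, \qquad \text{RMSD}(x) = \sqrt{\tfrac{1}{R}\sum_{r=1}^{R}\left(\hat e_r(x) - e(x)\right)^2},
where e(x) is the criterion equating function and \bar e(x) is the mean of the estimates.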
Al-zboon, Habis Saad; Alrekebat, Amjad Farhan – International Journal of Higher Education, 2021
This study aims to identify the effect of multiple-choice test items' difficulty level on the reliability coefficient and the standard error of measurement within the framework of item response theory (IRT). To achieve the objectives of the study, the WinGen3 software was used to generate the IRT parameters (difficulty, discrimination, guessing) for four…
Descriptors: Multiple Choice Tests, Test Items, Difficulty Level, Error of Measurement
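In IRT, the quantities examined here follow from test information: the conditional standard error of measurement is
\text{SEM}(\theta) = \frac{1}{\sqrt{I(\theta)}}, \qquad I(\theta) = \sum_i I_i(\theta),
and under the three-parameter logistic model each item contributes I_i(\theta) = a_i^2\,\frac{Q_i(\theta)}{P_i(\theta)}\left(\frac{P_i(\theta)-c_i}{1-c_i}\right)^2, which is how the generated difficulty, discrimination, and guessing parameters enter the error of measurement (standard formulas, not quoted from the article).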
Wyse, Adam E.; Babcock, Ben – Educational Measurement: Issues and Practice, 2020
A common belief is that the Bookmark method is a cognitively simpler standard-setting method than the modified Angoff method. However, little research has investigated panelists' ability to perform the Bookmark method well, and whether some of the challenges panelists face with the Angoff method may also be present in the Bookmark…
Descriptors: Standard Setting (Scoring), Evaluation Methods, Testing Problems, Test Items
Koçak, Duygu – International Journal of Progressive Education, 2020
The aim of this study was to determine the effect of chance success on test equating. For this purpose, artificially generated data sets with sample sizes of 500 and 1,000 were equated using linear equating and equipercentile equating methods. In the simulated data, a total of four cases were created with no…
Descriptors: Test Theory, Equated Scores, Error of Measurement, Sample Size
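The two equating methods named in this entry are conventionally defined as follows (standard definitions, not the study's wording): linear equating sets
l_Y(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}(x - \mu_X),
while equipercentile equating sets e_Y(x) = F_Y^{-1}\!\left(F_X(x)\right), matching the cumulative score distributions of the two forms.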
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is based on observed scores, such as the Mantel-Haenszel (MH) procedure and the standardized proportion-correct metric; the other is based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
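For reference, the observed-score MH procedure mentioned here pools 2 x 2 tables over score strata k (counts A_k, B_k for the reference group correct/incorrect and C_k, D_k for the focal group) and is often reported in the ETS delta metric:
\hat\alpha_{MH} = \frac{\sum_k A_k D_k / N_k}{\sum_k B_k C_k / N_k}, \qquad \text{MH D-DIF} = -2.35\,\ln\hat\alpha_{MH}
(a standard formulation; the particular weighting studied in the report is not reproduced here).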
Joo, Seang-Hwane; Ferron, John M.; Moeyaert, Mariola; Beretvas, S. Natasha; Van den Noortgate, Wim – Journal of Experimental Education, 2019
Multilevel modeling has been utilized for combining single-case experimental design (SCED) data assuming simple level-1 error structures. The purpose of this study is to compare various multilevel analysis approaches for handling potential complexity in the level-1 error structure within SCED data, including approaches assuming simple and complex…
Descriptors: Hierarchical Linear Modeling, Synthesis, Data Analysis, Accuracy
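One common example of the "complex" level-1 error structure at issue is a first-order autoregressive process for the within-case residuals (an illustrative case, not necessarily the exact structures compared in the article):
e_t = \rho\, e_{t-1} + u_t, \qquad u_t \sim N(0, \sigma_u^2),
where the "simple" structure corresponds to \rho = 0 with a constant residual variance.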
Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020
An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…
Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting
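A sketch of why items play a "more complicated role": if the cut score is the mean Angoff rating over n_j judges and n_i items in a crossed design, a generalizability-theory decomposition (assumed here for illustration, not quoted from the article) gives an error variance of roughly
\sigma^2(\hat c) \approx \frac{\sigma^2_{j}}{n_j} + \frac{\sigma^2_{ji,e}}{n_j n_i} \ \text{(items fixed)}, \qquad \sigma^2(\hat c) \approx \frac{\sigma^2_{i}}{n_i} + \frac{\sigma^2_{j}}{n_j} + \frac{\sigma^2_{ji,e}}{n_j n_i} \ \text{(items random)},
so whether item variance contributes depends on how the item facet is treated.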
Sahin, Melek Gulsah – International Journal of Assessment Tools in Education, 2020
Computer adaptive multistage testing (ca-MST), which takes advantage of computer technology and adaptive test forms, is widely used and is now a popular topic in assessment and evaluation. This study aims to analyze the effect of different panel designs, module lengths, and different sequences of the a-parameter values across stages and change in…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Item Response Theory
Kara, Hakan; Cetin, Sevda – International Journal of Assessment Tools in Education, 2020
In this study, the efficiency of various random sampling methods for reducing the number of items rated by judges in an Angoff standard-setting study was examined, and the methods were compared with each other. First, the full-length test was formed by combining the mathematics subsets of the 2012 and 2013 Placement Tests. Then, simple random sampling…
Descriptors: Cutting Scores, Standard Setting (Scoring), Sampling, Error of Measurement
Bramley, Tom – Research Matters, 2020
The aim of this study was to compare, by simulation, the accuracy of mapping a cut-score from one test to another by expert judgement (using the Angoff method) versus the accuracy of a small-sample equating method (chained linear equating). As expected, the standard-setting method resulted in more accurate equating when we assumed a higher level…
Descriptors: Cutting Scores, Standard Setting (Scoring), Equated Scores, Accuracy
Park, Sung Eun; Ahn, Soyeon; Zopluoglu, Cengiz – Educational and Psychological Measurement, 2021
This study presents a new approach to synthesizing differential item functioning (DIF) effect size: First, using correlation matrices from each study, we perform a multigroup confirmatory factor analysis (MGCFA) that examines measurement invariance of a test item between two subgroups (i.e., focal and reference groups). Then we synthesize, across…
Descriptors: Item Analysis, Effect Size, Difficulty Level, Monte Carlo Methods
Lions, Séverin; Dartnell, Pablo; Toledo, Gabriela; Godoy, María Inés; Córdova, Nora; Jiménez, Daniela; Lemarié, Julie – Educational and Psychological Measurement, 2023
Even though the impact of the position of response options on answers to multiple-choice items has been investigated for decades, it remains debated. Research on this topic is inconclusive, perhaps because too few studies have obtained experimental data from large-sized samples in a real-world context and have manipulated the position of both…
Descriptors: Multiple Choice Tests, Test Items, Item Analysis, Responses
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis