Wyse, Adam E.; Babcock, Ben – Journal of Educational Measurement, 2019
One common phenomenon in Angoff standard setting is that panelists regress their ratings in toward the middle of the probability scale. This study describes two indices based on taking ratios of standard deviations that can be utilized with a scatterplot of item ratings versus expected probabilities of success to identify whether ratings are…
Descriptors: Item Analysis, Standard Setting, Probability, Feedback (Response)
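The abstract above is truncated and does not define its two indices, so the following is only a minimal sketch of one plausible ratio of standard deviations: the spread of panelists' item ratings divided by the spread of the expected probabilities of success. The function name `sd_ratio` and the example values are hypothetical and are not taken from the paper. Under this assumed definition, a ratio well below 1 would be consistent with ratings being pulled toward the middle of the probability scale.

```python
import statistics

def sd_ratio(ratings, expected):
    """Ratio of the SD of item ratings to the SD of expected probabilities.

    Assumed index for illustration only; the paper's exact definitions
    are not given in the truncated abstract.
    """
    return statistics.pstdev(ratings) / statistics.pstdev(expected)

# Hypothetical expected probabilities of success for five items.
expected = [0.20, 0.35, 0.50, 0.65, 0.80]

# Hypothetical ratings shrunk halfway toward 0.5 (regression to the middle).
ratings = [0.5 + 0.5 * (p - 0.5) for p in expected]

ratio = sd_ratio(ratings, expected)  # about 0.5 for this constructed example
```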
Tijmstra, Jesper; Bolsinova, Maria; Liaw, Yuan-Ling; Rutkowski, Leslie; Rutkowski, David – Journal of Educational Measurement, 2020
Although the root-mean squared deviation (RMSD) is a popular statistical measure for evaluating country-specific item-level misfit (i.e., differential item functioning [DIF]) in international large-scale assessment, this paper shows that its sensitivity to detect misfit may depend strongly on the proficiency distribution of the considered…
Descriptors: Test Items, Goodness of Fit, Probability, Accuracy
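To make the dependence on the proficiency distribution concrete, here is a minimal sketch of a density-weighted RMSD between a country-specific and an international item characteristic curve. The 2PL curves, the theta grid, the normal weights, and all parameter values are assumptions chosen for illustration; they are not the paper's data or exact procedure.

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def weighted_rmsd(observed, expected, weights):
    """Square root of the weighted mean squared gap between two curves."""
    total = sum(weights)
    sq = sum(w * (o - e) ** 2 for o, e, w in zip(observed, expected, weights))
    return math.sqrt(sq / total)

grid = [-4.0 + 0.1 * i for i in range(81)]
expected = [icc_2pl(t, 1.0, 0.0) for t in grid]  # assumed international curve
observed = [icc_2pl(t, 1.0, 1.0) for t in grid]  # assumed country curve, harder item

# Same misfit, two hypothetical country proficiency distributions.
rmsd_center = weighted_rmsd(observed, expected, [normal_pdf(t, 0.5, 1.0) for t in grid])
rmsd_low = weighted_rmsd(observed, expected, [normal_pdf(t, -3.0, 1.0) for t in grid])
```

In this constructed case the two curves differ most near theta = 0.5, so a country whose examinees are concentrated there shows a larger RMSD than a country concentrated near theta = -3, even though the item-level misfit is identical.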
Andersson, Björn – Journal of Educational Measurement, 2016
In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…
Descriptors: Equated Scores, Item Response Theory, Error of Measurement, Tests
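The percentile-matching step described above can be sketched as follows. This is a simplified illustration that takes the two score distributions as given and treats each integer score as uniformly spread over a unit interval; it does not include the polytomous item response model or the error estimation the paper is concerned with. The function names and example distributions are hypothetical.

```python
def percentile_rank(probs, x):
    """Proportion scoring below x plus half of those scoring exactly x."""
    return sum(probs[:x]) + 0.5 * probs[x]

def inverse_percentile_rank(probs, p):
    """Score on the continuized scale whose percentile rank equals p."""
    cum = 0.0
    for score, f in enumerate(probs):
        if f > 0 and cum + f >= p:
            return score - 0.5 + (p - cum) / f
        cum += f
    return len(probs) - 0.5

def equate(probs_x, probs_y, x):
    """Map score x on test X to the test Y score with the same percentile rank."""
    return inverse_percentile_rank(probs_y, percentile_rank(probs_x, x))

# Hypothetical score distributions: Y is X shifted up by one point.
probs_x = [0.1, 0.2, 0.4, 0.2, 0.1]
probs_y = [0.0, 0.1, 0.2, 0.4, 0.2, 0.1]

equated = equate(probs_x, probs_y, 2)  # close to 3.0 for this example
```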
Wang, Wenyi; Song, Lihong; Chen, Ping; Meng, Yaru; Ding, Shuliang – Journal of Educational Measurement, 2015
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern-level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet…
Descriptors: Classification, Reliability, Accuracy, Cognitive Tests
Halpin, Peter F.; von Davier, Alina A.; Hao, Jiangang; Liu, Lei – Journal of Educational Measurement, 2017
This article addresses performance assessments that involve collaboration among students. We apply the Hawkes process to infer whether the actions of one student are associated with increased probability of further actions by his/her partner(s) in the near future. This leads to an intuitive notion of engagement among collaborators, and we consider…
Descriptors: Performance Based Assessment, Student Evaluation, Cooperative Learning, Inferences
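As a rough illustration of how a Hawkes process ties one student's actions to a partner's later activity, the sketch below evaluates a conditional intensity with an exponential decay kernel. The kernel form, the parameter names (`mu`, `alpha`, `beta`), and the event times are assumptions for illustration; the abstract does not specify the paper's parameterization.

```python
import math

def intensity(t, mu, alpha, beta, partner_events):
    """Rate of a student's actions at time t.

    mu    : baseline rate with no partner influence
    alpha : jump in rate caused by each earlier partner action
    beta  : speed at which that influence decays
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - s)) for s in partner_events if s < t
    )
    return mu + excitation

partner_events = [1.0]  # hypothetical time of one partner action

before = intensity(0.5, 0.2, 0.5, 1.0, partner_events)       # baseline only
just_after = intensity(1.5, 0.2, 0.5, 1.0, partner_events)   # elevated
much_later = intensity(10.0, 0.2, 0.5, 1.0, partner_events)  # back near baseline
```

A positive `alpha` in this sketch corresponds to the idea in the abstract that a partner's action raises the probability of further actions in the near future.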
Falk, Carl F.; Cai, Li – Journal of Educational Measurement, 2016
We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
Descriptors: Item Response Theory, Guessing (Tests), Mathematics Tests, Simulation
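The general shape of such a model can be sketched as a lower asymptote plus a logistic function of a polynomial in theta. The cubic below, with a positive linear and non-negative cubic coefficient, is just one simple way to guarantee monotonicity and is an assumption for illustration; the paper's own polynomial parameterization and estimation method are not reproduced here. With `b3 = 0` the sketch reduces to a three-parameter logistic curve.

```python
import math

def sigmoid(m):
    """Numerically stable logistic function."""
    if m >= 0:
        return 1.0 / (1.0 + math.exp(-m))
    e = math.exp(m)
    return e / (1.0 + e)

def mp_logistic(theta, c, xi, b1, b3):
    """P(correct) = c + (1 - c) * logistic(xi + b1*theta + b3*theta**3).

    b1 > 0 and b3 >= 0 make the polynomial strictly increasing in theta
    (hypothetical parameterization, chosen for simplicity).
    """
    if not (0.0 <= c < 1.0 and b1 > 0.0 and b3 >= 0.0):
        raise ValueError("need 0 <= c < 1, b1 > 0, b3 >= 0")
    m = xi + b1 * theta + b3 * theta ** 3
    return c + (1.0 - c) * sigmoid(m)

grid = [-4.0 + 0.5 * i for i in range(17)]
curve = [mp_logistic(t, 0.2, 0.0, 1.0, 0.2) for t in grid]
```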
Sinharay, Sandip; Wan, Ping; Whitaker, Mike; Kim, Dong-In; Zhang, Litong; Choi, Seung W. – Journal of Educational Measurement, 2014
With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. There is a lack of research on this…
Descriptors: Computer Assisted Testing, Testing Problems, Scores, Regression (Statistics)
Armstrong, Ronald D.; Shi, Min – Journal of Educational Measurement, 2009
This article demonstrates the use of a new class of model-free cumulative sum (CUSUM) statistics to detect person fit given the responses to a linear test. The fundamental statistic being accumulated is the likelihood ratio of two probabilities. The detection performance of this CUSUM scheme is compared to other model-free person-fit statistics…
Descriptors: Probability, Simulation, Models, Psychometrics
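A minimal sketch of a one-sided CUSUM that accumulates likelihood ratios over a linear test is given below. It assumes that, for each item, a probability of a correct response is available under a "fitting" and a "misfitting" hypothesis, and it accumulates the ratio on the log scale with a reset at zero. How the paper obtains its two probabilities without an item response model is not stated in the abstract, so the inputs here are hypothetical.

```python
import math

def cusum_path(responses, p_null, p_alt):
    """Running upper CUSUM of log likelihood ratios, reset at zero.

    responses : 0/1 item scores in administration order
    p_null    : P(correct) per item for a fitting examinee (assumed input)
    p_alt     : P(correct) per item under the aberrant alternative (assumed input)
    """
    c = 0.0
    path = []
    for u, p0, p1 in zip(responses, p_null, p_alt):
        lik0 = p0 if u == 1 else 1.0 - p0
        lik1 = p1 if u == 1 else 1.0 - p1
        c = max(0.0, c + math.log(lik1 / lik0))
        path.append(c)
    return path

# Hypothetical examinee who starts as expected and then misses easy items.
responses = [1, 1, 0, 0]
path = cusum_path(responses, [0.8] * 4, [0.5] * 4)
```

In practice the maximum of the path would be compared with a threshold, often calibrated by simulation, to flag possible person misfit.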

Spray, Judith A.; Welch, Catherine J. – Journal of Educational Measurement, 1990
The effect of large, within-examinee item difficulty variability on estimates of the proportion of consistent classification of examinees into mastery categories was studied over 2 test administrations for 100 simulated examinees. The proportion of consistent classifications was adequately estimated using the technique proposed by M. Subkoviak…
Descriptors: Classification, Difficulty Level, Estimation (Mathematics), Item Response Theory
Meijer, Rob R. – Journal of Educational Measurement, 2004
Two new methods have been proposed to determine unexpected sum scores on sub-tests (testlets) both for paper-and-pencil tests and computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted p, was compared with a method where the probability for each score combination was calculated using a…
Descriptors: Probability, Adaptive Testing, Item Response Theory, Scores
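As a rough illustration of how a hypergeometric distribution can flag an unexpectedly low subtest score, the sketch below computes the probability of at most `s` correct answers on a subtest of `k` items, given `r` correct answers on the whole test of `n` items. It treats items as exchangeable, which is an assumption made here for simplicity; the abstract is truncated, so this is not necessarily the exact bound the paper studies.

```python
import math

def hypergeom_cdf(s, n, r, k):
    """P(X <= s) when k of n items form the subtest and r of n were answered correctly."""
    lo = max(0, k - (n - r))   # fewest correct answers the subtest can contain
    hi = min(k, r)             # most correct answers the subtest can contain
    total = math.comb(n, k)
    ways = sum(
        math.comb(r, x) * math.comb(n - r, k - x)
        for x in range(lo, min(s, hi) + 1)
    )
    return ways / total

# Hypothetical case: 5 of 10 items correct overall, none correct on a 5-item subtest.
p_low = hypergeom_cdf(0, 10, 5, 5)  # 1/252
```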