Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 1
Since 2016 (last 10 years): 10
Since 2006 (last 20 years): 23
Descriptor
Probability: 37
Test Items: 12
Item Response Theory: 11
Models: 11
Simulation: 10
Scores: 6
Test Reliability: 6
Comparative Analysis: 5
Computation: 5
Computer Assisted Testing: 5
Difficulty Level: 5
Source
Journal of Educational…: 37
Author
Gierl, Mark J.: 2
Sinharay, Sandip: 2
Subkoviak, Michael J.: 2
Andersson, Björn: 1
Anselmi, Pasquale: 1
Armstrong, Ronald D.: 1
Babcock, Ben: 1
Beland, Anne: 1
Belov, Dmitry I.: 1
Bennett, Randy E.: 1
Beretvas, S. Natasha: 1
Publication Type
Journal Articles: 30
Reports - Research: 15
Reports - Evaluative: 12
Reports - Descriptive: 3
Education Level
Secondary Education: 2
Higher Education: 1
Postsecondary Education: 1
Assessments and Surveys
National Assessment of…: 2
Program for International…: 2
SAT (College Admission Test): 2
Indiana Statewide Testing for…: 1
Law School Admission Test: 1
Trends in International…: 1
Wyse, Adam E.; Babcock, Ben – Journal of Educational Measurement, 2019
One common phenomenon in Angoff standard setting is that panelists regress their ratings in toward the middle of the probability scale. This study describes two indices based on taking ratios of standard deviations that can be utilized with a scatterplot of item ratings versus expected probabilities of success to identify whether ratings are…
Descriptors: Item Analysis, Standard Setting, Probability, Feedback (Response)
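As a rough illustration of the general idea (a sketch, not necessarily the authors' exact indices), a ratio of standard deviations can flag ratings that are compressed toward the middle of the probability scale; the numbers below are invented:

```python
import statistics

def regression_index(ratings, expected_probs):
    """Illustrative ratio-of-standard-deviations index: values well
    below 1 suggest panelists' item ratings are squeezed toward the
    middle relative to the expected probabilities of success."""
    return statistics.pstdev(ratings) / statistics.pstdev(expected_probs)

# Ratings clustered near 0.5 despite widely spread expected probabilities:
ratings = [0.45, 0.50, 0.55, 0.60]
expected = [0.20, 0.40, 0.70, 0.90]
print(round(regression_index(ratings, expected), 2))
```

A value near 1 would indicate that the spread of ratings matches the spread of expected success probabilities in the scatterplot.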
DeCarlo, Lawrence T. – Journal of Educational Measurement, 2021
In a signal detection theory (SDT) approach to multiple choice exams, examinees are viewed as choosing, for each item, the alternative that is perceived as being the most plausible, with perceived plausibility depending in part on whether or not an item is known. The SDT model is a process model and provides measures of item difficulty, item…
Descriptors: Perception, Bias, Theories, Test Items
Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020
An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…
Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting
Guo, Hongwen; Deane, Paul D.; van Rijn, Peter W.; Zhang, Mo; Bennett, Randy E. – Journal of Educational Measurement, 2018
The goal of this study is to model pauses extracted from writing keystroke logs as a way of characterizing the processes students use in essay composition. The low-level timing data modeled were the interkey interval and its subtype, the intraword duration, both thought to reflect processes associated with keyboarding skills and composition fluency.…
Descriptors: Writing Processes, Writing (Composition), Essays, Models
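Extracting such low-level timing data from a keystroke log can be sketched as follows (a minimal illustration; the timestamps and the pause threshold are arbitrary, and the study models the full interval distribution rather than thresholding):

```python
def interkey_intervals(timestamps):
    """Intervals between consecutive keystrokes, in the timestamps'
    units; long intervals are candidate pauses."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def pauses(intervals, threshold):
    """Indices of intervals exceeding a pause threshold."""
    return [i for i, gap in enumerate(intervals) if gap >= threshold]

log = [0.00, 0.12, 0.25, 2.10, 2.22, 2.31]   # keystroke times in seconds
gaps = interkey_intervals(log)
print(pauses(gaps, threshold=1.0))   # the long gap before the 4th keystroke
```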
Tijmstra, Jesper; Bolsinova, Maria; Liaw, Yuan-Ling; Rutkowski, Leslie; Rutkowski, David – Journal of Educational Measurement, 2020
Although the root-mean squared deviation (RMSD) is a popular statistical measure for evaluating country-specific item-level misfit (i.e., differential item functioning [DIF]) in international large-scale assessment, this paper shows that its sensitivity to detect misfit may depend strongly on the proficiency distribution of the considered…
Descriptors: Test Items, Goodness of Fit, Probability, Accuracy
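The dependence on the proficiency distribution follows from how the RMSD averages deviations. A sketch of the general form (grid, weights, and response probabilities below are invented):

```python
import math

def rmsd(weights, p_observed, p_model):
    """Weighted root-mean-squared deviation between an observed and a
    model-implied item response function: squared deviations over the
    proficiency grid are averaged with the group's proficiency weights,
    so misfit only registers where the proficiency mass sits."""
    total = sum(w * (po - pm) ** 2
                for w, po, pm in zip(weights, p_observed, p_model))
    return math.sqrt(total / sum(weights))

# Misfit concentrated at low proficiency (grid: theta = -2 .. 2):
p_obs   = [0.40, 0.35, 0.50, 0.75, 0.90]
p_model = [0.10, 0.25, 0.50, 0.75, 0.90]
low_group  = [0.4, 0.3, 0.2, 0.1, 0.0]   # proficiency mass at low theta
high_group = [0.0, 0.1, 0.2, 0.3, 0.4]   # proficiency mass at high theta
print(rmsd(low_group, p_obs, p_model) > rmsd(high_group, p_obs, p_model))
```

The same item-level misfit yields a much larger RMSD for the low-proficiency group, which is the sensitivity issue the paper examines.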
Kang, Hyeon-Ah; Zhang, Susu; Chang, Hua-Hua – Journal of Educational Measurement, 2017
The development of cognitive diagnostic-computerized adaptive testing (CD-CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual-objective CD-CAT that simultaneously addresses examinees' attribute mastery…
Descriptors: Computer Assisted Testing, Adaptive Testing, Cognitive Tests, Test Items
Andersson, Björn – Journal of Educational Measurement, 2016
In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…
Descriptors: Equated Scores, Item Response Theory, Error of Measurement, Tests
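The percentile-matching step can be sketched on raw empirical score distributions (the article's IRT-based smoothing is omitted, and the score values below are invented):

```python
import bisect
import math

def equipercentile_equate(x, x_sorted, y_sorted):
    """Crude equipercentile equating: take the percentile rank of x
    among sorted X scores and return the Y score at that percentile.
    This matches raw order statistics only; the article instead models
    responses with a polytomous IRT model to obtain smooth score
    distributions before matching percentiles."""
    p = bisect.bisect_right(x_sorted, x) / len(x_sorted)
    idx = max(math.ceil(p * len(y_sorted)) - 1, 0)
    return y_sorted[idx]

form_x = [10, 12, 14, 16, 18]
form_y = [12, 15, 18, 21, 24]
print(equipercentile_equate(14, form_x, form_y))  # median maps to median
```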
Wang, Wenyi; Song, Lihong; Chen, Ping; Meng, Yaru; Ding, Shuliang – Journal of Educational Measurement, 2015
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern-level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet…
Descriptors: Classification, Reliability, Accuracy, Cognitive Tests
Halpin, Peter F.; von Davier, Alina A.; Hao, Jiangang; Liu, Lei – Journal of Educational Measurement, 2017
This article addresses performance assessments that involve collaboration among students. We apply the Hawkes process to infer whether the actions of one student are associated with increased probability of further actions by his/her partner(s) in the near future. This leads to an intuitive notion of engagement among collaborators, and we consider…
Descriptors: Performance Based Assessment, Student Evaluation, Cooperative Learning, Inferences
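The core Hawkes idea can be sketched in a univariate form (parameters and event times below are arbitrary; the article fits a multivariate version to collaborative assessment logs):

```python
import math

def hawkes_intensity(t, partner_events, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of a simple Hawkes process,
    lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i)) over past
    events: each past action by a partner temporarily raises the
    probability of a further action in the near future."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in partner_events if ti < t)

events = [1.0, 1.2, 5.0]
# Intensity is elevated just after a burst of partner activity:
print(hawkes_intensity(1.3, events) > hawkes_intensity(4.9, events))
```

Sustained elevation of one student's intensity following a partner's actions is what motivates the article's notion of engagement among collaborators.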
Falk, Carl F.; Cai, Li – Journal of Educational Measurement, 2016
We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
Descriptors: Item Response Theory, Guessing (Tests), Mathematics Tests, Simulation
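The general form described in the abstract can be sketched directly; the coefficients below are arbitrary, chosen so the polynomial is increasing (its derivative 1.2 + 0.9*theta**2 is positive everywhere), whereas the paper's parameterization guarantees monotonicity by construction:

```python
import math

def irf(theta, c=0.2, coefs=(0.0, 1.2, 0.0, 0.3)):
    """Logistic function of a polynomial with lower asymptote c:
    P(theta) = c + (1 - c) / (1 + exp(-m(theta))). With a monotone
    polynomial m, this extends the three-parameter logistic model,
    whose m is restricted to be linear in theta."""
    m = sum(a * theta ** k for k, a in enumerate(coefs))
    return c + (1.0 - c) / (1.0 + math.exp(-m))

# Monotone in theta and bounded below by the guessing asymptote c = 0.2:
probs = [irf(t) for t in (-3, -1, 0, 1, 3)]
print([round(p, 3) for p in probs])
```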
Sinharay, Sandip; Wan, Ping; Whitaker, Mike; Kim, Dong-In; Zhang, Litong; Choi, Seung W. – Journal of Educational Measurement, 2014
With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. There is a lack of research on this…
Descriptors: Computer Assisted Testing, Testing Problems, Scores, Regression (Statistics)
Liu, Ou Lydia; Liu, Huili; Roohr, Katrina Crotts; McCaffrey, Daniel F. – Journal of Educational Measurement, 2016
Learning outcomes assessment has been widely used by higher education institutions both nationally and internationally. One of its popular uses is to document learning gains of students. Prior studies have recognized the potential imbalance between freshmen and seniors in terms of their background characteristics and their prior academic…
Descriptors: College Outcomes Assessment, Achievement Gains, College Freshmen, College Seniors
Belov, Dmitry I. – Journal of Educational Measurement, 2013
The development of statistical methods for detecting test collusion is a new research direction in the area of test security. Test collusion may be described as large-scale sharing of test materials, including answers to test items. Current methods of detecting test collusion are based on statistics also used in answer-copying detection.…
Descriptors: Cheating, Computer Assisted Testing, Adaptive Testing, Statistical Analysis
Huang, Hung-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2014
The DINA (deterministic input, noisy "and" gate) model has been widely used in cognitive diagnosis tests and in the process of test development. The slip and guess parameters are included in the DINA model function representing the responses to the items. This study aimed to extend the DINA model by using the random-effect approach to allow…
Descriptors: Models, Guessing (Tests), Probability, Ability
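The fixed-parameter base model that the article extends can be sketched as follows (attribute patterns and slip/guess values below are invented; the article's random-effect extension lets slip and guess vary across examinees):

```python
def dina_prob(alpha, q, slip, guess):
    """Probability of a correct response under the standard DINA model:
    eta = 1 iff the examinee masters every attribute the item requires
    (per the item's Q-matrix row); masters answer correctly unless they
    slip, non-masters succeed only by guessing."""
    eta = all(a >= qk for a, qk in zip(alpha, q))
    return (1.0 - slip) if eta else guess

# Item requires attributes 1 and 3 (its Q-matrix row):
print(dina_prob(alpha=[1, 0, 1], q=[1, 0, 1], slip=0.1, guess=0.2))  # master
print(dina_prob(alpha=[1, 0, 0], q=[1, 0, 1], slip=0.1, guess=0.2))  # non-master
```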
Zu, Jiyun; Yuan, Ke-Hai – Journal of Educational Measurement, 2012
In the nonequivalent groups with anchor test (NEAT) design, the standard error of linear observed-score equating is commonly estimated by an estimator derived assuming multivariate normality. However, real data are seldom normally distributed, causing this normal estimator to be inconsistent. A general estimator, which does not rely on the…
Descriptors: Sample Size, Equated Scores, Test Items, Error of Measurement
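The linear observed-score equating function itself is simple to sketch (the anchor-test adjustments specific to the NEAT design, and the standard-error estimation the article focuses on, are omitted; the scores below are invented):

```python
import statistics

def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Linear observed-score equating: map score x on form X to the Y
    scale by matching means and standard deviations. In a NEAT design
    the four moments are themselves adjusted through the anchor test
    (e.g., by Tucker or chained methods), which this sketch omits."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)

x_scores = [10, 12, 14, 16, 18]
y_scores = [12, 15, 18, 21, 24]
mx, sx = statistics.mean(x_scores), statistics.pstdev(x_scores)
my, sy = statistics.mean(y_scores), statistics.pstdev(y_scores)
print(linear_equate(14, mx, sx, my, sy))  # prints 18.0: mean maps to mean
```

The standard error of this equating function can be estimated without the multivariate-normality assumption the abstract criticizes, for example by resampling examinees.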