ERIC - Search Results

Showing all 10 results

A Method for Detecting Regression of Hard and Easy Item Angoff Ratings

Peer reviewed

Wyse, Adam E.; Babcock, Ben – Journal of Educational Measurement, 2019

One common phenomenon in Angoff standard setting is that panelists regress their ratings in toward the middle of the probability scale. This study describes two indices based on taking ratios of standard deviations that can be utilized with a scatterplot of item ratings versus expected probabilities of success to identify whether ratings are…

Descriptors: Item Analysis, Standard Setting, Probability, Feedback (Response)

Sensitivity of the RMSD for Detecting Item-Level Misfit in Low-Performing Countries

Peer reviewed

Tijmstra, Jesper; Bolsinova, Maria; Liaw, Yuan-Ling; Rutkowski, Leslie; Rutkowski, David – Journal of Educational Measurement, 2020

Although the root-mean squared deviation (RMSD) is a popular statistical measure for evaluating country-specific item-level misfit (i.e., differential item functioning [DIF]) in international large-scale assessment, this paper shows that its sensitivity to detect misfit may depend strongly on the proficiency distribution of the considered…

Descriptors: Test Items, Goodness of Fit, Probability, Accuracy

Asymptotic Standard Errors of Observed-Score Equating with Polytomous IRT Models

Peer reviewed

Andersson, Björn – Journal of Educational Measurement, 2016

In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…

Descriptors: Equated Scores, Item Response Theory, Error of Measurement, Tests

Attribute-Level and Pattern-Level Classification Consistency and Accuracy Indices for Cognitive Diagnostic Assessment

Peer reviewed

Wang, Wenyi; Song, Lihong; Chen, Ping; Meng, Yaru; Ding, Shuliang – Journal of Educational Measurement, 2015

Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern-level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet…

Descriptors: Classification, Reliability, Accuracy, Cognitive Tests

Measuring Student Engagement during Collaboration

Peer reviewed

Halpin, Peter F.; von Davier, Alina A.; Hao, Jiangang; Liu, Lei – Journal of Educational Measurement, 2017

This article addresses performance assessments that involve collaboration among students. We apply the Hawkes process to infer whether the actions of one student are associated with increased probability of further actions by his/her partner(s) in the near future. This leads to an intuitive notion of engagement among collaborators, and we consider…

Descriptors: Performance Based Assessment, Student Evaluation, Cooperative Learning, Inferences

Semiparametric Item Response Functions in the Context of Guessing

Peer reviewed

Falk, Carl F.; Cai, Li – Journal of Educational Measurement, 2016

We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…

Descriptors: Item Response Theory, Guessing (Tests), Mathematics Tests, Simulation

Determining the Overall Impact of Interruptions during Online Testing

Peer reviewed

Sinharay, Sandip; Wan, Ping; Whitaker, Mike; Kim, Dong-In; Zhang, Litong; Choi, Seung W. – Journal of Educational Measurement, 2014

With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. There is a lack of research on this…

Descriptors: Computer Assisted Testing, Testing Problems, Scores, Regression (Statistics)

Model-Free CUSUM Methods for Person Fit

Peer reviewed

Armstrong, Ronald D.; Shi, Min – Journal of Educational Measurement, 2009

This article demonstrates the use of a new class of model-free cumulative sum (CUSUM) statistics to detect person fit given the responses to a linear test. The fundamental statistic being accumulated is the likelihood ratio of two probabilities. The detection performance of this CUSUM scheme is compared to other model-free person-fit statistics…

Descriptors: Probability, Simulation, Models, Psychometrics

Estimation of Classification Consistency When the Probability of a Correct Response Varies

Peer reviewed

Spray, Judith A.; Welch, Catherine J. – Journal of Educational Measurement, 1990

The effect of large, within-examinee item difficulty variability on estimates of the proportion of consistent classification of examinees into mastery categories was studied over 2 test administrations for 100 simulated examinees. The proportion of consistent classifications was adequately estimated using the technique proposed by M. Subkoviak…

Descriptors: Classification, Difficulty Level, Estimation (Mathematics), Item Response Theory

Using Patterns of Summed Scores in Paper-and-Pencil Tests and Computer-Adaptive Tests to Detect Misfitting Item Score Patterns

Peer reviewed

Meijer, Rob R. – Journal of Educational Measurement, 2004

Two new methods have been proposed to determine unexpected sum scores on sub-tests (testlets) both for paper-and-pencil tests and computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted p, was compared with a method where the probability for each score combination was calculated using a…

Descriptors: Probability, Adaptive Testing, Item Response Theory, Scores
