ERIC - Search Results

Showing 1 to 15 of 37 results

A Method for Detecting Regression of Hard and Easy Item Angoff Ratings

Peer reviewed

Wyse, Adam E.; Babcock, Ben – Journal of Educational Measurement, 2019

One common phenomenon in Angoff standard setting is that panelists regress their ratings in toward the middle of the probability scale. This study describes two indices based on taking ratios of standard deviations that can be utilized with a scatterplot of item ratings versus expected probabilities of success to identify whether ratings are…

Descriptors: Item Analysis, Standard Setting, Probability, Feedback (Response)

On Joining a Signal Detection Choice Model with Response Time Models

Peer reviewed

DeCarlo, Lawrence T. – Journal of Educational Measurement, 2021

In a signal detection theory (SDT) approach to multiple choice exams, examinees are viewed as choosing, for each item, the alternative that is perceived as being the most plausible, with perceived plausibility depending in part on whether or not an item is known. The SDT model is a process model and provides measures of item difficulty, item…

Descriptors: Perception, Bias, Theories, Test Items

Examining the Precision of Cut Scores within a Generalizability Theory Framework: A Closer Look at the Item Effect

Peer reviewed

Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020

An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…

Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting

Modeling Basic Writing Processes from Keystroke Logs

Peer reviewed

Guo, Hongwen; Deane, Paul D.; van Rijn, Peter W.; Zhang, Mo; Bennett, Randy E. – Journal of Educational Measurement, 2018

The goal of this study is to model pauses extracted from writing keystroke logs as a way of characterizing the processes students use in essay composition. Low-level timing data were modeled, the interkey interval and its subtype, the intraword duration, thought to reflect processes associated with keyboarding skills and composition fluency.…

Descriptors: Writing Processes, Writing (Composition), Essays, Models

Sensitivity of the RMSD for Detecting Item-Level Misfit in Low-Performing Countries

Peer reviewed

Tijmstra, Jesper; Bolsinova, Maria; Liaw, Yuan-Ling; Rutkowski, Leslie; Rutkowski, David – Journal of Educational Measurement, 2020

Although the root-mean squared deviation (RMSD) is a popular statistical measure for evaluating country-specific item-level misfit (i.e., differential item functioning [DIF]) in international large-scale assessment, this paper shows that its sensitivity to detect misfit may depend strongly on the proficiency distribution of the considered…

Descriptors: Test Items, Goodness of Fit, Probability, Accuracy

Dual-Objective Item Selection Criteria in Cognitive Diagnostic Computerized Adaptive Testing

Peer reviewed

Kang, Hyeon-Ah; Zhang, Susu; Chang, Hua-Hua – Journal of Educational Measurement, 2017

The development of cognitive diagnostic-computerized adaptive testing (CD-CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual-objective CD-CAT that simultaneously addresses examinees' attribute mastery…

Descriptors: Computer Assisted Testing, Adaptive Testing, Cognitive Tests, Test Items

Asymptotic Standard Errors of Observed-Score Equating with Polytomous IRT Models

Peer reviewed

Andersson, Björn – Journal of Educational Measurement, 2016

In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…

Descriptors: Equated Scores, Item Response Theory, Error of Measurement, Tests

Attribute-Level and Pattern-Level Classification Consistency and Accuracy Indices for Cognitive Diagnostic Assessment

Peer reviewed

Wang, Wenyi; Song, Lihong; Chen, Ping; Meng, Yaru; Ding, Shuliang – Journal of Educational Measurement, 2015

Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern-level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet…

Descriptors: Classification, Reliability, Accuracy, Cognitive Tests

Measuring Student Engagement during Collaboration

Peer reviewed

Halpin, Peter F.; von Davier, Alina A.; Hao, Jiangang; Liu, Lei – Journal of Educational Measurement, 2017

This article addresses performance assessments that involve collaboration among students. We apply the Hawkes process to infer whether the actions of one student are associated with increased probability of further actions by his/her partner(s) in the near future. This leads to an intuitive notion of engagement among collaborators, and we consider…

Descriptors: Performance Based Assessment, Student Evaluation, Cooperative Learning, Inferences

Semiparametric Item Response Functions in the Context of Guessing

Peer reviewed

Falk, Carl F.; Cai, Li – Journal of Educational Measurement, 2016

We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…

Descriptors: Item Response Theory, Guessing (Tests), Mathematics Tests, Simulation

Determining the Overall Impact of Interruptions during Online Testing

Peer reviewed

Sinharay, Sandip; Wan, Ping; Whitaker, Mike; Kim, Dong-In; Zhang, Litong; Choi, Seung W. – Journal of Educational Measurement, 2014

With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. There is a lack of research on this…

Descriptors: Computer Assisted Testing, Testing Problems, Scores, Regression (Statistics)

Investigating College Learning Gain: Exploring a Propensity Score Weighting Approach

Peer reviewed

Liu, Ou Lydia; Liu, Huili; Roohr, Katrina Crotts; McCaffrey, Daniel F. – Journal of Educational Measurement, 2016

Learning outcomes assessment has been widely used by higher education institutions both nationally and internationally. One of its popular uses is to document learning gains of students. Prior studies have recognized the potential imbalance between freshmen and seniors in terms of their background characteristics and their prior academic…

Descriptors: College Outcomes Assessment, Achievement Gains, College Freshmen, College Seniors

Detection of Test Collusion via Kullback-Leibler Divergence

Peer reviewed

Belov, Dmitry I. – Journal of Educational Measurement, 2013

The development of statistical methods for detecting test collusion is a new research direction in the area of test security. Test collusion may be described as large-scale sharing of test materials, including answers to test items. Current methods of detecting test collusion are based on statistics also used in answer-copying detection.…

Descriptors: Cheating, Computer Assisted Testing, Adaptive Testing, Statistical Analysis

The Random-Effect DINA Model

Peer reviewed

Huang, Hung-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2014

The DINA (deterministic input, noisy "and" gate) model has been widely used in cognitive diagnosis tests and in the process of test development. The outcomes known as slip and guess are included in the DINA model function representing the responses to the items. This study aimed to extend the DINA model by using the random-effect approach to allow…

Descriptors: Models, Guessing (Tests), Probability, Ability

Standard Error of Linear Observed-Score Equating for the NEAT Design with Nonnormally Distributed Data

Peer reviewed

Zu, Jiyun; Yuan, Ke-Hai – Journal of Educational Measurement, 2012

In the nonequivalent groups with anchor test (NEAT) design, the standard error of linear observed-score equating is commonly estimated by an estimator derived assuming multivariate normality. However, real data are seldom normally distributed, causing this normal estimator to be inconsistent. A general estimator, which does not rely on the…

Descriptors: Sample Size, Equated Scores, Test Items, Error of Measurement
