ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	2
Since 2006 (last 20 years)	4

Descriptor

Foreign Countries	4
Probability	4
Computation	2
Models	2
Reading Tests	2
Standard Setting (Scoring)	2
Test Items	2
Bayesian Statistics	1
Benchmarking	1
Biotechnology	1
Cognitive Measurement	1
College Entrance Examinations	1
Comparative Analysis	1
Credentials	1
Cutting Scores	1
Difficulty Level	1
Error Patterns	1
Gender Differences	1
Generalizability Theory	1
Grade 7	1
Group Discussion	1
Guessing (Tests)	1
High Stakes Tests	1
Item Analysis	1
Item Response Theory	1

Source

Applied Measurement in…

Author

Abu-Ghazalah, Rashid M.	1
Andrich, David	1
Chis, Liliana	1
Clauser, Brian E.	1
Dubins, David N.	1
Ghonsooly, Behzad	1
Harik, Polina	1
Heldsinger, Sandra	1
Humphry, Stephen	1
Margolis, Melissa J.	1
McManus, I. C.	1
Mehrazmay, Roghayeh	1
Mollon, Jennifer	1
Poon, Gregory M. K.	1
Williams, Simon	1
de la Torre, Jimmy	1

Publication Type

Journal Articles	4
Reports - Research	3
Reports - Evaluative	1

Education Level

Higher Education	2
Postsecondary Education	2
Elementary Education	1
Grade 7	1
Junior High Schools	1
Middle Schools	1
Secondary Education	1

Location

Australia	1
Canada	1
Iran	1
United Kingdom	1

Showing all 4 results

Dissecting Knowledge, Guessing, and Blunder in Multiple Choice Assessments

Peer reviewed

Abu-Ghazalah, Rashid M.; Dubins, David N.; Poon, Gregory M. K. – Applied Measurement in Education, 2023

Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly…

Descriptors: Guessing (Tests), Multiple Choice Tests, Probability, Models

Detecting Differential Item Functioning Using Cognitive Diagnosis Models: Applications of the Wald Test and Likelihood Ratio Test in a University Entrance Examination

Peer reviewed

Mehrazmay, Roghayeh; Ghonsooly, Behzad; de la Torre, Jimmy – Applied Measurement in Education, 2021

The present study aims to examine gender differential item functioning (DIF) in the reading comprehension section of a high stakes test using cognitive diagnosis models. Based on the multiple-group generalized deterministic, noisy "and" gate (MG G-DINA) model, the Wald test and likelihood ratio test are used to detect DIF. The flagged…

Descriptors: Test Bias, College Entrance Examinations, Gender Differences, Reading Tests

Requiring a Consistent Unit of Scale between the Responses of Students and Judges in Standard Setting

Peer reviewed

Humphry, Stephen; Heldsinger, Sandra; Andrich, David – Applied Measurement in Education, 2014

One of the best-known methods for setting a benchmark standard on a test is that of Angoff and its modifications. When scored dichotomously, judges estimate the probability that a benchmark student has of answering each item correctly. As in most methods of standard setting, it is assumed implicitly that the unit of the latent scale of the…

Descriptors: Foreign Countries, Standard Setting (Scoring), Judges, Item Response Theory

An Empirical Examination of the Impact of Group Discussion and Examinee Performance Information on Judgments Made in the Angoff Standard-Setting Procedure

Peer reviewed

Clauser, Brian E.; Harik, Polina; Margolis, Melissa J.; McManus, I. C.; Mollon, Jennifer; Chis, Liliana; Williams, Simon – Applied Measurement in Education, 2009

Numerous studies have compared the Angoff standard-setting procedure to other standard-setting methods, but relatively few studies have evaluated the procedure based on internal criteria. This study uses a generalizability theory framework to evaluate the stability of the estimated cut score. To provide a measure of internal consistency, this…

Descriptors: Generalizability Theory, Group Discussion, Standard Setting (Scoring), Scoring