ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	5
Since 2006 (last 20 years)	15

Descriptor

Cutting Scores	29
Standard Setting (Scoring)	13
Test Items	12
Item Response Theory	7
Licensing Examinations…	7
Standard Setting	7
Test Construction	6
Classification	5
Difficulty Level	5
Evaluation Methods	5
Comparative Analysis	4
Equated Scores	4
Psychometrics	4
Scores	4
Accuracy	3
Achievement Tests	3
Computation	3
Decision Making	3
Elementary Secondary Education	3
Foreign Countries	3
Judges	3
Mathematics Tests	3
Reading Tests	3
Reliability	3
Scoring	3

Source

Applied Measurement in…

Publication Type

Journal Articles	29
Reports - Research	19
Reports - Evaluative	8
Reports - Descriptive	2
Speeches/Meeting Papers	2
Information Analyses	1

Education Level

Elementary Education	2
Grade 5	2
High Schools	2
Secondary Education	2
Early Childhood Education	1
Elementary Secondary Education	1
Grade 12	1
Junior High Schools	1
Middle Schools	1

Audience

Teachers	2
Administrators	1

Location

Netherlands	2
Canada	1
Colorado	1
United Kingdom	1

Laws, Policies, & Programs

No Child Left Behind Act 2001

Assessments and Surveys

Advanced Placement…	1
Praxis Series	1

What Works Clearinghouse Rating

Showing 1 to 15 of 29 results

Comparing Cut Scores from the Angoff Method and Two Variations of the Hofstee and Beuk Methods

Peer reviewed

Wyse, Adam E. – Applied Measurement in Education, 2020

This article compares cut scores from the Angoff method with those from two variations of the Hofstee and Beuk methods, which determine cut scores by resolving inconsistencies in panelists' judgments about cut scores and pass rates. The first variation uses responses to the Hofstee and Beuk percentage correct and pass rate questions to calculate cut scores.…

Descriptors: Cutting Scores, Evaluation Methods, Standard Setting (Scoring), Equations (Mathematics)
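The Hofstee compromise method named in this abstract can be sketched as follows. This is a minimal illustration of the textbook procedure, not either of the variations studied in the article. It assumes the panel's averaged judgments give a minimum and maximum acceptable cut score (`k_min`, `k_max`, on the raw-score scale) and a minimum and maximum acceptable fail rate (`f_min`, `f_max`, as proportions), and it uses a hypothetical vector of examinee scores. The cut is taken where the line from (`k_min`, `f_max`) to (`k_max`, `f_min`) meets the observed fail-rate curve.

```python
import numpy as np

def fail_rate(scores, cut):
    """Proportion of examinees scoring below the cut (i.e., failing)."""
    return float(np.mean(np.asarray(scores) < cut))

def hofstee_cut(scores, k_min, k_max, f_min, f_max):
    """Textbook Hofstee compromise (sketch): among integer cut scores in
    [k_min, k_max], return the one whose observed fail rate is closest to
    the panel line running from (k_min, f_max) to (k_max, f_min)."""
    best_cut, best_gap = None, np.inf
    for cut in range(int(np.ceil(k_min)), int(np.floor(k_max)) + 1):
        # Fail rate the panel's line allows at this cut score
        line = f_max + (f_min - f_max) * (cut - k_min) / (k_max - k_min)
        gap = abs(fail_rate(scores, cut) - line)
        if gap < best_gap:
            best_cut, best_gap = cut, gap
    return best_cut
```

For example, with the uniform score vector `list(range(101))`, `hofstee_cut(list(range(101)), 50, 70, 0.1, 0.7)` returns 55. Picking the closest integer cut rather than solving for an exact crossing is a simplification; with integer raw scores the observed fail-rate curve is a step function anyway.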

Investigating the Classification Accuracy of Rasch and Nominal Weights Mean Equating with Very Small Samples

Peer reviewed

Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020

Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods was investigated in the context of very small samples (N = 10). Overall, nominal…

Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores

Classification Consistency and Accuracy for Mixed-Format Tests

Peer reviewed

Kim, Stella Y.; Lee, Won-Chan – Applied Measurement in Education, 2019

This study explores classification consistency and accuracy for mixed-format tests using real and simulated data. In particular, the current study compares six methods of estimating classification consistency and accuracy for seven mixed-format tests. The relative performance of the estimation methods is evaluated using simulated data. Study…

Descriptors: Classification, Reliability, Accuracy, Test Format
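Classification consistency refers to agreement between the classifications examinees would receive on two parallel administrations. The article compares single-administration estimation methods, which are not reproduced here. The sketch below only shows the two indices commonly reported when paired classifications are available, raw agreement and Cohen's kappa, using hypothetical pass/fail labels.

```python
from collections import Counter

def consistency(form_a, form_b):
    """Proportion agreement and Cohen's kappa for paired classifications
    (e.g., 'pass'/'fail') of the same examinees on two parallel forms."""
    n = len(form_a)
    # Observed agreement: share of examinees classified identically
    p_o = sum(a == b for a, b in zip(form_a, form_b)) / n
    # Chance agreement from the marginal category proportions
    ca, cb = Counter(form_a), Counter(form_b)
    p_e = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / n**2
    # Kappa is undefined when p_e == 1 (all examinees in one category)
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, kappa
```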

Examining How Professional Roles and Test Development Experiences Impact Angoff Ratings

Peer reviewed

Wyse, Adam E. – Applied Measurement in Education, 2018

An important consideration in standard setting is recruiting a group of panelists with different experiences and backgrounds to serve on the standard-setting panel. This study uses data from 14 different Angoff standard settings from a variety of medical imaging credentialing programs to examine whether people with different professional roles and…

Descriptors: Standard Setting (Scoring), Test Construction, Cutting Scores, Accuracy

Regression Effects in Angoff Ratings: Examples from Credentialing Exams

Peer reviewed

Wyse, Adam E. – Applied Measurement in Education, 2018

This article discusses regression effects that are commonly observed in Angoff ratings where panelists tend to think that hard items are easier than they are and easy items are more difficult than they are in comparison to estimated item difficulties. Analyses of data from two credentialing exams illustrate these regression effects and the…

Descriptors: Regression (Statistics), Test Items, Difficulty Level, Licensing Examinations (Professions)
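One way to quantify the regression effect described here, sketched under assumptions rather than taken from the article, is to fit an ordinary least-squares line of mean Angoff ratings on estimated item difficulties expressed as proportions correct (for example, p-values among examinees near the cut). A slope below 1 means the ratings are compressed toward the middle: hard items are rated as easier, and easy items as harder, than the difficulty estimates indicate. The data used in the check are hypothetical.

```python
import numpy as np

def rating_slope(p_values, mean_ratings):
    """OLS slope and intercept of mean Angoff ratings regressed on
    empirical item p-values. A slope below 1 signals a regression effect."""
    slope, intercept = np.polyfit(p_values, mean_ratings, 1)
    return float(slope), float(intercept)
```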

Evaluating the Consistency of Angoff-Based Cut Scores Using Subsets of Items within a Generalizability Theory Framework

Peer reviewed

Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015

The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time-consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…

Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
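A minimal sketch of how Angoff ratings are commonly aggregated, assuming each panelist rates every item with the probability that a minimally competent examinee answers it correctly: a panelist's cut is the sum of their ratings, and the panel cut is the mean over panelists. The `subset_cut` helper is hypothetical and only illustrates the idea behind the subset studies by rescaling the mean rating on a subset of items to full test length; it is not the generalizability-theory analysis in the article.

```python
import numpy as np

def angoff_cut(ratings):
    """ratings: panelists x items array of probability judgments (0-1).
    Returns the raw-score cut: mean over panelists of their summed ratings."""
    ratings = np.asarray(ratings, dtype=float)
    return float(ratings.sum(axis=1).mean())

def subset_cut(ratings, item_idx):
    """Hypothetical helper: estimate the full-test cut from a subset of
    items by scaling the mean subset rating up to the full test length."""
    ratings = np.asarray(ratings, dtype=float)
    n_items = ratings.shape[1]
    return float(ratings[:, item_idx].mean() * n_items)
```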

Evaluating the Operational Feasibility of Using Subsets of Items to Recommend Minimal Competency Cut Scores

Peer reviewed

Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J. – Applied Measurement in Education, 2015

Establishing cut scores using the Angoff method requires panelists to evaluate every item on a test and make a probability judgment. This can be time-consuming when there are large numbers of items on the test. Previous research using resampling studies suggests that it is possible to recommend stable Angoff-based cut score estimates using a…

Descriptors: Cutting Scores, Test Items, Standard Setting (Scoring), Feasibility Studies

The Effect of Small Group Discussion on Cutoff Scores during Standard Setting

Peer reviewed

Deunk, Marjolein I.; van Kuijk, Mechteld F.; Bosker, Roel J. – Applied Measurement in Education, 2014

Standard setting methods, like the Bookmark procedure, are used to assist education experts in formulating performance standards. Small group discussion is meant to help these experts in setting more reliable and valid cutoff scores. This study is an analysis of 15 small group discussions during two standard-setting trajectories and their effect…

Descriptors: Cutting Scores, Standard Setting, Group Discussion, Reading Tests

Determining the Anchor Composition for a Mixed-Format Test: Evaluation of Subpopulation Invariance of Linking Functions

Peer reviewed

Kim, Sooyeon; Walker, Michael – Applied Measurement in Education, 2012

This study examined the appropriateness of the anchor composition in a mixed-format test, which includes both multiple-choice (MC) and constructed-response (CR) items, using subpopulation invariance indices. Linking functions were derived in the nonequivalent groups with anchor test (NEAT) design using two types of anchor sets: (a) MC only and (b)…

Descriptors: Multiple Choice Tests, Test Format, Test Items, Equated Scores

Evidence-Centered Assessment Design as a Foundation for Achievement-Level Descriptor Development and for Standard Setting

Peer reviewed

Plake, Barbara S.; Huff, Kristen; Reshetar, Rosemary – Applied Measurement in Education, 2010

In many large-scale assessment programs, achievement level descriptors (ALDs) play a critical role in communicating what scores on the assessment mean and in interpreting what examinees know and are able to do based on their test performance. Based on their test performance, examinees are often classified into performance categories. The…

Descriptors: Evidence, Test Construction, Measurement, Standard Setting

Detecting and Correcting Scale Drift in Test Equating: An Illustration from a Large Scale Testing Program

Peer reviewed

Puhan, Gautam – Applied Measurement in Education, 2009

The purpose of this study is to determine the extent of scale drift on a test that employs cut scores. It was essential to examine scale drift for this testing program because new forms in this testing program are often put on scale through a series of intermediate equatings (known as equating chains). This process may cause equating error to…

Descriptors: Testing Programs, Testing, Measurement Techniques, Item Response Theory

An Empirical Examination of the Impact of Group Discussion and Examinee Performance Information on Judgments Made in the Angoff Standard-Setting Procedure

Peer reviewed

Clauser, Brian E.; Harik, Polina; Margolis, Melissa J.; McManus, I. C.; Mollon, Jennifer; Chis, Liliana; Williams, Simon – Applied Measurement in Education, 2009

Numerous studies have compared the Angoff standard-setting procedure to other standard-setting methods, but relatively few studies have evaluated the procedure based on internal criteria. This study uses a generalizability theory framework to evaluate the stability of the estimated cut score. To provide a measure of internal consistency, this…

Descriptors: Generalizability Theory, Group Discussion, Standard Setting (Scoring), Scoring

A Qualitative Investigation of Panelists' Experiences of Standard Setting Using Two Variations of the Bookmark Method

Peer reviewed

Hein, Serge F.; Skaggs, Gary E. – Applied Measurement in Education, 2009

Only a small number of qualitative studies have investigated panelists' experiences during standard-setting activities or the thought processes associated with panelists' actions. This qualitative study involved an examination of the experiences of 11 panelists who participated in a prior, one-day standard-setting meeting in which either the…

Descriptors: Focus Groups, Standard Setting, Cutting Scores, Cognitive Processes

Evaluation of the Standard Setting on the 2005 Grade 12 National Assessment of Educational Progress Mathematics Test

Peer reviewed

Sireci, Stephen G.; Hauger, Jeffrey B.; Wells, Craig S.; Shea, Christine; Zenisky, April L. – Applied Measurement in Education, 2009

The National Assessment Governing Board used a new method to set achievement level standards on the 2005 Grade 12 NAEP Math test. In this article, we summarize our independent evaluation of the process used to set these standards. The evaluation data included observations of the standard-setting meeting, observations of advisory committee meetings…

Descriptors: Advisory Committees, Mathematics Tests, Standard Setting, National Competency Tests

Setting Passing Scores on Passage-Based Tests: A Comparison of Traditional and Single-Passage Bookmark Methods

Peer reviewed

Skaggs, Gary; Hein, Serge F.; Awuor, Risper – Applied Measurement in Education, 2007

In this study, a variation of the bookmark standard setting procedure for passage-based tests is proposed in which separate ordered item booklets are created for the items associated with each passage. This variation is compared to the traditional bookmark procedure for a fifth-grade reading test. The results showed that the single-passage…

Descriptors: Reading Tests, Standard Setting, Cutting Scores, Grade 5
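The ordered item booklet behind the bookmark procedure can be sketched under a Rasch model, assuming a response-probability criterion (RP) of 0.67: an item with difficulty b is placed at the ability where the chance of a correct response equals RP, namely b + ln(RP / (1 - RP)), and items are sorted by that location. Conventions for turning a bookmark into a cut vary across programs; this sketch assumes the cut is the location of the last item the borderline examinee is judged to master. The item difficulties and page number are hypothetical, and the single-passage variant would apply the same ordering separately within each passage's items.

```python
import math

def rp_location(b, rp=0.67):
    """Rasch ability at which P(correct) equals rp for difficulty b."""
    return b + math.log(rp / (1 - rp))

def ordered_item_booklet(difficulties, rp=0.67):
    """(item_id, RP location) pairs sorted from easiest to hardest."""
    locs = [(i, rp_location(b, rp)) for i, b in enumerate(difficulties)]
    return sorted(locs, key=lambda pair: pair[1])

def bookmark_cut(difficulties, n_mastered, rp=0.67):
    """Cut on the ability scale under one assumed convention: the RP
    location of page n_mastered, the last item judged to be mastered."""
    booklet = ordered_item_booklet(difficulties, rp)
    return booklet[n_mastered - 1][1]
```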

Author

Wyse, Adam E.	3
Giraud, Gerald	2
Hein, Serge F.	2
Impara, James C.	2
Kannan, Priya	2
Mehrens, William A.	2
Plake, Barbara S.	2
Sgammato, Adrienne	2
Tannenbaum, Richard J.	2
Angoff, William H.	1
Awuor, Risper	1
Bosker, Roel J.	1
Chang, Lei	1
Chis, Liliana	1
Clauser, Brian E.	1
Cohen, Allan S.	1
Crooks, Terence J.	1
Deunk, Marjolein I.	1
Dwyer, Andrew C.	1
Furter, Robert T.	1
Goodwin, Laura D.	1
Harik, Polina	1
Haug, Carolyn A.	1
Hauger, Jeffrey B.	1