Feinberg, Richard; Jurich, Daniel; Wise, Steven L. – Applied Measurement in Education, 2021
Previous research on rapid responding tends to implicitly consider examinees as either engaging in solution behavior or purely guessing. However, particularly in a high-stakes testing context, examinees perceiving that they are running out of time may consider the remaining items for less time than necessary to provide a fully informed response,…
Descriptors: High Stakes Tests, Reaction Time, Response Style (Tests), Licensing Examinations (Professions)
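Operationally, rapid-guessing studies in this vein flag responses faster than an item-level time threshold. A minimal sketch, assuming per-item response-time data and a hypothetical normative threshold of 10% of each item's median time (neither is specified in the abstract):

```python
import statistics

def flag_rapid_responses(response_times, threshold_fraction=0.10):
    """Flag responses faster than a fraction of each item's median time.

    response_times: dict mapping item_id -> list of response times (seconds)
    across examinees. Returns dict item_id -> list of booleans, True where
    the response is flagged as a possible rapid guess. The 10% normative
    threshold is an assumption for illustration, not from the study.
    """
    flags = {}
    for item_id, times in response_times.items():
        threshold = threshold_fraction * statistics.median(times)
        flags[item_id] = [t < threshold for t in times]
    return flags

# Example: item "A1" answered in 2.1s when most examinees take ~40s.
times = {"A1": [38.0, 41.5, 2.1, 44.2, 39.9]}
print(flag_rapid_responses(times))  # {'A1': [False, False, True, False, False]}
```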
Alahmadi, Sarah; Jones, Andrew T.; Barry, Carol L.; Ibáñez, Beatriz – Applied Measurement in Education, 2023
Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large…
Descriptors: Equated Scores, Item Response Theory, Sample Size, Test Items
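One common screen for item parameter drift in Rasch common-item equating is a robust-z check on the old-versus-new difficulty differences, with the surviving anchor items feeding a mean-mean equating constant. The abstract does not name the methods compared, so the following is an illustrative sketch of that generic approach:

```python
import statistics

def robust_z_drift(b_old, b_new, cutoff=1.645):
    """Screen common items for parameter drift with a robust-z statistic.

    b_old, b_new: Rasch difficulty estimates for the same anchor items on
    the base and new forms. An item is flagged when the robust z of its
    difficulty difference exceeds the cutoff. The 1.645 cutoff is one
    common convention, not the only choice.
    """
    diffs = [new - old for old, new in zip(b_old, b_new)]
    med = statistics.median(diffs)
    # 0.74 * IQR approximates the standard deviation for normal data.
    q = statistics.quantiles(diffs, n=4)
    robust_sd = 0.74 * (q[2] - q[0])
    return [abs(d - med) / robust_sd > cutoff for d in diffs]

def mean_mean_constant(b_old, b_new, keep):
    """Equating constant from the retained (non-drifting) anchor items."""
    kept_old = [b for b, k in zip(b_old, keep) if k]
    kept_new = [b for b, k in zip(b_new, keep) if k]
    return statistics.mean(kept_old) - statistics.mean(kept_new)
```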
Wyse, Adam E. – Applied Measurement in Education, 2018
An important consideration in standard setting is recruiting a group of panelists with different experiences and backgrounds to serve on the standard-setting panel. This study uses data from 14 different Angoff standard settings from a variety of medical imaging credentialing programs to examine whether people with different professional roles and…
Descriptors: Standard Setting (Scoring), Test Construction, Cutting Scores, Accuracy
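For context, an Angoff standard setting asks each panelist to estimate, for every item, the probability that a minimally competent candidate answers it correctly; summing a panelist's ratings gives that panelist's recommended raw cut score, and the panel cut score is typically the mean of those sums. A minimal sketch with invented ratings:

```python
def angoff_cut_score(ratings):
    """Compute an Angoff cut score from panelist probability judgments.

    ratings: list of lists, ratings[p][i] = panelist p's probability that
    a minimally competent candidate answers item i correctly. Each
    panelist's ratings sum to a recommended raw cut score; the panel cut
    score is the mean of those sums.
    """
    panelist_cuts = [sum(r) for r in ratings]
    return sum(panelist_cuts) / len(panelist_cuts)

# Three panelists rating a four-item test (illustrative values).
ratings = [
    [0.70, 0.55, 0.80, 0.60],
    [0.65, 0.60, 0.75, 0.55],
    [0.75, 0.50, 0.85, 0.65],
]
print(round(angoff_cut_score(ratings), 2))  # 2.65 of 4 raw score points
```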
Diao, Hongyu; Keller, Lisa – Applied Measurement in Education, 2020
Examinees who attempt the same test multiple times are often referred to as "repeaters." Previous studies suggested that repeaters should be excluded from the total sample before equating because repeater groups are distinguishable from non-repeater groups. In addition, repeaters might memorize anchor items, causing item drift under a…
Descriptors: Licensing Examinations (Professions), College Entrance Examinations, Repetition, Testing Problems
Haladyna, Thomas M.; Rodriguez, Michael C.; Stevens, Craig – Applied Measurement in Education, 2019
Evidence is mounting in support of the guidance to make greater use of three-option multiple-choice items. From theoretical analyses, empirical results, and practical considerations, such items are of equal or higher quality than four- or five-option items, and more items can be administered to improve content coverage. This study looks at 58 tests,…
Descriptors: Multiple Choice Tests, Test Items, Testing Problems, Guessing (Tests)
Wyse, Adam E. – Applied Measurement in Education, 2018
This article discusses regression effects commonly observed in Angoff ratings, where panelists tend to judge hard items as easier, and easy items as harder, than their estimated item difficulties indicate. Analyses of data from two credentialing exams illustrate these regression effects and the…
Descriptors: Regression (Statistics), Test Items, Difficulty Level, Licensing Examinations (Professions)
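The regression effect can be made concrete by regressing mean Angoff ratings on empirical item difficulties: a fitted slope well below 1 means hard items are over-rated and easy items under-rated. The values below are invented for illustration:

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y on x (y = a + b*x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Empirical p-values (proportion correct) vs. mean Angoff ratings.
p_values = [0.30, 0.50, 0.70, 0.90]
ratings  = [0.45, 0.55, 0.65, 0.75]  # compressed toward the middle
a, b = ols_slope_intercept(p_values, ratings)
print(f"slope={b:.2f}")  # 0.50: well below 1, signaling the regression effect
```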
Clauser, Jerome C.; Clauser, Brian E.; Hambleton, Ronald K. – Applied Measurement in Education, 2014
The purpose of the present study was to extend past work with the Angoff method for setting standards by examining judgments at the judge level rather than the panel level. The focus was on investigating the relationship between observed Angoff standard setting judgments and empirical conditional probabilities. This relationship has been used as a…
Descriptors: Standard Setting (Scoring), Validity, Reliability, Correlation
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J. – Applied Measurement in Education, 2015
Establishing cut scores using the Angoff method requires panelists to evaluate every item on a test and make a probability judgment. This can be time-consuming when there are large numbers of items on the test. Previous research using resampling studies suggests that it is possible to recommend stable Angoff-based cut score estimates using a…
Descriptors: Cutting Scores, Test Items, Standard Setting (Scoring), Feasibility Studies
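The resampling idea can be sketched as repeatedly drawing random item subsets, recomputing the Angoff cut score from each subset, and rescaling to the full test length to gauge stability. Subset size, replication count, and ratings below are arbitrary choices for illustration:

```python
import random
import statistics

def subset_cut_scores(mean_item_ratings, subset_size, n_reps=1000, seed=1):
    """Bootstrap the Angoff cut score from random item subsets.

    mean_item_ratings: panel-mean Angoff rating per item. Each replication
    draws subset_size items without replacement, averages their ratings,
    and scales up to a full-test cut score. The spread of the resulting
    distribution indicates how stable a subset-based recommendation is.
    """
    rng = random.Random(seed)
    n_items = len(mean_item_ratings)
    cuts = []
    for _ in range(n_reps):
        subset = rng.sample(mean_item_ratings, subset_size)
        cuts.append(statistics.mean(subset) * n_items)
    return statistics.mean(cuts), statistics.stdev(cuts)

ratings = [0.62, 0.55, 0.71, 0.48, 0.66, 0.59, 0.73, 0.52, 0.64, 0.58]
print(subset_cut_scores(ratings, subset_size=5))  # (mean cut, its spread)
```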
Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André – Applied Measurement in Education, 2016
Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Descriptors: Psychometrics, Multiple Choice Tests, Test Items, Item Analysis
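Automatic item generation typically works from an item model: a stem template with constrained variable slots that software fills systematically. The toy template and values below are invented to show the mechanics only; real AIG systems derive slot constraints from a cognitive model of the task:

```python
from itertools import product

# A toy item model: a stem with two constrained slots (hypothetical).
STEM = ("A patient weighing {weight} kg requires {dose} mg/kg of drug X. "
        "What total dose should be administered?")

weights = [50, 70, 90]   # allowed slot values
doses = [2, 5]

# Systematically generate one item per combination of slot values,
# computing the keyed answer from the same values.
items = [{"stem": STEM.format(weight=w, dose=d), "key": w * d}
         for w, d in product(weights, doses)]

print(len(items))        # 6 generated items
print(items[0]["stem"])
```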
Kim, Sooyeon; Walker, Michael E. – Applied Measurement in Education, 2012
This study investigated the impact of repeat takers of a licensure test on the equating functions in the context of a nonequivalent groups with anchor test (NEAT) design. Examinees who had taken a new, to-be-equated form of the test were divided into three subgroups according to their previous testing experience: (a) repeaters who previously took…
Descriptors: Equated Scores, Licensing Examinations (Professions), Repetition, Regression (Statistics)
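In the NEAT design the anchor test carries the link between the two nonequivalent groups. A schematic sketch of chained linear equating, one standard method for this design (the study's exact procedure is not given in the abstract):

```python
import statistics

def linear_link(x_mean, x_sd, y_mean, y_sd):
    """Return a linear function mapping the X scale onto the Y scale."""
    slope = y_sd / x_sd
    return lambda x: y_mean + slope * (x - x_mean)

def chained_linear_equate(new_scores, new_anchor, old_scores, old_anchor):
    """Chained linear equating for the NEAT design.

    New-form scores are first linked to the anchor scale within the new
    group, then from the anchor scale to the old form within the old group.
    """
    to_anchor = linear_link(statistics.mean(new_scores),
                            statistics.stdev(new_scores),
                            statistics.mean(new_anchor),
                            statistics.stdev(new_anchor))
    to_old = linear_link(statistics.mean(old_anchor),
                         statistics.stdev(old_anchor),
                         statistics.mean(old_scores),
                         statistics.stdev(old_scores))
    return lambda x: to_old(to_anchor(x))
```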
Kim, Sooyeon; Livingston, Samuel A.; Lewis, Charles – Applied Measurement in Education, 2011
This article describes a preliminary investigation of an empirical Bayes (EB) procedure for using collateral information to improve equating of scores on test forms taken by small numbers of examinees. Resampling studies were done on two different forms of the same test. In each study, EB and non-EB versions of two equating methods--chained linear…
Descriptors: Sample Size, Equated Scores, Bayesian Statistics, Accuracy
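The empirical Bayes idea is shrinkage: a noisy small-sample equating estimate is pulled toward a prior assembled from collateral information, with weights proportional to their precisions. A schematic normal-normal sketch; the article's actual procedure will differ in detail:

```python
def eb_shrink(estimate, est_var, prior_mean, prior_var):
    """Precision-weighted combination of a sample estimate and a prior.

    Under a normal-normal model, the posterior mean is a weighted average
    of the small-sample estimate and the prior mean, with weights equal
    to the inverse variances (precisions).
    """
    w = (1 / est_var) / (1 / est_var + 1 / prior_var)
    return w * estimate + (1 - w) * prior_mean

# Small-sample equated-score adjustment at one raw score point:
# noisy estimate +3.0 (variance 4.0), collateral prior +1.0 (variance 1.0).
print(eb_shrink(3.0, 4.0, 1.0, 1.0))  # 1.4: heavily shrunk toward the prior
```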
Bonner, Sarah M.; D'Agostino, Jerome V. – Applied Measurement in Education, 2012
We investigated examinees' cognitive processes while they solved selected items from the Multistate Bar Exam (MBE), a high-stakes professional certification examination. We focused on ascertaining those mental processes most frequently used by examinees, and the most common types of errors in their thinking. We compared the relationships between…
Descriptors: Cognitive Processes, Test Items, Problem Solving, Thinking Skills
Kim, Sooyeon; Walker, Michael – Applied Measurement in Education, 2012
This study examined the appropriateness of the anchor composition in a mixed-format test, which includes both multiple-choice (MC) and constructed-response (CR) items, using subpopulation invariance indices. Linking functions were derived in the nonequivalent groups with anchor test (NEAT) design using two types of anchor sets: (a) MC only and (b)…
Descriptors: Multiple Choice Tests, Test Format, Test Items, Equated Scores
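Subpopulation invariance indices compare linking functions computed within subgroups against the overall linking function; a weighted root mean square difference (RMSD) is a standard summary. A sketch at a single raw score point, with invented values:

```python
import math

def rmsd_invariance(overall, subgroup_links, weights):
    """Root mean square difference between subgroup and overall linkings.

    overall: overall equated value at one raw score point.
    subgroup_links: equated values at the same point computed within each
    subgroup. weights: subgroup proportions summing to 1. A small RMSD
    relative to score units supports anchor-set invariance.
    """
    return math.sqrt(sum(w * (s - overall) ** 2
                         for s, w in zip(subgroup_links, weights)))

# Overall linking gives 25.0 at this score point; two subgroups give
# 24.6 and 25.5, with proportions 0.6 and 0.4 (made-up numbers).
print(round(rmsd_invariance(25.0, [24.6, 25.5], [0.6, 0.4]), 3))  # 0.443
```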
Puhan, Gautam; Sinharay, Sandip; Haberman, Shelby; Larkin, Kevin – Applied Measurement in Education, 2010
Do subscores provide additional information beyond what is provided by the total score? Is there a method that can estimate more trustworthy subscores than observed subscores? To answer the first question, this study evaluated whether the true subscore was more accurately predicted by the observed subscore or the total score. To answer the second…
Descriptors: Licensing Examinations (Professions), Scores, Computation, Methods
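In the spirit of Haberman's approach (Haberman is a coauthor here), a subscore is worth reporting only if the observed subscore predicts the true subscore better than the total score does, compared via proportional reduction in mean squared error (PRMSE). A schematic sketch assuming the needed reliability and correlation are already estimated:

```python
def subscore_adds_value(sub_reliability, corr_total_true_sub):
    """Compare PRMSE of the observed subscore vs. the total score.

    PRMSE of the observed subscore for predicting the true subscore equals
    the subscore's reliability; PRMSE of the total score equals its squared
    correlation with the true subscore. The subscore adds value only when
    it predicts the true subscore better than the total score does.
    """
    prmse_sub = sub_reliability
    prmse_total = corr_total_true_sub ** 2
    return prmse_sub > prmse_total, prmse_sub, prmse_total

# Subscore reliability 0.80 vs. total-score correlation 0.85 with the
# true subscore (illustrative values):
print(subscore_adds_value(0.80, 0.85))  # (True, 0.8, 0.7225)
```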