Showing 1 to 15 of 38 results
Peer reviewed
Feinberg, Richard; Jurich, Daniel; Wise, Steven L. – Applied Measurement in Education, 2021
Previous research on rapid responding has tended to treat examinees as either engaging in solution behavior or purely guessing. However, particularly in a high-stakes testing context, examinees perceiving that they are running out of time may consider the remaining items for less time than necessary to provide a fully informed response,…
Descriptors: High Stakes Tests, Reaction Time, Response Style (Tests), Licensing Examinations (Professions)
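One common operationalization in this literature flags a response as a rapid guess when it arrives faster than a normative response-time threshold. A minimal sketch, assuming the conventional 10%-of-median-item-time threshold and invented response-time data (neither taken from this study):

```python
import numpy as np

def rapid_guess_flags(rt, frac=0.10):
    """Flag responses faster than a normative threshold.

    rt   : (n_examinees, n_items) response times in seconds (invented here)
    frac : threshold as a fraction of each item's median time; 10% is a
           common convention, assumed for illustration
    """
    thresholds = frac * np.median(rt, axis=0)  # one threshold per item
    return rt < thresholds                     # True = likely rapid guess

def response_time_effort(rt, frac=0.10):
    """Proportion of an examinee's responses that are not rapid guesses."""
    return 1.0 - rapid_guess_flags(rt, frac).mean(axis=1)

# Illustrative data: 5 examinees x 4 items; examinee 0 rushes the last two.
rng = np.random.default_rng(0)
rt = rng.gamma(shape=4.0, scale=10.0, size=(5, 4))
rt[0, -2:] = 1.5
print(response_time_effort(rt))
```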
Peer reviewed
Alahmadi, Sarah; Jones, Andrew T.; Barry, Carol L.; Ibáñez, Beatriz – Applied Measurement in Education, 2023
Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large…
Descriptors: Equated Scores, Item Response Theory, Sample Size, Test Items
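Among the well-established screens for item parameter drift is the robust-z statistic on anchor-item difficulty shifts. A minimal sketch, with an invented anchor set and the commonly cited |z| > 2.7 flagging rule assumed for illustration:

```python
import numpy as np

def robust_z_drift(b_old, b_new, cutoff=2.7):
    """Robust-z screen for drift in Rasch anchor-item difficulties.

    b_old, b_new : difficulty estimates from the base and new
                   administrations, in the same item order
    cutoff       : flagging rule; 2.7 is a common convention (assumed)
    """
    d = np.asarray(b_new) - np.asarray(b_old)        # difficulty shifts
    iqr = np.subtract(*np.percentile(d, [75, 25]))   # robust spread
    z = (d - np.median(d)) / (iqr / 1.349)           # IQR/1.349 approximates SD
    return z, np.abs(z) > cutoff

# Invented anchor set of five items; item 3 has drifted upward.
b_old = [-1.2, -0.4, 0.1, 0.6, 1.3]
b_new = [-1.1, -0.5, 0.9, 0.5, 1.4]
z, flagged = robust_z_drift(b_old, b_new)
print(np.round(z, 2), flagged)
```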
Peer reviewed
Wyse, Adam E. – Applied Measurement in Education, 2018
An important consideration in standard setting is recruiting a group of panelists with different experiences and backgrounds to serve on the standard-setting panel. This study uses data from 14 different Angoff standard settings from a variety of medical imaging credentialing programs to examine whether people with different professional roles and…
Descriptors: Standard Setting (Scoring), Test Construction, Cutting Scores, Accuracy
Peer reviewed
Diao, Hongyu; Keller, Lisa – Applied Measurement in Education, 2020
Examinees who attempt the same test multiple times are often referred to as "repeaters." Previous studies suggested that repeaters should be excluded from the total sample before equating because repeater groups are distinguishable from non-repeater groups. In addition, repeaters might memorize anchor items, causing item drift under a…
Descriptors: Licensing Examinations (Professions), College Entrance Examinations, Repetition, Testing Problems
Peer reviewed
Haladyna, Thomas M.; Rodriguez, Michael C.; Stevens, Craig – Applied Measurement in Education, 2019
Evidence is mounting behind the guidance to make greater use of three-option multiple-choice items. Theoretical analyses, empirical results, and practical considerations indicate that such items are of equal or higher quality than four- or five-option items, and that more items can be administered to improve content coverage. This study looks at 58 tests,…
Descriptors: Multiple Choice Tests, Test Items, Testing Problems, Guessing (Tests)
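The practical trade-off behind this guidance is arithmetic: fewer options per item raises the chance-score floor but shortens each item, so more items fit in a fixed testing time. A back-of-envelope sketch with invented timing numbers:

```python
# Invented timing assumptions: 30 s per stem, 15 s per option, 1-hour session.
stem_time, time_per_option, session = 30, 15, 60 * 60

for k in (3, 4, 5):
    item_time = stem_time + k * time_per_option
    n_items = session // item_time
    print(f"{k}-option items: ~{n_items} per hour, chance score = {1/k:.2f}")
```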
Peer reviewed
Wyse, Adam E. – Applied Measurement in Education, 2018
This article discusses regression effects commonly observed in Angoff ratings, where panelists tend to judge hard items as easier, and easy items as more difficult, than the estimated item difficulties indicate. Analyses of data from two credentialing exams illustrate these regression effects and the…
Descriptors: Regression (Statistics), Test Items, Difficulty Level, Licensing Examinations (Professions)
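The regression effect shows up as a fitted slope below 1 when mean Angoff ratings are regressed on empirical item difficulties. A minimal sketch with invented p-values and ratings:

```python
import numpy as np

# Invented data: empirical p-values and mean Angoff ratings for 8 items.
# A slope well below 1 is the classic regression effect: hard items are
# rated easier, and easy items harder, than the difficulty data suggest.
p_values = np.array([0.35, 0.45, 0.55, 0.60, 0.70, 0.80, 0.88, 0.95])
ratings  = np.array([0.48, 0.52, 0.58, 0.61, 0.68, 0.74, 0.80, 0.85])

slope, intercept = np.polyfit(p_values, ratings, 1)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```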
Peer reviewed
Clauser, Jerome C.; Clauser, Brian E.; Hambleton, Ronald K. – Applied Measurement in Education, 2014
The purpose of the present study was to extend past work with the Angoff method for setting standards by examining judgments at the judge level rather than the panel level. The focus was on investigating the relationship between observed Angoff standard setting judgments and empirical conditional probabilities. This relationship has been used as a…
Descriptors: Standard Setting (Scoring), Validity, Reliability, Correlation
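The empirical benchmark in this line of work is the conditional probability of success: the proportion correct on an item among examinees scoring near the cut. A minimal sketch, with an invented band width and simulated responses:

```python
import numpy as np

def empirical_conditional_p(item_correct, total_scores, cut, band=2):
    """P(correct) on one item among examinees within `band` points of the
    cut score: the empirical quantity an Angoff judgment for that item
    can be compared against. The band width is an illustrative choice."""
    near_cut = np.abs(total_scores - cut) <= band
    return item_correct[near_cut].mean()

# Simulated data: 500 examinees, one item whose difficulty tracks ability.
rng = np.random.default_rng(1)
total = rng.integers(40, 100, size=500)
correct = rng.random(500) < 1 / (1 + np.exp(-(total - 70) / 8))
print(round(empirical_conditional_p(correct, total, cut=70), 2))
```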
Peer reviewed
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
Peer reviewed
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J. – Applied Measurement in Education, 2015
Establishing cut scores using the Angoff method requires panelists to evaluate every item on a test and make a probability judgment. This can be time-consuming when there are large numbers of items on the test. Previous research using resampling studies suggests that it is possible to recommend stable Angoff-based cut score estimates using a…
Descriptors: Cutting Scores, Test Items, Standard Setting (Scoring), Feasibility Studies
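The resampling logic behind these two studies can be sketched directly: recompute the cut score from random item subsets and watch how its spread shrinks as the subset grows. The panelist ratings and subset sizes below are invented:

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented Angoff ratings: 10 panelists x 60 items, each the judged
# probability that a minimally competent candidate answers correctly.
ratings = np.clip(rng.normal(0.65, 0.12, size=(10, 60)), 0.05, 0.95)
full_cut = ratings.mean()   # full-test cut score (proportion correct)

def subset_cuts(ratings, n_items, n_reps=1000):
    """Cut scores recomputed from random item subsets of size n_items."""
    cuts = np.empty(n_reps)
    for r in range(n_reps):
        items = rng.choice(ratings.shape[1], size=n_items, replace=False)
        cuts[r] = ratings[:, items].mean()
    return cuts

for k in (15, 30, 45):
    cuts = subset_cuts(ratings, k)
    print(f"{k} items: mean cut = {cuts.mean():.3f}, SD = {cuts.std():.4f} "
          f"(full test: {full_cut:.3f})")
```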
Peer reviewed
Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André – Applied Measurement in Education, 2016
Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Descriptors: Psychometrics, Multiple Choice Tests, Test Items, Item Analysis
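Template-based generation is the simplest form of the approach: an item model (a stem with variable slots) plus constrained value lists yields a whole item family. The dosage item model below is invented for illustration and is not from the study:

```python
from itertools import product

STEM = ("A patient weighing {w} kg is prescribed {d} mg/kg of drug X per "
        "day, divided into {n} equal doses. How many mg is each dose?")

weights, doses, splits = (60, 80), (2, 4), (2, 3)

items = []
for w, d, n in product(weights, doses, splits):
    total = w * d
    if total % n:   # constraint: keep the keyed answer a whole number
        continue
    items.append({"stem": STEM.format(w=w, d=d, n=n), "key": total // n})

for item in items:
    print(item["key"], "<-", item["stem"])
```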
Peer reviewed
Kim, Sooyeon; Walker, Michael E. – Applied Measurement in Education, 2012
This study investigated the impact of repeat takers of a licensure test on the equating functions in the context of a nonequivalent groups with anchor test (NEAT) design. Examinees who had taken a new, to-be-equated form of the test were divided into three subgroups according to their previous testing experience: (a) repeaters who previously took…
Descriptors: Equated Scores, Licensing Examinations (Professions), Repetition, Regression (Statistics)
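The NEAT-design machinery referenced here chains two within-group linear linkings through the anchor: new form to anchor in the new-form group, then anchor to reference form in the reference group. A minimal sketch with invented group statistics:

```python
import numpy as np

def linear_link(x, mu_from, sd_from, mu_to, sd_to):
    """Linear linking: match the means and SDs of two score scales."""
    return mu_to + (sd_to / sd_from) * (x - mu_from)

def chained_linear_neat(x, new_grp, ref_grp):
    """Chained linear equating for a NEAT design (illustrative layout)."""
    v = linear_link(x, new_grp["mu_x"], new_grp["sd_x"],
                    new_grp["mu_v"], new_grp["sd_v"])      # X -> anchor
    return linear_link(v, ref_grp["mu_v"], ref_grp["sd_v"],
                       ref_grp["mu_y"], ref_grp["sd_y"])   # anchor -> Y

# Invented summary statistics for the two nonequivalent groups.
new_grp = {"mu_x": 50, "sd_x": 10, "mu_v": 20, "sd_v": 5}
ref_grp = {"mu_v": 21, "sd_v": 5, "mu_y": 52, "sd_y": 9}
print(chained_linear_neat(np.array([40, 50, 60]), new_grp, ref_grp))
```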
Peer reviewed
Kim, Sooyeon; Livingston, Samuel A.; Lewis, Charles – Applied Measurement in Education, 2011
This article describes a preliminary investigation of an empirical Bayes (EB) procedure for using collateral information to improve equating of scores on test forms taken by small numbers of examinees. Resampling studies were done on two different forms of the same test. In each study, EB and non-EB versions of two equating methods--chained linear…
Descriptors: Sample Size, Equated Scores, Bayesian Statistics, Accuracy
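The empirical Bayes idea is a precision-weighted compromise: shrink a noisy small-sample equating estimate toward a prior built from collateral information such as other forms of the test. A minimal normal-normal sketch with invented numbers:

```python
def eb_shrink(estimate, se, prior_mean, prior_sd):
    """Precision-weighted average of a small-sample estimate and a prior
    (the standard normal-normal posterior mean); all inputs invented."""
    w = prior_sd**2 / (prior_sd**2 + se**2)   # weight on the data
    return w * estimate + (1 - w) * prior_mean

# An equating adjustment of 2.4 raw-score points estimated with SE 1.8,
# shrunk toward a prior of 0.5 (SD 1.0) from earlier administrations.
print(round(eb_shrink(2.4, 1.8, 0.5, 1.0), 2))
```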
Peer reviewed
Bonner, Sarah M.; D'Agostino, Jerome V. – Applied Measurement in Education, 2012
We investigated examinees' cognitive processes while they solved selected items from the Multistate Bar Exam (MBE), a high-stakes professional certification examination. We focused on ascertaining those mental processes most frequently used by examinees, and the most common types of errors in their thinking. We compared the relationships between…
Descriptors: Cognitive Processes, Test Items, Problem Solving, Thinking Skills
Peer reviewed
Kim, Sooyeon; Walker, Michael – Applied Measurement in Education, 2012
This study examined the appropriateness of the anchor composition in a mixed-format test, which includes both multiple-choice (MC) and constructed-response (CR) items, using subpopulation invariance indices. Linking functions were derived in the nonequivalent groups with anchor test (NEAT) design using two types of anchor sets: (a) MC only and (b)…
Descriptors: Multiple Choice Tests, Test Format, Test Items, Equated Scores
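Subpopulation invariance is typically quantified by comparing subgroup linking functions to the total-group function across the score range, for example with a root-mean-square difference index. A minimal sketch; the toy linkings and weights are invented:

```python
import numpy as np

def rmsd_invariance(overall_eq, subgroup_eqs, weights):
    """RMSD between the total-group equating function and subgroup
    functions, weighted by the score distribution (generic textbook form).

    overall_eq   : (n_scores,) equated scores from the total group
    subgroup_eqs : (n_groups, n_scores) equated scores per subgroup
    weights      : (n_scores,) score weights summing to 1
    """
    sq = (subgroup_eqs - overall_eq) ** 2
    return np.sqrt((weights * sq.mean(axis=0)).sum())

scores = np.arange(5)
overall = scores + 1.0                            # toy overall linking
subs = np.vstack([scores + 1.2, scores + 0.8])    # two subgroup linkings
print(round(rmsd_invariance(overall, subs, np.full(5, 0.2)), 3))  # 0.2
```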
Peer reviewed
Puhan, Gautam; Sinharay, Sandip; Haberman, Shelby; Larkin, Kevin – Applied Measurement in Education, 2010
Do subscores provide information beyond what is provided by the total score? Is there a method that can estimate more trustworthy subscores than observed subscores? To answer the first question, this study evaluated whether the true subscore was more accurately predicted by the observed subscore or by the total score. To answer the second…
Descriptors: Licensing Examinations (Professions), Scores, Computation, Methods
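A Haberman-style version of the first question: a subscore adds value only if the observed subscore predicts the true subscore better than the total score does, comparing proportional reductions in mean squared error (PRMSE). A minimal sketch that takes the two key quantities as given rather than estimating them:

```python
def subscore_added_value(rel_sub, corr_true_sub_total):
    """Compare PRMSE from the subscore vs. from the total score.

    rel_sub             : subscore reliability (= PRMSE when predicting the
                          true subscore from the observed subscore)
    corr_true_sub_total : correlation of true subscore with total score
                          (its square = PRMSE when predicting from the total)
    Both inputs are treated as known here; in practice they are estimated.
    """
    prmse_sub, prmse_total = rel_sub, corr_true_sub_total ** 2
    return prmse_sub > prmse_total, prmse_sub, prmse_total

# A reliable, distinct subscore vs. a noisy, redundant one (invented values).
print(subscore_added_value(0.85, 0.80))  # (True, 0.85, 0.64): worth reporting
print(subscore_added_value(0.55, 0.90))  # (False, 0.55, 0.81): total suffices
```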