Showing all 10 results
Peer reviewed
Direct link
Kim, Sooyeon; Moses, Tim – International Journal of Testing, 2013
The major purpose of this study is to assess the conditions under which single scoring for constructed-response (CR) items is as effective as double scoring in the licensure testing context. We used both empirical datasets of five mixed-format licensure tests collected in actual operational settings and simulated datasets that allowed for the…
Descriptors: Scoring, Test Format, Licensing Examinations (Professions), Test Items
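As a rough illustration of the single- versus double-scoring contrast described in the Kim and Moses entry above, the following simulation assumes a simple true-score-plus-rater-error model (the rater error SDs and sample size are invented, not the study's operational values):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5000
    true = rng.normal(0, 1, n)             # examinee proficiency on the CR item
    r1 = true + rng.normal(0, 0.5, n)      # rater 1 score (error SD is an assumption)
    r2 = true + rng.normal(0, 0.5, n)      # rater 2 score

    single = r1                            # single-scoring condition
    double = (r1 + r2) / 2                 # double-scoring condition

    print("single-score correlation with truth:", np.corrcoef(single, true)[0, 1])
    print("double-score correlation with truth:", np.corrcoef(double, true)[0, 1])

Averaging two raters halves the error variance, so the gap between the two printed correlations shrinks as rater error gets small, which is the regime where single scoring becomes competitive.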
Peer reviewed
Direct link
Davis-Becker, Susan L.; Buckendahl, Chad W. – International Journal of Testing, 2013
A critical component of the standard setting process is collecting evidence to evaluate the recommended cut scores and their use for making decisions and classifying students based on test performance. Kane (1994, 2001) proposed a framework by which practitioners can identify and evaluate evidence of the results of the standard setting from (1)…
Descriptors: Standard Setting (Scoring), Evidence, Validity, Cutting Scores
Peer reviewed
Direct link
Clauser, Brian E.; Mee, Janet; Margolis, Melissa J. – International Journal of Testing, 2013
This study investigated the extent to which the performance data format impacted data use in Angoff standard setting exercises. Judges from two standard settings (a total of five panels) were randomly assigned to one of two groups. The full-data group received two types of data: (1) the proportion of examinees selecting each option and (2) plots…
Descriptors: Standard Setting (Scoring), Cutting Scores, Validity, Reliability
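The "proportion of examinees selecting each option" data format that Clauser, Mee, and Margolis gave their full-data panels is easy to picture in code; this sketch simply tabulates simulated option choices for one item (the response vector and its distribution are made up):

    import numpy as np

    rng = np.random.default_rng(1)
    choices = rng.choice(list("ABCD"), size=2000, p=[0.55, 0.20, 0.15, 0.10])

    options, counts = np.unique(choices, return_counts=True)
    for opt, cnt in zip(options, counts):
        print(f"option {opt}: {cnt / choices.size:.2f}")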
Peer reviewed
Direct link
Kim, Sooyeon; Walker, Michael E.; Larkin, Kevin – International Journal of Testing, 2012
We demonstrate how to assess the potential changes to a test's score scale necessitated by changes to the test specifications when a field study is not feasible. We used a licensure test, which is currently under revision, as an example. We created two research forms from an actual form of the test. One research form was developed with the current…
Descriptors: Equated Scores, Licensing Examinations (Professions), Test Reliability, Construct Validity
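One standard way to place two research forms on a common scale without a field study is a mean-sigma linear linking; the sketch below is a generic version of that technique under a random-groups assumption, not necessarily the linking Kim, Walker, and Larkin performed (the score distributions are simulated):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(30, 6, 1000)   # scores on the current-spec research form (assumed)
    y = rng.normal(28, 5, 1000)   # scores on the revised-spec research form (assumed)

    a = x.std() / y.std()
    b = x.mean() - a * y.mean()
    print(f"linear link: x = {a:.3f} * y + {b:.3f}")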
Peer reviewed
Direct link
Sinharay, Sandip; Haberman, Shelby J. – International Journal of Testing, 2014
Recently there has been an increasing level of interest in subtest scores, or subscores, for their potential diagnostic value. Haberman (2008) suggested a method to determine if a subscore has added value over the total score. Researchers have often been interested in the performance of subgroups--for example, those based on gender or…
Descriptors: Scores, Achievement Tests, Language Tests, English (Second Language)
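Haberman's (2008) criterion, cited in the Sinharay and Haberman entry above, declares that a subscore has added value when it predicts its own true score more accurately (higher proportional reduction in mean squared error, PRMSE) than the total score does. A minimal version of that check, with made-up reliabilities and true-score correlation rather than estimates from real data:

    rho_s = 0.70    # subscore reliability (assumed)
    rho_x = 0.90    # total-score reliability (assumed)
    r_true = 0.85   # correlation of subscore and total true scores (assumed)

    prmse_subscore = rho_s                # predicting the true subscore from the subscore
    prmse_total = r_true ** 2 * rho_x     # predicting it from the total score

    print("added value" if prmse_subscore > prmse_total else "no added value")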
Peer reviewed
Direct link
Hess, Brian J.; Johnston, Mary M.; Lipner, Rebecca S. – International Journal of Testing, 2013
Current research on examination response time has focused on tests composed of traditional multiple-choice items. Consequently, the impact of other innovative or complex item formats on examinee response time is not understood. The present study used multilevel growth modeling to investigate examinee characteristics associated with response time…
Descriptors: Test Items, Test Format, Reaction Time, Individual Characteristics
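A bare-bones version of the multilevel modeling named in the Hess, Johnston, and Lipner entry might put a random intercept on each examinee and let log response time trend over item position; everything here (the data, the "position" predictor, the effect sizes) is a simulated stand-in for the study's actual design:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    n_examinees, n_items = 200, 20
    person = np.repeat(np.arange(n_examinees), n_items)
    position = np.tile(np.arange(n_items), n_examinees)   # item position within the test
    speed = rng.normal(0, 0.3, n_examinees)[person]       # per-examinee random effect
    log_rt = 4.0 - 0.02 * position + speed + rng.normal(0, 0.4, person.size)

    df = pd.DataFrame({"log_rt": log_rt, "position": position, "person": person})
    fit = smf.mixedlm("log_rt ~ position", df, groups=df["person"]).fit()
    print(fit.summary())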
Peer reviewed
Direct link
Davis-Becker, Susan L.; Buckendahl, Chad W.; Gerrow, Jack – International Journal of Testing, 2011
Throughout the world, cut scores are an important aspect of a high-stakes testing program because they are a key operational component of the interpretation of test scores. One method for setting standards that is prevalent in educational testing programs--the Bookmark method--is intended to be a less cognitively complex alternative to methods…
Descriptors: Standard Setting (Scoring), Cutting Scores, Educational Testing, Licensing Examinations (Professions)
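The Bookmark method's translation step is compact enough to show directly: under a Rasch model, the cut score is the theta at which the bookmarked item is answered correctly with the chosen response probability (RP67 below). The item difficulties and bookmark placement are illustrative, not values from the Davis-Becker, Buckendahl, and Gerrow study:

    import math

    item_difficulties = [-1.2, -0.6, -0.1, 0.4, 0.9, 1.5]   # ordered item booklet (assumed)
    bookmark = 3                                             # judge bookmarks the fourth item
    rp = 0.67

    b = item_difficulties[bookmark]
    theta_cut = b + math.log(rp / (1 - rp))
    print(f"cut score on the theta scale: {theta_cut:.2f}")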
Peer reviewed
Direct link
Wang, Ning; Stahl, John – International Journal of Testing, 2012
This article discusses the use of the Many-Facets Rasch Model, via the FACETS computer program (Linacre, 2006a), to scale job/practice analysis survey data as well as to combine multiple rating scales into single composite weights representing the tasks' relative importance. Results from the Many-Facets Rasch Model are compared with those…
Descriptors: Job Analysis, Surveys, Rating Scales, Scaling
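The core of the Many-Facets Rasch Model that FACETS estimates can be written down in a few lines: the log-odds of scoring in category k rather than k-1 is examinee ability minus task difficulty minus rater severity minus the category threshold. This is the rating-scale formulation with illustrative parameter values, not output from Wang and Stahl's analysis:

    import numpy as np

    def mfrm_category_probs(theta, difficulty, severity, thresholds):
        # log-numerator for category k is the running sum of
        # (theta - difficulty - severity - tau_j); category 0 is fixed at 0
        steps = theta - difficulty - severity - np.asarray(thresholds, dtype=float)
        logits = np.concatenate(([0.0], np.cumsum(steps)))
        p = np.exp(logits - logits.max())
        return p / p.sum()

    print(mfrm_category_probs(theta=0.5, difficulty=-0.2, severity=0.3,
                              thresholds=[-1.0, 0.0, 1.0]))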
Peer reviewed
Direct link
Stark, Stephen; Chernyshenko, Oleksandr S.; Drasgow, Fritz – International Journal of Testing, 2005
Recently, a question was raised as to whether the multidimensionality of some professional licensing exams is due to the administration of subtests measuring conceptually distinct skills or, alternatively, strategic preparation on the part of groups of examinees attempting to cope with the demands of multiple hurdle certification systems. This…
Descriptors: Accounting, Licensing Examinations (Professions), Factor Analysis, Factor Structure
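A quick first pass at the dimensionality question Stark, Chernyshenko, and Drasgow raise is to inspect the eigenvalues of the subtest-score correlation matrix: more than one large eigenvalue hints at more than one factor. The score matrix below is simulated with a single common factor, so only the first eigenvalue should stand out:

    import numpy as np

    rng = np.random.default_rng(4)
    g = rng.normal(0, 1, (1000, 1))                           # one common factor
    scores = 0.7 * g @ np.ones((1, 4)) + rng.normal(0, 1, (1000, 4))

    eigvals = np.linalg.eigvalsh(np.corrcoef(scores, rowvar=False))
    print(np.sort(eigvals)[::-1])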
Peer reviewed
Robin, Frederic; Sireci, Stephen G.; Hambleton, Ronald K. – International Journal of Testing, 2003
Illustrates how multidimensional scaling (MDS) and differential item functioning (DIF) procedures can be used to evaluate the equivalence of different language versions of an examination. Presents examples of structural differences and DIF across languages. (SLD)
Descriptors: Item Bias, Licensing Examinations (Professions), Multidimensional Scaling, Multilingual Materials
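Of the two procedures Robin, Sireci, and Hambleton illustrate, the DIF side is the easier to sketch; the Mantel-Haenszel common odds ratio below is one standard DIF statistic (not necessarily the one used in the article), computed on simulated responses for a single item across two language groups matched on total score:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 2000
    group = rng.integers(0, 2, n)                  # 0 = reference language, 1 = focal
    total = rng.integers(0, 21, n)                 # matching variable (total score)
    p = 1 / (1 + np.exp(-(total - 10) / 3 - 0.4 * group))   # DIF built in for the demo
    item = rng.random(n) < p

    num = den = 0.0
    for s in np.unique(total):                     # stratify on the matching score
        m = total == s
        a = np.sum(item[m] & (group[m] == 0)); b = np.sum(~item[m] & (group[m] == 0))
        c = np.sum(item[m] & (group[m] == 1)); d = np.sum(~item[m] & (group[m] == 1))
        t = a + b + c + d
        num += a * d / t
        den += b * c / t
    print("Mantel-Haenszel common odds ratio:", num / den)

A ratio well away from 1.0 flags the item for DIF; MDS would address the complementary, structural side of the cross-language comparability question.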