Showing all 10 results
Peer reviewed
Direct link
Kim, Sooyeon; Moses, Tim – International Journal of Testing, 2013
The major purpose of this study is to assess the conditions under which single scoring for constructed-response (CR) items is as effective as double scoring in the licensure testing context. We used both empirical datasets of five mixed-format licensure tests collected in actual operational settings and simulated datasets that allowed for the…
Descriptors: Scoring, Test Format, Licensing Examinations (Professions), Test Items
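As a rough illustration of the single- versus double-scoring contrast described in the Kim and Moses entry above, the following simulation assumes a simple true-score-plus-rater-error model (the rater error SDs and sample size are invented, not the study's operational values):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5000
    true = rng.normal(0, 1, n)             # examinee proficiency on the CR item
    r1 = true + rng.normal(0, 0.5, n)      # rater 1 score (error SD is an assumption)
    r2 = true + rng.normal(0, 0.5, n)      # rater 2 score

    single = r1                            # single-scoring condition
    double = (r1 + r2) / 2                 # double-scoring condition

    print("single-score correlation with truth:", np.corrcoef(single, true)[0, 1])
    print("double-score correlation with truth:", np.corrcoef(double, true)[0, 1])

Averaging two raters halves the error variance, so the gap between the two printed correlations shrinks as rater error gets small, which is the regime where single scoring becomes competitive.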
Peer reviewed
Direct link
Davis-Becker, Susan L.; Buckendahl, Chad W. – International Journal of Testing, 2013
A critical component of the standard setting process is collecting evidence to evaluate the recommended cut scores and their use for making decisions and classifying students based on test performance. Kane (1994, 2001) proposed a framework by which practitioners can identify and evaluate evidence of the results of the standard setting from (1)…
Descriptors: Standard Setting (Scoring), Evidence, Validity, Cutting Scores
Peer reviewed
Direct link
Clauser, Brian E.; Mee, Janet; Margolis, Melissa J. – International Journal of Testing, 2013
This study investigated the extent to which the performance data format impacted data use in Angoff standard setting exercises. Judges from two standard settings (a total of five panels) were randomly assigned to one of two groups. The full-data group received two types of data: (1) the proportion of examinees selecting each option and (2) plots…
Descriptors: Standard Setting (Scoring), Cutting Scores, Validity, Reliability
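The "proportion of examinees selecting each option" data format that Clauser, Mee, and Margolis gave their full-data panels is easy to picture in code; this sketch simply tabulates simulated option choices for one item (the response vector and its distribution are made up):

    import numpy as np

    rng = np.random.default_rng(1)
    choices = rng.choice(list("ABCD"), size=2000, p=[0.55, 0.20, 0.15, 0.10])

    options, counts = np.unique(choices, return_counts=True)
    for opt, cnt in zip(options, counts):
        print(f"option {opt}: {cnt / choices.size:.2f}")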
Peer reviewed
Direct link
Kim, Sooyeon; Walker, Michael E.; Larkin, Kevin – International Journal of Testing, 2012
We demonstrate how to assess the potential changes to a test's score scale necessitated by changes to the test specifications when a field study is not feasible. We used a licensure test, which is currently under revision, as an example. We created two research forms from an actual form of the test. One research form was developed with the current…
Descriptors: Equated Scores, Licensing Examinations (Professions), Test Reliability, Construct Validity
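One standard way to place two research forms on a common scale without a field study is a mean-sigma linear linking; the sketch below is a generic version of that technique under a random-groups assumption, not necessarily the linking Kim, Walker, and Larkin performed (the score distributions are simulated):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(30, 6, 1000)   # scores on the current-spec research form (assumed)
    y = rng.normal(28, 5, 1000)   # scores on the revised-spec research form (assumed)

    a = x.std() / y.std()
    b = x.mean() - a * y.mean()
    print(f"linear link: x = {a:.3f} * y + {b:.3f}")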
Peer reviewed
Direct link
Sinharay, Sandip; Haberman, Shelby J. – International Journal of Testing, 2014
Recently there has been an increasing level of interest in subtest scores, or subscores, for their potential diagnostic value. Haberman (2008) suggested a method to determine if a subscore has added value over the total score. Researchers have often been interested in the performance of subgroups--for example, those based on gender or…
Descriptors: Scores, Achievement Tests, Language Tests, English (Second Language)
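Haberman's (2008) criterion, cited in the Sinharay and Haberman entry above, declares that a subscore has added value when it predicts its own true score more accurately (higher proportional reduction in mean squared error, PRMSE) than the total score does. A minimal version of that check, with made-up reliabilities and true-score correlation rather than estimates from real data:

    rho_s = 0.70    # subscore reliability (assumed)
    rho_x = 0.90    # total-score reliability (assumed)
    r_true = 0.85   # correlation of subscore and total true scores (assumed)

    prmse_subscore = rho_s                # predicting the true subscore from the subscore
    prmse_total = r_true ** 2 * rho_x     # predicting it from the total score

    print("added value" if prmse_subscore > prmse_total else "no added value")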
Peer reviewed
Direct link
Hess, Brian J.; Johnston, Mary M.; Lipner, Rebecca S. – International Journal of Testing, 2013
Current research on examination response time has focused on tests composed of traditional multiple-choice items. Consequently, the impact of other innovative or complex item formats on examinee response time is not understood. The present study used multilevel growth modeling to investigate examinee characteristics associated with response time…
Descriptors: Test Items, Test Format, Reaction Time, Individual Characteristics
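A bare-bones version of the multilevel modeling named in the Hess, Johnston, and Lipner entry might put a random intercept on each examinee and let log response time trend over item position; everything here (the data, the "position" predictor, the effect sizes) is a simulated stand-in for the study's actual design:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    n_examinees, n_items = 200, 20
    person = np.repeat(np.arange(n_examinees), n_items)
    position = np.tile(np.arange(n_items), n_examinees)   # item position within the test
    speed = rng.normal(0, 0.3, n_examinees)[person]       # per-examinee random effect
    log_rt = 4.0 - 0.02 * position + speed + rng.normal(0, 0.4, person.size)

    df = pd.DataFrame({"log_rt": log_rt, "position": position, "person": person})
    fit = smf.mixedlm("log_rt ~ position", df, groups=df["person"]).fit()
    print(fit.summary())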
Peer reviewed
Direct link
Davis-Becker, Susan L.; Buckendahl, Chad W.; Gerrow, Jack – International Journal of Testing, 2011
Throughout the world, cut scores are an important aspect of a high-stakes testing program because they are a key operational component of the interpretation of test scores. One method for setting standards that is prevalent in educational testing programs--the Bookmark method--is intended to be a less cognitively complex alternative to methods…
Descriptors: Standard Setting (Scoring), Cutting Scores, Educational Testing, Licensing Examinations (Professions)
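The Bookmark method's translation step is compact enough to show directly: under a Rasch model, the cut score is the theta at which the bookmarked item is answered correctly with the chosen response probability (RP67 below). The item difficulties and bookmark placement are illustrative, not values from the Davis-Becker, Buckendahl, and Gerrow study:

    import math

    item_difficulties = [-1.2, -0.6, -0.1, 0.4, 0.9, 1.5]   # ordered item booklet (assumed)
    bookmark = 3                                             # judge bookmarks the fourth item
    rp = 0.67

    b = item_difficulties[bookmark]
    theta_cut = b + math.log(rp / (1 - rp))
    print(f"cut score on the theta scale: {theta_cut:.2f}")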
Peer reviewed
Direct link
Wang, Ning; Stahl, John – International Journal of Testing, 2012
This article discusses the use of the Many-Facets Rasch Model, via the FACETS computer program (Linacre, 2006a), to scale job/practice analysis survey data as well as to combine multiple rating scales into single composite weights representing the tasks' relative importance. Results from the Many-Facets Rasch Model are compared with those…
Descriptors: Job Analysis, Surveys, Rating Scales, Scaling
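The core of the Many-Facets Rasch Model that FACETS estimates can be written down in a few lines: the log-odds of scoring in category k rather than k-1 is examinee ability minus task difficulty minus rater severity minus the category threshold. This is the rating-scale formulation with illustrative parameter values, not output from Wang and Stahl's analysis:

    import numpy as np

    def mfrm_category_probs(theta, difficulty, severity, thresholds):
        # log-numerator for category k is the running sum of
        # (theta - difficulty - severity - tau_j); category 0 is fixed at 0
        steps = theta - difficulty - severity - np.asarray(thresholds, dtype=float)
        logits = np.concatenate(([0.0], np.cumsum(steps)))
        p = np.exp(logits - logits.max())
        return p / p.sum()

    print(mfrm_category_probs(theta=0.5, difficulty=-0.2, severity=0.3,
                              thresholds=[-1.0, 0.0, 1.0]))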
Peer reviewed
Direct link
Stark, Stephen; Chernyshenko, Oleksandr S.; Drasgow, Fritz – International Journal of Testing, 2005
Recently, a question was raised as to whether the multidimensionality of some professional licensing exams is due to the administration of subtests measuring conceptually distinct skills or, alternatively, strategic preparation on the part of groups of examinees attempting to cope with the demands of multiple hurdle certification systems. This…
Descriptors: Accounting, Licensing Examinations (Professions), Factor Analysis, Factor Structure
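A quick first pass at the dimensionality question Stark, Chernyshenko, and Drasgow raise is to inspect the eigenvalues of the subtest-score correlation matrix: more than one large eigenvalue hints at more than one factor. The score matrix below is simulated with a single common factor, so only the first eigenvalue should stand out:

    import numpy as np

    rng = np.random.default_rng(4)
    g = rng.normal(0, 1, (1000, 1))                           # one common factor
    scores = 0.7 * g @ np.ones((1, 4)) + rng.normal(0, 1, (1000, 4))

    eigvals = np.linalg.eigvalsh(np.corrcoef(scores, rowvar=False))
    print(np.sort(eigvals)[::-1])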
Peer reviewed
Robin, Frederic; Sireci, Stephen G.; Hambleton, Ronald K. – International Journal of Testing, 2003
Illustrates how multidimensional scaling (MDS) and differential item functioning (DIF) procedures can be used to evaluate the equivalence of different language versions of an examination. Presents examples of structural differences and DIF across languages. (SLD)
Descriptors: Item Bias, Licensing Examinations (Professions), Multidimensional Scaling, Multilingual Materials
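Of the two procedures Robin, Sireci, and Hambleton illustrate, the DIF side is the easier to sketch; the Mantel-Haenszel common odds ratio below is one standard DIF statistic (not necessarily the one used in the article), computed on simulated responses for a single item across two language groups matched on total score:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 2000
    group = rng.integers(0, 2, n)                  # 0 = reference language, 1 = focal
    total = rng.integers(0, 21, n)                 # matching variable (total score)
    p = 1 / (1 + np.exp(-(total - 10) / 3 - 0.4 * group))   # DIF built in for the demo
    item = rng.random(n) < p

    num = den = 0.0
    for s in np.unique(total):                     # stratify on the matching score
        m = total == s
        a = np.sum(item[m] & (group[m] == 0)); b = np.sum(~item[m] & (group[m] == 0))
        c = np.sum(item[m] & (group[m] == 1)); d = np.sum(~item[m] & (group[m] == 1))
        t = a + b + c + d
        num += a * d / t
        den += b * c / t
    print("Mantel-Haenszel common odds ratio:", num / den)

A ratio well away from 1.0 flags the item for DIF; MDS would address the complementary, structural side of the cross-language comparability question.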