Peer reviewed
Setzer, J. Carl; Cheng, Ying; Liu, Cheng – Journal of Educational Measurement, 2023
Test scores are often used to make decisions about examinees, such as in licensure and certification testing, as well as in many educational contexts. In some cases, these decisions are based upon compensatory scores, such as those from multiple sections or components of an exam. Classification accuracy and classification consistency are two…
Descriptors: Classification, Accuracy, Psychometrics, Scores
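The classification indices this abstract refers to can be approximated by simulation. The sketch below is illustrative only, not the authors' method: it assumes a compensatory composite of two normally distributed section scores, invented standard errors of measurement, and an invented cut score.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify(scores, cut):
    """1 = pass, 0 = fail against a compensatory cut score."""
    return (scores >= cut).astype(int)

# Illustrative assumptions: two sections combined into one compensatory
# composite; normal true scores; classical error variances implied by
# assumed standard errors of measurement.
n = 100_000
true_1 = rng.normal(50, 10, n)       # section 1 true scores
true_2 = rng.normal(50, 10, n)       # section 2 true scores
sem_1, sem_2 = 4.0, 5.0              # assumed SEMs per section
cut = 100.0                          # assumed composite cut score

true_comp = true_1 + true_2
obs_a = true_comp + rng.normal(0, sem_1, n) + rng.normal(0, sem_2, n)
obs_b = true_comp + rng.normal(0, sem_1, n) + rng.normal(0, sem_2, n)

# Accuracy: agreement between observed and true-score classifications.
accuracy = np.mean(classify(obs_a, cut) == classify(true_comp, cut))
# Consistency: agreement between two parallel administrations.
consistency = np.mean(classify(obs_a, cut) == classify(obs_b, cut))
print(f"accuracy ~ {accuracy:.3f}, consistency ~ {consistency:.3f}")
```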
Peer reviewed
Jones, Paul; Tong, Ye; Liu, Jinghua; Borglum, Joshua; Primoli, Vince – Journal of Educational Measurement, 2022
This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a "modal scale comparison approach," where the same pool of items was calibrated separately, without transformation, within two TC cohorts (TC1 and TC2) and one OP cohort (OP1) matched on their pool-based scale score distributions. The…
Descriptors: Scores, Credentials, Licensing Examinations (Professions), Computer Assisted Testing
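As a rough illustration of comparing separate calibrations across delivery modes (a crude stand-in for the authors' "modal scale comparison approach," not their procedure), one can contrast item difficulty estimates obtained independently within each cohort. Everything below is invented, including the use of logit-transformed p-values in place of a true Rasch calibration.

```python
import numpy as np

def logit_difficulty(responses):
    """Crude item 'difficulty' proxy: -logit of proportion correct.

    responses: examinees x items matrix of 0/1 scores.
    """
    p = responses.mean(axis=0).clip(0.01, 0.99)  # guard the logit
    return -np.log(p / (1 - p))

rng = np.random.default_rng(1)
p_items = rng.uniform(0.4, 0.95, 60)                        # invented item p-values
tc = (rng.random((2000, 60)) < p_items).astype(int)         # test-center cohort
op = (rng.random((2000, 60)) < p_items - 0.02).astype(int)  # slightly harder online

b_tc, b_op = logit_difficulty(tc), logit_difficulty(op)
print("mean shift:", (b_op - b_tc).mean().round(3))         # systematic mode effect?
print("correlation:", np.corrcoef(b_tc, b_op)[0, 1].round(3))  # item-level stability
```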
Peer reviewed
Puhan, Gautam; Kim, Sooyeon – Journal of Educational Measurement, 2022
As a result of the COVID-19 pandemic, at-home testing has become a popular delivery mode in many testing programs. When programs offer at-home testing to expand their service, the score comparability between test takers testing remotely and those testing in a test center is critical. This article summarizes statistical procedures that could be…
Descriptors: Scores, Scoring, Comparative Analysis, Testing
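Two generic comparability checks in the spirit of this summary are a standardized mean difference and a two-sample Kolmogorov-Smirnov test between delivery modes. The scores below are simulated; in practice, mode differences are confounded with self-selection into at-home testing, so matching or covariate adjustment would also be needed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
center = rng.normal(500, 100, 5000)   # hypothetical test-center scores
remote = rng.normal(495, 105, 3000)   # hypothetical at-home scores

# Standardized mean difference (Cohen's d with pooled SD).
pooled_sd = np.sqrt((center.var(ddof=1) + remote.var(ddof=1)) / 2)
d = (remote.mean() - center.mean()) / pooled_sd

# The KS test compares the full score distributions, not just the means.
ks = stats.ks_2samp(remote, center)
print(f"d = {d:.3f}, KS = {ks.statistic:.3f}, p = {ks.pvalue:.3g}")
```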
Peer reviewed
Harik, Polina; Clauser, Brian E.; Grabovsky, Irina; Baldwin, Peter; Margolis, Melissa J.; Bucak, Deniz; Jodoin, Michael; Walsh, William; Haist, Steven – Journal of Educational Measurement, 2018
Test administrators are appropriately concerned about the potential for time constraints to impact the validity of score interpretations; psychometric efforts to evaluate the impact of speededness date back more than half a century. The widespread move to computerized test delivery has led to the development of new approaches to evaluating how…
Descriptors: Comparative Analysis, Observation, Medical Education, Licensing Examinations (Professions)
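A classical (pre-response-time) speededness indicator is the share of examinees who never reach the final items. The sketch below assumes omitted items are coded as NaN; the data and the five-item window are invented.

```python
import numpy as np

def pct_not_reaching(responses, last_k=5):
    """Share of examinees with no response to any of the last k items.

    responses: examinees x items array; np.nan marks an omitted item.
    """
    tail = responses[:, -last_k:]
    return np.mean(np.all(np.isnan(tail), axis=1))

rng = np.random.default_rng(3)
resp = rng.integers(0, 2, (1000, 40)).astype(float)  # placeholder 0/1 scores
resp[:60, -5:] = np.nan  # suppose 6% ran out of time on the last 5 items
print(f"{pct_not_reaching(resp):.1%} did not reach the last 5 items")
```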
Peer reviewed
Sinharay, Sandip; Haberman, Shelby J.; Lee, Yi-Hsuan – Journal of Educational Measurement, 2011
Providing information to test takers and test score users about the abilities of test takers at different score levels has been a persistent problem in educational and psychological measurement. Scale anchoring, a technique which describes what students at different points on a score scale know and can do, is a tool to provide such information.…
Descriptors: Scores, Test Items, Statistical Analysis, Licensing Examinations (Professions)
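One common scale-anchoring rule of thumb selects, for each anchor point, items that examinees at that level answer correctly with high probability while examinees at the next lower level do not. The sketch below assumes a 2PL item pool with invented parameters and illustrative 0.80/0.50 probability criteria; it is not the authors' procedure.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response function."""
    return 1 / (1 + np.exp(-a * (theta - b)))

# Hypothetical item pool (discrimination a, difficulty b).
rng = np.random.default_rng(4)
a = rng.uniform(0.5, 2.0, 50)
b = rng.normal(0, 1, 50)

# An item "anchors" a score level if examinees there answer it correctly
# with high probability while examinees one level below do not.
level, below = 1.0, 0.0          # anchor points on the theta scale
p_hi, p_lo = p_correct(level, a, b), p_correct(below, a, b)
anchors = np.where((p_hi >= 0.80) & (p_lo <= 0.50))[0]
print("anchor items for level 1.0:", anchors)
```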
Peer reviewed
Raymond, Mark R.; Swygert, Kimberly A.; Kahraman, Nilufer – Journal of Educational Measurement, 2012
Although a few studies report sizable score gains for examinees who repeat performance-based assessments, research has not yet addressed the reliability and validity of inferences based on ratings of repeat examinees on such tests. This study analyzed scores for 8,457 single-take examinees and 4,030 repeat examinees who completed a 6-hour clinical…
Descriptors: Physicians, Licensing Examinations (Professions), Performance Based Assessment, Repetition
Peer reviewed
Li, Feiming; Cohen, Allan; Shen, Linjun – Journal of Educational Measurement, 2012
Computer-based tests (CBTs) often use random ordering of items in order to minimize item exposure and reduce the potential for answer copying. Little research has been done, however, to examine item position effects for these tests. In this study, different versions of a Rasch model and different response time models were examined and applied to…
Descriptors: Computer Assisted Testing, Test Items, Item Response Theory, Models
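A minimal way to formalize an item position effect is to let the effective Rasch difficulty drift with serial position. The sketch below is a toy model with an invented drift parameter, not one of the specific models compared in the article.

```python
import numpy as np

def rasch_with_position(theta, b, position, gamma=0.02):
    """P(correct) under a Rasch model with a linear position effect.

    Effective difficulty grows by gamma per serial position, so the same
    item is slightly harder late in the test. gamma is illustrative.
    """
    return 1 / (1 + np.exp(-(theta - (b + gamma * position))))

# Same item, same examinee, administered early vs. late:
print(rasch_with_position(0.0, 0.0, position=1))   # near the start
print(rasch_with_position(0.0, 0.0, position=60))  # near the end
```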
Peer reviewed
Clauser, Brian E.; Mee, Janet; Baldwin, Su G.; Margolis, Melissa J.; Dillon, Gerard F. – Journal of Educational Measurement, 2009
Although the Angoff procedure is among the most widely used standard setting procedures for tests comprising multiple-choice items, research has shown that subject matter experts have considerable difficulty accurately making the required judgments in the absence of examinee performance data. Some authors have viewed the need to provide…
Descriptors: Standard Setting (Scoring), Program Effectiveness, Expertise, Health Personnel
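For reference, the basic Angoff computation behind such studies: each judge estimates, per item, the probability that a minimally competent examinee answers correctly; expected scores are summed within judge and averaged across judges. The ratings below are invented.

```python
import numpy as np

# ratings[j, i]: judge j's estimate of the probability that a minimally
# competent examinee answers item i correctly (values are illustrative).
ratings = np.array([
    [0.60, 0.75, 0.45, 0.80],
    [0.55, 0.70, 0.50, 0.80],
    [0.65, 0.80, 0.40, 0.75],
])

# Angoff cut score: sum expected item scores, then average over judges.
cut_per_judge = ratings.sum(axis=1)
print("cut score:", cut_per_judge.mean())  # expected raw score at the standard
```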
Peer reviewed
van der Linden, Wim J.; Breithaupt, Krista; Chuah, Siang Chee; Zhang, Yanwei – Journal of Educational Measurement, 2007
A potential undesirable effect of multistage testing is differential speededness, which happens if some of the test takers run out of time because they receive subtests with items that are more time intensive than others. This article shows how a probabilistic response-time model can be used for estimating differences in time intensities and speed…
Descriptors: Adaptive Testing, Evaluation Methods, Test Items, Reaction Time
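The response-time model referenced here is van der Linden's lognormal model, in which log time decomposes into an item time intensity and an examinee speed. The simulation below uses invented parameters and a crude moment estimate of the intensities, not the authors' estimator, to show how subtests can differ in expected time.

```python
import numpy as np

# Lognormal response-time model: for examinee j and item i,
# ln t_ij ~ N(beta_i - tau_j, sigma^2), with time intensity beta_i
# and speed tau_j. All parameter values below are invented.
rng = np.random.default_rng(5)
n_persons, n_items = 500, 20
beta = rng.normal(4.0, 0.3, n_items)    # true time intensities (log seconds)
tau = rng.normal(0.0, 0.3, n_persons)   # true speeds
log_t = beta[None, :] - tau[:, None] + rng.normal(0, 0.3, (n_persons, n_items))

# Moment sketch: averaging log times over examinees recovers each item's
# time intensity up to a common constant.
beta_hat = log_t.mean(axis=0)
print("recovery r:", np.corrcoef(beta_hat, beta)[0, 1].round(3))

# Differential speededness check: rough total seconds per subtest
# (sum of per-item median times).
for name, idx in [("A", np.arange(0, 10)), ("B", np.arange(10, 20))]:
    print(name, np.exp(beta_hat[idx]).sum().round(1), "seconds")
```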
Peer reviewed
Norcini, John J.; And Others – Journal of Educational Measurement, 1988
Two studies of medical certification examinations were undertaken to assess standard setting using Angoff's Method. Results indicate that (1) specialization within broad content areas does not affect an expert's estimates of the performance of the borderline group; and (2) performance data should be provided during the standard-setting process.…
Descriptors: Certification, Cutting Scores, Licensing Examinations (Professions), Medicine
Peer reviewed
Luecht, Richard M.; Nungester, Ronald J. – Journal of Educational Measurement, 1998
Describes an integrated approach to test development and administration called computer-adaptive sequential testing (CAST). CAST incorporates adaptive testing methods with automated test assembly. Describes the CAST framework and demonstrates several applications using a medical-licensure example. (SLD)
Descriptors: Adaptive Testing, Automation, Computer Assisted Testing, Licensing Examinations (Professions)
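A toy illustration of CAST-style multistage routing: preassembled modules with number-correct routing between stages. The design, thresholds, and module labels below are invented, not the article's framework.

```python
def route(num_correct, thresholds=(7, 13)):
    """Route from a 20-item stage score to the next-stage module."""
    lo, hi = thresholds
    if num_correct <= lo:
        return "easy"
    if num_correct >= hi:
        return "hard"
    return "medium"

for score in (5, 10, 18):
    print(score, "->", route(score))
```

In a full CAST design the modules themselves would be assembled automatically against content and information targets before administration.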
Peer reviewed
Meijer, Rob R. – Journal of Educational Measurement, 2002
Used empirical data from a certification test to study methods from statistical process control that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in computerized adaptive testing. Results for 1,392 examinees show that different types of misfit can be distinguished. (SLD)
Descriptors: Certification, Classification, Goodness of Fit, Item Response Theory
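CUSUM charts are a canonical statistical-process-control tool for this task: cumulative sums of item-score residuals flag sustained runs of unexpectedly correct or incorrect responses. The sketch below uses an invented reference value k and invented data; operational control limits would come from simulation.

```python
import numpy as np

def cusum_person_fit(scores, p_expected, k=0.1):
    """One-sided CUSUM charts on item-score residuals (u - P).

    Sustained runs of unexpectedly correct (C+) or incorrect (C-)
    answers push a chart past its control limit, flagging misfit.
    """
    c_plus = c_minus = 0.0
    path_plus, path_minus = [], []
    for u, p in zip(scores, p_expected):
        r = u - p
        c_plus = max(0.0, c_plus + r - k)
        c_minus = min(0.0, c_minus + r + k)
        path_plus.append(c_plus)
        path_minus.append(c_minus)
    return np.array(path_plus), np.array(path_minus)

# Hypothetical adaptive test: P(correct) hovers near .5, but the examinee
# suddenly answers a long run of items correctly (e.g., preknowledge).
p = np.full(30, 0.5)
u = np.array([1, 0, 1, 0, 1, 0, 0, 1, 0, 1] + [1] * 20)
cp, cm = cusum_person_fit(u, p)
print("max C+ =", cp.max().round(2))  # compare to a simulated control limit
```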
Peer reviewed
Shannon, Gregory A.; Cliver, Barbara A. – Journal of Educational Measurement, 1987
Spearman correlations were computed between item response theory-derived information functions (IIFs) and four conventional item discrimination indices: phi-coefficient; B-index; phi/phi max; and agreement statistic. Correlations between the phi-coefficient and the IIFs were very high. Data were taken from a real estate licensing test. (Author/GDC)
Descriptors: Adults, Comparative Analysis, Criterion Referenced Tests, Item Analysis
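A sketch of the kind of comparison the study describes, under assumptions: simulate 2PL responses, compute each item's phi coefficient against a pass/fail classification at a cut, evaluate item information at that cut, and correlate the two with Spearman's rho. The parameters and cut are invented, and only one of the four indices is shown.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, n_items = 2000, 30
theta = rng.normal(0, 1, n)
a = rng.uniform(0.5, 2.0, n_items)
b = rng.normal(0, 1, n_items)
p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))  # 2PL probabilities
x = (rng.random((n, n_items)) < p).astype(int)   # simulated responses

cut = 0.0
master = (theta >= cut).astype(int)              # pass/fail at the cut

# Phi coefficient: Pearson r between two dichotomies (item score x mastery).
phi = np.array([stats.pearsonr(x[:, i], master)[0] for i in range(n_items)])

# 2PL item information at the cut: I = a^2 * P * (1 - P).
p_cut = 1 / (1 + np.exp(-a * (cut - b)))
info = a**2 * p_cut * (1 - p_cut)

print(f"Spearman rho: {stats.spearmanr(phi, info)[0]:.3f}")
```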
Peer reviewed
Norcini, John J. – Journal of Educational Measurement, 1987
Answer keys for physician and teacher licensing examinations were studied. The impact of variability on total errors of measurement was examined for answer keys constructed using the aggregate method. Results indicated that, in some cases, scorers contributed to a sizable reduction in measurement error. (Author/GDC)
Descriptors: Adults, Answer Keys, Error of Measurement, Evaluators
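On one common reading of aggregate keying, several experts each supply an answer key and the operational key is the modal response per item; the actual method studied may combine or weight scorers differently. The keys below are invented.

```python
import numpy as np

# Each row is one expert's proposed key for a four-item test.
expert_keys = np.array([
    ["A", "C", "B", "D"],
    ["A", "C", "C", "D"],
    ["A", "B", "B", "D"],
])

def aggregate_key(keys):
    """Modal (majority) response per item across experts."""
    return [max(set(col), key=list(col).count) for col in keys.T]

print(aggregate_key(expert_keys))  # -> ['A', 'C', 'B', 'D']
```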
Peer reviewed
Kane, Michael T.; And Others – Journal of Educational Measurement, 1989
This paper develops a multiplicative model as a means of combining ratings of criticality and frequency of various activities involved in job analyses. The model incorporates adjustments to ensure that effective weights of criticality and frequency are appropriate. An example of the model's use is presented. (TJH)
Descriptors: Critical Incidents Method, Higher Education, Job Analysis, Licensing Examinations (Professions)
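The multiplicative model described here takes an activity's weight to be the product of its criticality and frequency ratings, normalized so the effective weights sum to one. The ratings below are invented.

```python
import numpy as np

# Mean job-analysis ratings per activity (illustrative values).
criticality = np.array([4.5, 2.0, 3.5])
frequency   = np.array([1.0, 4.0, 2.5])

# Multiplicative combination: weight proportional to criticality x frequency.
raw = criticality * frequency
weights = raw / raw.sum()
print(weights.round(3))  # effective test-blueprint weights
```

A rare-but-critical activity and a routine-but-minor one can thus end up with comparable effective weights, which is the balancing behavior the model is meant to control.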