Showing 1 to 15 of 18 results
Peer reviewed
Betts, Joe; Muntean, William; Kim, Doyoung; Kao, Shu-chuan – Educational and Psychological Measurement, 2022
The multiple response structure can underlie several different technology-enhanced item types. With the increased use of computer-based testing, multiple response items are becoming more common. This response type holds the potential for being scored polytomously for partial credit. However, there are several possible methods for computing raw…
Descriptors: Scoring, Test Items, Test Format, Raw Scores
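As context for the scoring question this abstract raises, here is a minimal sketch (with hypothetical keys and responses, not the authors' specific methods) contrasting three common rules for computing a raw score from a multiple response item:

```python
# Illustrative sketch, not the authors' specific methods: three common rules
# for turning a multiple-response item into a raw score. Keys and responses
# are sets of selected option labels; all values here are hypothetical.

def dichotomous(key: set, response: set) -> float:
    """All-or-nothing: full credit only for an exact match with the key."""
    return 1.0 if response == key else 0.0

def partial_credit(key: set, response: set, n_options: int) -> float:
    """One point per correctly classified option (keyed options selected
    plus distractors left unselected), rescaled to the 0-1 interval."""
    correct = len(key & response) + (n_options - len(key | response))
    return correct / n_options

def penalized(key: set, response: set) -> float:
    """+1 per correct selection, -1 per incorrect selection, floored at
    zero, then rescaled by the number of keyed options."""
    raw = len(key & response) - len(response - key)
    return max(raw, 0) / len(key)

key, response = {"A", "C", "D"}, {"A", "C", "E"}   # options A-E
print(dichotomous(key, response))        # 0.0
print(partial_credit(key, response, 5))  # 0.6
print(penalized(key, response))          # 0.333...
```

The dichotomous rule discards partial knowledge; the other two award partial credit but differ in how incorrect selections are penalized, which is the kind of choice among raw-score methods the article examines.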
Peer reviewed
Sinharay, Sandip – Educational and Psychological Measurement, 2022
Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores, and hence to incomplete data, on mastery tests such as the AP and U.S. Medical Licensing examinations. Investigators are often interested in estimating the passing probabilities of examinees with incomplete data on such mastery tests.…
Descriptors: Mastery Tests, Computer Assisted Testing, Probability, Test Wiseness
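One plausible way to estimate a passing probability from incomplete data, sketched here as a hedged illustration (not necessarily Sinharay's estimator; the Rasch model, item parameters, and cutoff below are assumptions), is to form a posterior over ability from the answered items and marginalize the predictive distribution of the unanswered-item total:

```python
# Hedged sketch: P(pass) for an examinee with missing item scores under a
# Rasch model. Grid posterior over ability given the observed responses,
# then the predictive distribution of the unobserved-item total. All item
# parameters and the cutoff are made up for illustration.
import numpy as np

b_obs  = np.array([-1.0, 0.0, 0.5])   # difficulties, answered items
x_obs  = np.array([1, 1, 0])          # observed 0/1 scores
b_miss = np.array([-0.5, 0.2, 1.0])   # difficulties, missing items
cutoff = 4                            # pass if total score >= cutoff

theta = np.linspace(-4, 4, 161)       # ability grid
prior = np.exp(-0.5 * theta**2)       # N(0,1) prior, unnormalized

def p_correct(theta, b):
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

# Posterior over theta given the observed responses
p = p_correct(theta, b_obs)
like = np.prod(np.where(x_obs == 1, p, 1 - p), axis=1)
post = prior * like
post /= post.sum()

# For each theta, distribution of the missing-item sum via convolution
pm = p_correct(theta, b_miss)
dist = np.ones((len(theta), 1))
for j in range(pm.shape[1]):
    q = pm[:, j:j+1]
    dist = np.hstack([dist * (1 - q), np.zeros((len(theta), 1))]) \
         + np.hstack([np.zeros((len(theta), 1)), dist * q])

need = cutoff - x_obs.sum()           # missing-item score still needed
p_pass = (post[:, None] * dist[:, max(need, 0):]).sum()
print(f"P(pass) ~ {p_pass:.3f}")
```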
Peer reviewed
Lee, Chansoon; Qian, Hong – Educational and Psychological Measurement, 2022
Using classical test theory and item response theory, this study applied sequential procedures to a real operational item pool in a variable-length computerized adaptive test (CAT) to detect items whose security may be compromised. Moreover, this study proposed a hybrid threshold approach to improve the detection power of the sequential…
Descriptors: Computer Assisted Testing, Adaptive Testing, Licensing Examinations (Professions), Item Response Theory
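The abstract does not detail the sequential procedures, but a generic CUSUM-style monitor conveys the basic idea: accumulate evidence that an item's proportion correct has drifted above its historical baseline and flag the item once a threshold is crossed. The allowance k and threshold h below are hypothetical, and the article's hybrid threshold approach is more involved than this sketch:

```python
# Hedged sketch of a generic CUSUM item-compromise monitor; an upward drift
# in an item's proportion correct may signal exposure. Parameters are
# hypothetical, not taken from the article.
import numpy as np

def cusum_flag(responses, p0, k=0.10, h=4.0):
    """Return the index at which the upper CUSUM crosses h, else None.

    responses: 0/1 item scores in administration order
    p0: the item's historical proportion correct
    k: allowance (roughly half the shift in proportion correct to detect)
    h: decision threshold
    """
    s = 0.0
    for t, x in enumerate(responses):
        s = max(0.0, s + (x - p0 - k))   # accumulate upward-drift evidence
        if s > h:
            return t
    return None

rng = np.random.default_rng()
clean = rng.binomial(1, 0.60, 300)    # behaves at baseline p0 = 0.60
leaked = rng.binomial(1, 0.85, 300)   # proportion correct has jumped
print(cusum_flag(clean, p0=0.60))     # typically None (no alarm)
print(cusum_flag(leaked, p0=0.60))    # typically flags within a few dozen
```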
Peer reviewed
Bezirhan, Ummugul; von Davier, Matthias; Grabovsky, Irina – Educational and Psychological Measurement, 2021
This article presents a new approach to the analysis of how students answer tests and how they allocate resources in terms of time on task and revisiting previously answered questions. Previous research has shown that in high-stakes assessments, most test takers do not end the testing session early, but rather spend all of the time they were…
Descriptors: Response Style (Tests), Accuracy, Reaction Time, Ability
Peer reviewed
Sideridis, Georgios D.; Tsaousis, Ioannis; Al-Sadaawi, Abdullah – Educational and Psychological Measurement, 2019
The purpose of the present study was to apply the methodology developed by Raykov for modeling item-specific variance for the measurement of internal consistency reliability with longitudinal data. Participants were a randomly selected sample of 500 individuals who took a professional qualifications test in Saudi Arabia over four different…
Descriptors: Test Reliability, Test Items, Longitudinal Studies, Foreign Countries
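For readers unfamiliar with the quantity being modeled: for a congeneric scale with uncorrelated errors and factor variance fixed at 1, the composite reliability coefficient central to Raykov's approach is (sum of loadings)^2 / [(sum of loadings)^2 + sum of error variances]. A tiny sketch with hypothetical estimates (in practice these come from a structural equation model fitted at each measurement occasion):

```python
# Composite reliability for a congeneric scale (uncorrelated errors,
# factor variance fixed at 1). Loadings and error variances below are
# hypothetical, standing in for SEM estimates at one test occasion.

def composite_reliability(loadings, error_variances):
    s = sum(loadings)
    return s * s / (s * s + sum(error_variances))

print(composite_reliability([0.70, 0.80, 0.60, 0.75],
                            [0.51, 0.36, 0.64, 0.44]))  # ~0.806
```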
Peer reviewed
Himelfarb, Igor; Marcoulides, Katerina M.; Fang, Guoliang; Shotts, Bruce L. – Educational and Psychological Measurement, 2020
The chiropractic clinical competency examination uses groups of items that are integrated by a common case vignette. The nature of the vignette items violates the assumption of local independence for items nested within a vignette. This study examines via simulation a new algorithmic approach for addressing the local independence violation problem…
Descriptors: Allied Health Occupations Education, Allied Health Personnel, Competence, Tests
Peer reviewed
Wyse, Adam E.; Babcock, Ben – Educational and Psychological Measurement, 2016
Continuously administered examination programs, particularly credentialing programs that require graduation from educational programs, often experience seasonality, in which distributions of examinee ability may differ over time. Such seasonality may affect the quality of important statistical processes, such as item response theory (IRT) item…
Descriptors: Test Items, Item Response Theory, Computation, Licensing Examinations (Professions)
Peer reviewed
Clauser, Jerome C.; Hambleton, Ronald K.; Baldwin, Peter – Educational and Psychological Measurement, 2017
The Angoff standard setting method relies on content experts to review exam items and make judgments about the performance of the minimally proficient examinee. Unfortunately, at times content experts may have gaps in their understanding of specific exam content. These gaps are particularly likely to occur when the content domain is broad and/or…
Descriptors: Scores, Item Analysis, Classification, Decision Making
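For reference, the basic Angoff computation this abstract presupposes: each judge estimates, for every item, the probability that a minimally proficient examinee answers it correctly, and the cut score is the sum of the per-item mean ratings. A short sketch with a hypothetical ratings matrix:

```python
# The standard Angoff cut-score computation; the ratings are hypothetical.
import numpy as np

ratings = np.array([   # rows = judges, columns = items
    [0.60, 0.80, 0.45, 0.90],
    [0.55, 0.75, 0.50, 0.85],
    [0.65, 0.70, 0.40, 0.95],
])
cut_score = ratings.mean(axis=0).sum()   # expected raw score at the standard
print(f"Angoff cut score: {cut_score:.2f} out of {ratings.shape[1]}")  # 2.70
```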
Peer reviewed
He, Wei; Reckase, Mark D. – Educational and Psychological Measurement, 2014
For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…
Descriptors: Item Banks, Test Length, Computer Assisted Testing, Adaptive Testing
Peer reviewed
Schmidt, Amy Elizabeth – Educational and Psychological Measurement, 2000
Conducted a validity study to examine the degree to which scores on the newly developed Diagnostic Readiness Test (DRT) and the National League for Nursing Pre-Admission Test could predict success or failure on the National Council Licensure Examination for Registered Nurses (NCLEX-RN). Results for 5,698 students indicate that the DRT is a…
Descriptors: Licensing Examinations (Professions), Nurses, Prediction, Readiness
Peer reviewed
Harris, Deborah J.; Kolen, Michael J. – Educational and Psychological Measurement, 1990
An Angoff method and a frequency estimation equipercentile equating method were compared, using data from three forms of a 200-item multiple-choice certification test. Data requirements are fewer and computational requirements less burdensome for the former than for the latter method. However, results of the two methods are not interchangeable.…
Descriptors: Comparative Analysis, Computation, Equated Scores, Licensing Examinations (Professions)
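To make the comparison concrete, here is the core of equipercentile equating, sketched under simplifying assumptions (the frequency estimation variant in the article additionally conditions on an anchor-test distribution, omitted here; the score distributions are hypothetical): map each form X score to the form Y score having the same percentile rank.

```python
# Core equipercentile mapping between two test forms; frequencies are
# hypothetical, and the anchor-based frequency-estimation step is omitted.
import numpy as np

def percentile_ranks(freqs):
    """Percentile rank at each integer score: cumulative % below the score
    plus half the % at the score (the usual PR convention)."""
    p = np.asarray(freqs, float) / np.sum(freqs)
    cum = np.cumsum(p)
    return 100 * (cum - p / 2)

def equipercentile(freq_x, freq_y):
    """Map each form-X integer score to the form-Y score with the same
    percentile rank, interpolating linearly between Y's integer scores."""
    pr_x = percentile_ranks(freq_x)
    pr_y = percentile_ranks(freq_y)
    return np.interp(pr_x, pr_y, np.arange(len(freq_y)))

freq_x = [2, 5, 12, 20, 30, 18, 9, 4]    # form X score distribution (0-7)
freq_y = [1, 4, 10, 18, 28, 22, 12, 5]   # form Y score distribution (0-7)
print(np.round(equipercentile(freq_x, freq_y), 2))
```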
Peer reviewed
Hurtz, Gregory M.; Hertz, Norman R. – Educational and Psychological Measurement, 1999
Evaluated Angoff ratings from eight different occupational licensing examinations through generalizability theory to estimate the optimal number of raters. Results indicate that approximately 10 to 15 raters is an optimal target range. (SLD)
Descriptors: Cutting Scores, Evaluators, Generalizability Theory, Interrater Reliability
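The decision-study logic behind such a recommendation can be sketched compactly: error variance attributable to raters shrinks in proportion to 1/n, so one searches for the smallest panel whose dependability clears a target. The variance components and target below are hypothetical, not taken from the article:

```python
# Hedged D-study sketch: dependability of Angoff ratings as a function of
# panel size, with raters as the only facet. Variance components are
# hypothetical stand-ins for G-study estimates.

def dependability(var_object, var_rater, var_interaction, n_raters):
    """Phi-type coefficient for absolute decisions, averaging over raters."""
    error = (var_rater + var_interaction) / n_raters
    return var_object / (var_object + error)

target = 0.90
for n in range(1, 21):
    phi = dependability(var_object=0.050, var_rater=0.010,
                        var_interaction=0.060, n_raters=n)
    if phi >= target:
        print(f"{n} raters reach phi = {phi:.3f}")
        break
```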
Peer reviewed
Grosse, Martin E.; Wright, Benjamin D. – Educational and Psychological Measurement, 1988
Psychometric characteristics of the test scores of 5,663 examinees on six patient management problem tests were studied. The total scores can be divided into subscores based on all options keyed "select" and "omit." Component scores are generally correlated negatively, as reflected in reduced discrimination indices and…
Descriptors: Case Studies, Licensing Examinations (Professions), Medical Evaluation, Problem Solving
Peer reviewed
Yin, Ping – Educational and Psychological Measurement, 2005
The main purpose of this study is to examine the content structure of the Multistate Bar Examination (MBE) using the "table of specifications" model from the perspective of multivariate generalizability theory. Specifically, using MBE data collected over different years (six administrations: three from the February test and three from the July test),…
Descriptors: Correlation, Generalizability Theory, Statistical Analysis, Multivariate Analysis
Peer reviewed
Stone, Clement A.; Yeh, Chien-Chi – Educational and Psychological Measurement, 2006
Examination of a test's internal structure can be used to identify what domains or dimensions are being measured, identify relationships between the dimensions, provide evidence for hypothesized multidimensionality and test score interpretations, and identify construct-irrelevant variance. The purpose of this research is to provide a…
Descriptors: Multiple Choice Tests, Factor Structure, Factor Analysis, Licensing Examinations (Professions)