Showing 1 to 15 of 18 results
Peer reviewed
Betts, Joe; Muntean, William; Kim, Doyoung; Kao, Shu-chuan – Educational and Psychological Measurement, 2022
The multiple response structure can underlie several different technology-enhanced item types. With the increased use of computer-based testing, multiple response items are becoming more common. This response type holds the potential for being scored polytomously for partial credit. However, there are several possible methods for computing raw…
Descriptors: Scoring, Test Items, Test Format, Raw Scores
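As context for the scoring question this abstract raises, here is a minimal sketch (with hypothetical keys and responses, not the authors' specific methods) contrasting three common rules for computing a raw score from a multiple response item:

```python
# Illustrative sketch, not the authors' specific methods: three common rules
# for turning a multiple-response item into a raw score. Keys and responses
# are sets of selected option labels; all values here are hypothetical.

def dichotomous(key: set, response: set) -> float:
    """All-or-nothing: full credit only for an exact match with the key."""
    return 1.0 if response == key else 0.0

def partial_credit(key: set, response: set, n_options: int) -> float:
    """One point per correctly classified option (keyed options selected
    plus distractors left unselected), rescaled to the 0-1 interval."""
    correct = len(key & response) + (n_options - len(key | response))
    return correct / n_options

def penalized(key: set, response: set) -> float:
    """+1 per correct selection, -1 per incorrect selection, floored at
    zero, then rescaled by the number of keyed options."""
    raw = len(key & response) - len(response - key)
    return max(raw, 0) / len(key)

key, response = {"A", "C", "D"}, {"A", "C", "E"}   # options A-E
print(dichotomous(key, response))        # 0.0
print(partial_credit(key, response, 5))  # 0.6
print(penalized(key, response))          # 0.333...
```

The dichotomous rule discards partial knowledge; the other two award partial credit but differ in how incorrect selections are penalized, which is the kind of choice among raw-score methods the article examines.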
Peer reviewed
Sinharay, Sandip – Educational and Psychological Measurement, 2022
Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores, and hence to incomplete data, on mastery tests such as the AP and U.S. Medical Licensing examinations. Investigators are often interested in estimating the passing probabilities of examinees with incomplete data on such mastery tests.…
Descriptors: Mastery Tests, Computer Assisted Testing, Probability, Test Wiseness
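One plausible way to estimate a passing probability from incomplete data, sketched here as a hedged illustration (not necessarily Sinharay's estimator; the Rasch model, item parameters, and cutoff below are assumptions), is to form a posterior over ability from the answered items and marginalize the predictive distribution of the unanswered-item total:

```python
# Hedged sketch: P(pass) for an examinee with missing item scores under a
# Rasch model. Grid posterior over ability given the observed responses,
# then the predictive distribution of the unobserved-item total. All item
# parameters and the cutoff are made up for illustration.
import numpy as np

b_obs  = np.array([-1.0, 0.0, 0.5])   # difficulties, answered items
x_obs  = np.array([1, 1, 0])          # observed 0/1 scores
b_miss = np.array([-0.5, 0.2, 1.0])   # difficulties, missing items
cutoff = 4                            # pass if total score >= cutoff

theta = np.linspace(-4, 4, 161)       # ability grid
prior = np.exp(-0.5 * theta**2)       # N(0,1) prior, unnormalized

def p_correct(theta, b):
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

# Posterior over theta given the observed responses
p = p_correct(theta, b_obs)
like = np.prod(np.where(x_obs == 1, p, 1 - p), axis=1)
post = prior * like
post /= post.sum()

# For each theta, distribution of the missing-item sum via convolution
pm = p_correct(theta, b_miss)
dist = np.ones((len(theta), 1))
for j in range(pm.shape[1]):
    q = pm[:, j:j+1]
    dist = np.hstack([dist * (1 - q), np.zeros((len(theta), 1))]) \
         + np.hstack([np.zeros((len(theta), 1)), dist * q])

need = cutoff - x_obs.sum()           # missing-item score still needed
p_pass = (post[:, None] * dist[:, max(need, 0):]).sum()
print(f"P(pass) ~ {p_pass:.3f}")
```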
Peer reviewed
Lee, Chansoon; Qian, Hong – Educational and Psychological Measurement, 2022
Using classical test theory and item response theory, this study applied sequential procedures to a real operational item pool in a variable-length computerized adaptive test (CAT) to detect items whose security may be compromised. Moreover, this study proposed a hybrid threshold approach to improve the detection power of the sequential…
Descriptors: Computer Assisted Testing, Adaptive Testing, Licensing Examinations (Professions), Item Response Theory
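The abstract does not detail the sequential procedures, but a generic CUSUM-style monitor conveys the basic idea: accumulate evidence that an item's proportion correct has drifted above its historical baseline and flag the item once a threshold is crossed. The allowance k and threshold h below are hypothetical, and the article's hybrid threshold approach is more involved than this sketch:

```python
# Hedged sketch of a generic CUSUM item-compromise monitor; an upward drift
# in an item's proportion correct may signal exposure. Parameters are
# hypothetical, not taken from the article.
import numpy as np

def cusum_flag(responses, p0, k=0.10, h=4.0):
    """Return the index at which the upper CUSUM crosses h, else None.

    responses: 0/1 item scores in administration order
    p0: the item's historical proportion correct
    k: allowance (roughly half the shift in proportion correct to detect)
    h: decision threshold
    """
    s = 0.0
    for t, x in enumerate(responses):
        s = max(0.0, s + (x - p0 - k))   # accumulate upward-drift evidence
        if s > h:
            return t
    return None

rng = np.random.default_rng()
clean = rng.binomial(1, 0.60, 300)    # behaves at baseline p0 = 0.60
leaked = rng.binomial(1, 0.85, 300)   # proportion correct has jumped
print(cusum_flag(clean, p0=0.60))     # typically None (no alarm)
print(cusum_flag(leaked, p0=0.60))    # typically flags within a few dozen
```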
Peer reviewed
Bezirhan, Ummugul; von Davier, Matthias; Grabovsky, Irina – Educational and Psychological Measurement, 2021
This article presents a new approach to the analysis of how students answer tests and how they allocate resources in terms of time on task and revisiting previously answered questions. Previous research has shown that in high-stakes assessments, most test takers do not end the testing session early, but rather spend all of the time they were…
Descriptors: Response Style (Tests), Accuracy, Reaction Time, Ability
Peer reviewed
Sideridis, Georgios D.; Tsaousis, Ioannis; Al-Sadaawi, Abdullah – Educational and Psychological Measurement, 2019
The purpose of the present study was to apply the methodology developed by Raykov for modeling item-specific variance for the measurement of internal consistency reliability with longitudinal data. Participants were a randomly selected sample of 500 individuals who took a professional qualifications test in Saudi Arabia over four different…
Descriptors: Test Reliability, Test Items, Longitudinal Studies, Foreign Countries
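For readers unfamiliar with the quantity being modeled: for a congeneric scale with uncorrelated errors and factor variance fixed at 1, the composite reliability coefficient central to Raykov's approach is (sum of loadings)^2 / [(sum of loadings)^2 + sum of error variances]. A tiny sketch with hypothetical estimates (in practice these come from a structural equation model fitted at each measurement occasion):

```python
# Composite reliability for a congeneric scale (uncorrelated errors,
# factor variance fixed at 1). Loadings and error variances below are
# hypothetical, standing in for SEM estimates at one test occasion.

def composite_reliability(loadings, error_variances):
    s = sum(loadings)
    return s * s / (s * s + sum(error_variances))

print(composite_reliability([0.70, 0.80, 0.60, 0.75],
                            [0.51, 0.36, 0.64, 0.44]))  # ~0.806
```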
Peer reviewed
Himelfarb, Igor; Marcoulides, Katerina M.; Fang, Guoliang; Shotts, Bruce L. – Educational and Psychological Measurement, 2020
The chiropractic clinical competency examination uses groups of items that are integrated by a common case vignette. The nature of the vignette items violates the assumption of local independence for items nested within a vignette. This study examines via simulation a new algorithmic approach for addressing the local independence violation problem…
Descriptors: Allied Health Occupations Education, Allied Health Personnel, Competence, Tests
Peer reviewed
Wyse, Adam E.; Babcock, Ben – Educational and Psychological Measurement, 2016
Continuously administered examination programs, particularly credentialing programs that require graduation from educational programs, often experience seasonality, in which distributions of examinee ability may differ over time. Such seasonality may affect the quality of important statistical processes, such as item response theory (IRT) item…
Descriptors: Test Items, Item Response Theory, Computation, Licensing Examinations (Professions)
Peer reviewed
Clauser, Jerome C.; Hambleton, Ronald K.; Baldwin, Peter – Educational and Psychological Measurement, 2017
The Angoff standard setting method relies on content experts to review exam items and make judgments about the performance of the minimally proficient examinee. Unfortunately, at times content experts may have gaps in their understanding of specific exam content. These gaps are particularly likely to occur when the content domain is broad and/or…
Descriptors: Scores, Item Analysis, Classification, Decision Making
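For reference, the basic Angoff computation this abstract presupposes: each judge estimates, for every item, the probability that a minimally proficient examinee answers it correctly, and the cut score is the sum of the per-item mean ratings. A short sketch with a hypothetical ratings matrix:

```python
# The standard Angoff cut-score computation; the ratings are hypothetical.
import numpy as np

ratings = np.array([   # rows = judges, columns = items
    [0.60, 0.80, 0.45, 0.90],
    [0.55, 0.75, 0.50, 0.85],
    [0.65, 0.70, 0.40, 0.95],
])
cut_score = ratings.mean(axis=0).sum()   # expected raw score at the standard
print(f"Angoff cut score: {cut_score:.2f} out of {ratings.shape[1]}")  # 2.70
```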
Peer reviewed
He, Wei; Reckase, Mark D. – Educational and Psychological Measurement, 2014
For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…
Descriptors: Item Banks, Test Length, Computer Assisted Testing, Adaptive Testing
Peer reviewed
Schmidt, Amy Elizabeth – Educational and Psychological Measurement, 2000
Conducted a validity study to examine the degree to which scores on the newly developed Diagnostic Readiness Test (DRT) and the National League for Nursing Pre-Admission Test could predict success or failure on the National Council Licensure Examination for Registered Nurses (NCLEX-RN). Results for 5,698 students indicate that the DRT is a…
Descriptors: Licensing Examinations (Professions), Nurses, Prediction, Readiness
Peer reviewed
Harris, Deborah J.; Kolen, Michael J. – Educational and Psychological Measurement, 1990
An Angoff method and a frequency estimation equipercentile equating method were compared, using data from three forms of a 200-item multiple-choice certification test. Data requirements are fewer and computational requirements less burdensome for the former than for the latter method. However, results of the two methods are not interchangeable.…
Descriptors: Comparative Analysis, Computation, Equated Scores, Licensing Examinations (Professions)
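To make the comparison concrete, here is the core of equipercentile equating, sketched under simplifying assumptions (the frequency estimation variant in the article additionally conditions on an anchor-test distribution, omitted here; the score distributions are hypothetical): map each form X score to the form Y score having the same percentile rank.

```python
# Core equipercentile mapping between two test forms; frequencies are
# hypothetical, and the anchor-based frequency-estimation step is omitted.
import numpy as np

def percentile_ranks(freqs):
    """Percentile rank at each integer score: cumulative % below the score
    plus half the % at the score (the usual PR convention)."""
    p = np.asarray(freqs, float) / np.sum(freqs)
    cum = np.cumsum(p)
    return 100 * (cum - p / 2)

def equipercentile(freq_x, freq_y):
    """Map each form-X integer score to the form-Y score with the same
    percentile rank, interpolating linearly between Y's integer scores."""
    pr_x = percentile_ranks(freq_x)
    pr_y = percentile_ranks(freq_y)
    return np.interp(pr_x, pr_y, np.arange(len(freq_y)))

freq_x = [2, 5, 12, 20, 30, 18, 9, 4]    # form X score distribution (0-7)
freq_y = [1, 4, 10, 18, 28, 22, 12, 5]   # form Y score distribution (0-7)
print(np.round(equipercentile(freq_x, freq_y), 2))
```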
Peer reviewed
Hurtz, Gregory M.; Hertz, Norman R. – Educational and Psychological Measurement, 1999
Evaluated Angoff ratings from eight different occupational licensing examinations through generalizability theory to estimate the optimal number of raters. Results indicate that approximately 10 to 15 raters is an optimal target range. (SLD)
Descriptors: Cutting Scores, Evaluators, Generalizability Theory, Interrater Reliability
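The decision-study logic behind such a recommendation can be sketched compactly: error variance attributable to raters shrinks in proportion to 1/n, so one searches for the smallest panel whose dependability clears a target. The variance components and target below are hypothetical, not taken from the article:

```python
# Hedged D-study sketch: dependability of Angoff ratings as a function of
# panel size, with raters as the only facet. Variance components are
# hypothetical stand-ins for G-study estimates.

def dependability(var_object, var_rater, var_interaction, n_raters):
    """Phi-type coefficient for absolute decisions, averaging over raters."""
    error = (var_rater + var_interaction) / n_raters
    return var_object / (var_object + error)

target = 0.90
for n in range(1, 21):
    phi = dependability(var_object=0.050, var_rater=0.010,
                        var_interaction=0.060, n_raters=n)
    if phi >= target:
        print(f"{n} raters reach phi = {phi:.3f}")
        break
```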
Peer reviewed
Grosse, Martin E.; Wright, Benjamin D. – Educational and Psychological Measurement, 1988
Psychometric characteristics of the test scores of 5,663 examinees on six patient management problem tests were studied. The total scores can be divided into subscores based on all options keyed "select" and "omit." Component scores are generally correlated negatively, as reflected in reduced discrimination indices and…
Descriptors: Case Studies, Licensing Examinations (Professions), Medical Evaluation, Problem Solving
Peer reviewed
Yin, Ping – Educational and Psychological Measurement, 2005
The main purpose of this study is to examine the content structure of the Multistate Bar Examination (MBE) using the "table of specifications" model from the perspective of multivariate generalizability theory. Specifically, using MBE data collected over different years (six administrations: three from the February test and three from the July test),…
Descriptors: Correlation, Generalizability Theory, Statistical Analysis, Multivariate Analysis
Peer reviewed
Stone, Clement A.; Yeh, Chien-Chi – Educational and Psychological Measurement, 2006
Examination of a test's internal structure can be used to identify what domains or dimensions are being measured, identify relationships between the dimensions, provide evidence for hypothesized multidimensionality and test score interpretations, and identify construct-irrelevant variance. The purpose of this research is to provide a…
Descriptors: Multiple Choice Tests, Factor Structure, Factor Analysis, Licensing Examinations (Professions)