Peer reviewed
Setzer, J. Carl; Cheng, Ying; Liu, Cheng – Journal of Educational Measurement, 2023
Test scores are often used to make decisions about examinees, such as in licensure and certification testing, as well as in many educational contexts. In some cases, these decisions are based upon compensatory scores, such as those from multiple sections or components of an exam. Classification accuracy and classification consistency are two…
Descriptors: Classification, Accuracy, Psychometrics, Scores
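The classification indices this abstract refers to can be approximated by simulation. The sketch below is illustrative only, not the authors' method: it assumes a compensatory composite of two normally distributed section scores, invented standard errors of measurement, and an invented cut score.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify(scores, cut):
    """1 = pass, 0 = fail against a compensatory cut score."""
    return (scores >= cut).astype(int)

# Illustrative assumptions: two sections combined into one compensatory
# composite; normal true scores; classical error variances implied by
# assumed standard errors of measurement.
n = 100_000
true_1 = rng.normal(50, 10, n)       # section 1 true scores
true_2 = rng.normal(50, 10, n)       # section 2 true scores
sem_1, sem_2 = 4.0, 5.0              # assumed SEMs per section
cut = 100.0                          # assumed composite cut score

true_comp = true_1 + true_2
obs_a = true_comp + rng.normal(0, sem_1, n) + rng.normal(0, sem_2, n)
obs_b = true_comp + rng.normal(0, sem_1, n) + rng.normal(0, sem_2, n)

# Accuracy: agreement between observed and true-score classifications.
accuracy = np.mean(classify(obs_a, cut) == classify(true_comp, cut))
# Consistency: agreement between two parallel administrations.
consistency = np.mean(classify(obs_a, cut) == classify(obs_b, cut))
print(f"accuracy ~ {accuracy:.3f}, consistency ~ {consistency:.3f}")
```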
Peer reviewed
Jones, Paul; Tong, Ye; Liu, Jinghua; Borglum, Joshua; Primoli, Vince – Journal of Educational Measurement, 2022
This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a "modal scale comparison approach," where the same pool of items was calibrated separately, without transformation, within two TC cohorts (TC1 and TC2) and one OP cohort (OP1) matched on their pool-based scale score distributions. The…
Descriptors: Scores, Credentials, Licensing Examinations (Professions), Computer Assisted Testing
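As a rough illustration of comparing separate calibrations across delivery modes (a crude stand-in for the authors' "modal scale comparison approach," not their procedure), one can contrast item difficulty estimates obtained independently within each cohort. Everything below is invented, including the use of logit-transformed p-values in place of a true Rasch calibration.

```python
import numpy as np

def logit_difficulty(responses):
    """Crude item 'difficulty' proxy: -logit of proportion correct.

    responses: examinees x items matrix of 0/1 scores.
    """
    p = responses.mean(axis=0).clip(0.01, 0.99)  # guard the logit
    return -np.log(p / (1 - p))

rng = np.random.default_rng(1)
p_items = rng.uniform(0.4, 0.95, 60)                        # invented item p-values
tc = (rng.random((2000, 60)) < p_items).astype(int)         # test-center cohort
op = (rng.random((2000, 60)) < p_items - 0.02).astype(int)  # slightly harder online

b_tc, b_op = logit_difficulty(tc), logit_difficulty(op)
print("mean shift:", (b_op - b_tc).mean().round(3))         # systematic mode effect?
print("correlation:", np.corrcoef(b_tc, b_op)[0, 1].round(3))  # item-level stability
```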
Peer reviewed
Puhan, Gautam; Kim, Sooyeon – Journal of Educational Measurement, 2022
As a result of the COVID-19 pandemic, at-home testing has become a popular delivery mode in many testing programs. When programs offer at-home testing to expand their service, the score comparability between test takers testing remotely and those testing in a test center is critical. This article summarizes statistical procedures that could be…
Descriptors: Scores, Scoring, Comparative Analysis, Testing
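Two generic comparability checks in the spirit of this summary are a standardized mean difference and a two-sample Kolmogorov-Smirnov test between delivery modes. The scores below are simulated; in practice, mode differences are confounded with self-selection into at-home testing, so matching or covariate adjustment would also be needed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
center = rng.normal(500, 100, 5000)   # hypothetical test-center scores
remote = rng.normal(495, 105, 3000)   # hypothetical at-home scores

# Standardized mean difference (Cohen's d with pooled SD).
pooled_sd = np.sqrt((center.var(ddof=1) + remote.var(ddof=1)) / 2)
d = (remote.mean() - center.mean()) / pooled_sd

# The KS test compares the full score distributions, not just the means.
ks = stats.ks_2samp(remote, center)
print(f"d = {d:.3f}, KS = {ks.statistic:.3f}, p = {ks.pvalue:.3g}")
```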
Peer reviewed
Harik, Polina; Clauser, Brian E.; Grabovsky, Irina; Baldwin, Peter; Margolis, Melissa J.; Bucak, Deniz; Jodoin, Michael; Walsh, William; Haist, Steven – Journal of Educational Measurement, 2018
Test administrators are appropriately concerned about the potential for time constraints to impact the validity of score interpretations; psychometric efforts to evaluate the impact of speededness date back more than half a century. The widespread move to computerized test delivery has led to the development of new approaches to evaluating how…
Descriptors: Comparative Analysis, Observation, Medical Education, Licensing Examinations (Professions)
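A classical (pre-response-time) speededness indicator is the share of examinees who never reach the final items. The sketch below assumes omitted items are coded as NaN; the data and the five-item window are invented.

```python
import numpy as np

def pct_not_reaching(responses, last_k=5):
    """Share of examinees with no response to any of the last k items.

    responses: examinees x items array; np.nan marks an omitted item.
    """
    tail = responses[:, -last_k:]
    return np.mean(np.all(np.isnan(tail), axis=1))

rng = np.random.default_rng(3)
resp = rng.integers(0, 2, (1000, 40)).astype(float)  # placeholder 0/1 scores
resp[:60, -5:] = np.nan  # suppose 6% ran out of time on the last 5 items
print(f"{pct_not_reaching(resp):.1%} did not reach the last 5 items")
```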
Peer reviewed
Sinharay, Sandip; Haberman, Shelby J.; Lee, Yi-Hsuan – Journal of Educational Measurement, 2011
Providing information to test takers and test score users about the abilities of test takers at different score levels has been a persistent problem in educational and psychological measurement. Scale anchoring, a technique which describes what students at different points on a score scale know and can do, is a tool to provide such information.…
Descriptors: Scores, Test Items, Statistical Analysis, Licensing Examinations (Professions)
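One common scale-anchoring rule of thumb selects, for each anchor point, items that examinees at that level answer correctly with high probability while examinees at the next lower level do not. The sketch below assumes a 2PL item pool with invented parameters and illustrative 0.80/0.50 probability criteria; it is not the authors' procedure.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response function."""
    return 1 / (1 + np.exp(-a * (theta - b)))

# Hypothetical item pool (discrimination a, difficulty b).
rng = np.random.default_rng(4)
a = rng.uniform(0.5, 2.0, 50)
b = rng.normal(0, 1, 50)

# An item "anchors" a score level if examinees there answer it correctly
# with high probability while examinees one level below do not.
level, below = 1.0, 0.0          # anchor points on the theta scale
p_hi, p_lo = p_correct(level, a, b), p_correct(below, a, b)
anchors = np.where((p_hi >= 0.80) & (p_lo <= 0.50))[0]
print("anchor items for level 1.0:", anchors)
```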
Peer reviewed
Raymond, Mark R.; Swygert, Kimberly A.; Kahraman, Nilufer – Journal of Educational Measurement, 2012
Although a few studies report sizable score gains for examinees who repeat performance-based assessments, research has not yet addressed the reliability and validity of inferences based on ratings of repeat examinees on such tests. This study analyzed scores for 8,457 single-take examinees and 4,030 repeat examinees who completed a 6-hour clinical…
Descriptors: Physicians, Licensing Examinations (Professions), Performance Based Assessment, Repetition
Peer reviewed
Li, Feiming; Cohen, Allan; Shen, Linjun – Journal of Educational Measurement, 2012
Computer-based tests (CBTs) often use random ordering of items in order to minimize item exposure and reduce the potential for answer copying. Little research has been done, however, to examine item position effects for these tests. In this study, different versions of a Rasch model and different response time models were examined and applied to…
Descriptors: Computer Assisted Testing, Test Items, Item Response Theory, Models
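A minimal way to formalize an item position effect is to let the effective Rasch difficulty drift with serial position. The sketch below is a toy model with an invented drift parameter, not one of the specific models compared in the article.

```python
import numpy as np

def rasch_with_position(theta, b, position, gamma=0.02):
    """P(correct) under a Rasch model with a linear position effect.

    Effective difficulty grows by gamma per serial position, so the same
    item is slightly harder late in the test. gamma is illustrative.
    """
    return 1 / (1 + np.exp(-(theta - (b + gamma * position))))

# Same item, same examinee, administered early vs. late:
print(rasch_with_position(0.0, 0.0, position=1))   # near the start
print(rasch_with_position(0.0, 0.0, position=60))  # near the end
```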
Peer reviewed
Clauser, Brian E.; Mee, Janet; Baldwin, Su G.; Margolis, Melissa J.; Dillon, Gerard F. – Journal of Educational Measurement, 2009
Although the Angoff procedure is among the most widely used standard setting procedures for tests comprising multiple-choice items, research has shown that subject matter experts have considerable difficulty accurately making the required judgments in the absence of examinee performance data. Some authors have viewed the need to provide…
Descriptors: Standard Setting (Scoring), Program Effectiveness, Expertise, Health Personnel
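For reference, the basic Angoff computation behind such studies: each judge estimates, per item, the probability that a minimally competent examinee answers correctly; expected scores are summed within judge and averaged across judges. The ratings below are invented.

```python
import numpy as np

# ratings[j, i]: judge j's estimate of the probability that a minimally
# competent examinee answers item i correctly (values are illustrative).
ratings = np.array([
    [0.60, 0.75, 0.45, 0.80],
    [0.55, 0.70, 0.50, 0.80],
    [0.65, 0.80, 0.40, 0.75],
])

# Angoff cut score: sum expected item scores, then average over judges.
cut_per_judge = ratings.sum(axis=1)
print("cut score:", cut_per_judge.mean())  # expected raw score at the standard
```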
Peer reviewed
van der Linden, Wim J.; Breithaupt, Krista; Chuah, Siang Chee; Zhang, Yanwei – Journal of Educational Measurement, 2007
A potential undesirable effect of multistage testing is differential speededness, which happens if some of the test takers run out of time because they receive subtests with items that are more time intensive than others. This article shows how a probabilistic response-time model can be used for estimating differences in time intensities and speed…
Descriptors: Adaptive Testing, Evaluation Methods, Test Items, Reaction Time
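The response-time model referenced here is van der Linden's lognormal model, in which log time decomposes into an item time intensity and an examinee speed. The simulation below uses invented parameters and a crude moment estimate of the intensities, not the authors' estimator, to show how subtests can differ in expected time.

```python
import numpy as np

# Lognormal response-time model: for examinee j and item i,
# ln t_ij ~ N(beta_i - tau_j, sigma^2), with time intensity beta_i
# and speed tau_j. All parameter values below are invented.
rng = np.random.default_rng(5)
n_persons, n_items = 500, 20
beta = rng.normal(4.0, 0.3, n_items)    # true time intensities (log seconds)
tau = rng.normal(0.0, 0.3, n_persons)   # true speeds
log_t = beta[None, :] - tau[:, None] + rng.normal(0, 0.3, (n_persons, n_items))

# Moment sketch: averaging log times over examinees recovers each item's
# time intensity up to a common constant.
beta_hat = log_t.mean(axis=0)
print("recovery r:", np.corrcoef(beta_hat, beta)[0, 1].round(3))

# Differential speededness check: rough total seconds per subtest
# (sum of per-item median times).
for name, idx in [("A", np.arange(0, 10)), ("B", np.arange(10, 20))]:
    print(name, np.exp(beta_hat[idx]).sum().round(1), "seconds")
```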
Peer reviewed
Norcini, John J.; And Others – Journal of Educational Measurement, 1988
Two studies of medical certification examinations were undertaken to assess standard setting using Angoff's Method. Results indicate that (1) specialization within broad content areas does not affect an expert's estimates of the performance of the borderline group; and (2) performance data should be provided during the standard-setting process.…
Descriptors: Certification, Cutting Scores, Licensing Examinations (Professions), Medicine
Peer reviewed
Luecht, Richard M.; Nungester, Ronald J. – Journal of Educational Measurement, 1998
Describes an integrated approach to test development and administration called computer-adaptive sequential testing (CAST). CAST incorporates adaptive testing methods with automated test assembly. Describes the CAST framework and demonstrates several applications using a medical-licensure example. (SLD)
Descriptors: Adaptive Testing, Automation, Computer Assisted Testing, Licensing Examinations (Professions)
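A toy illustration of CAST-style multistage routing: preassembled modules with number-correct routing between stages. The design, thresholds, and module labels below are invented, not the article's framework.

```python
def route(num_correct, thresholds=(7, 13)):
    """Route from a 20-item stage score to the next-stage module."""
    lo, hi = thresholds
    if num_correct <= lo:
        return "easy"
    if num_correct >= hi:
        return "hard"
    return "medium"

for score in (5, 10, 18):
    print(score, "->", route(score))
```

In a full CAST design the modules themselves would be assembled automatically against content and information targets before administration.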
Peer reviewed
Meijer, Rob R. – Journal of Educational Measurement, 2002
Used empirical data from a certification test to study methods from statistical process control that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in computerized adaptive testing. Results for 1,392 examinees show that different types of misfit can be distinguished. (SLD)
Descriptors: Certification, Classification, Goodness of Fit, Item Response Theory
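CUSUM charts are a canonical statistical-process-control tool for this task: cumulative sums of item-score residuals flag sustained runs of unexpectedly correct or incorrect responses. The sketch below uses an invented reference value k and invented data; operational control limits would come from simulation.

```python
import numpy as np

def cusum_person_fit(scores, p_expected, k=0.1):
    """One-sided CUSUM charts on item-score residuals (u - P).

    Sustained runs of unexpectedly correct (C+) or incorrect (C-)
    answers push a chart past its control limit, flagging misfit.
    """
    c_plus = c_minus = 0.0
    path_plus, path_minus = [], []
    for u, p in zip(scores, p_expected):
        r = u - p
        c_plus = max(0.0, c_plus + r - k)
        c_minus = min(0.0, c_minus + r + k)
        path_plus.append(c_plus)
        path_minus.append(c_minus)
    return np.array(path_plus), np.array(path_minus)

# Hypothetical adaptive test: P(correct) hovers near .5, but the examinee
# suddenly answers a long run of items correctly (e.g., preknowledge).
p = np.full(30, 0.5)
u = np.array([1, 0, 1, 0, 1, 0, 0, 1, 0, 1] + [1] * 20)
cp, cm = cusum_person_fit(u, p)
print("max C+ =", cp.max().round(2))  # compare to a simulated control limit
```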
Peer reviewed
Shannon, Gregory A.; Cliver, Barbara A. – Journal of Educational Measurement, 1987
Spearman correlations were computed between item response theory-derived information functions (IIFs) and four conventional item discrimination indices: phi-coefficient; B-index; phi/phi max; and agreement statistic. Correlations between the phi-coefficient and the IIFs were very high. Data were taken from a real estate licensing test. (Author/GDC)
Descriptors: Adults, Comparative Analysis, Criterion Referenced Tests, Item Analysis
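A sketch of the kind of comparison the study describes, under assumptions: simulate 2PL responses, compute each item's phi coefficient against a pass/fail classification at a cut, evaluate item information at that cut, and correlate the two with Spearman's rho. The parameters and cut are invented, and only one of the four indices is shown.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, n_items = 2000, 30
theta = rng.normal(0, 1, n)
a = rng.uniform(0.5, 2.0, n_items)
b = rng.normal(0, 1, n_items)
p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))  # 2PL probabilities
x = (rng.random((n, n_items)) < p).astype(int)   # simulated responses

cut = 0.0
master = (theta >= cut).astype(int)              # pass/fail at the cut

# Phi coefficient: Pearson r between two dichotomies (item score x mastery).
phi = np.array([stats.pearsonr(x[:, i], master)[0] for i in range(n_items)])

# 2PL item information at the cut: I = a^2 * P * (1 - P).
p_cut = 1 / (1 + np.exp(-a * (cut - b)))
info = a**2 * p_cut * (1 - p_cut)

print(f"Spearman rho: {stats.spearmanr(phi, info)[0]:.3f}")
```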
Peer reviewed
Norcini, John J. – Journal of Educational Measurement, 1987
Answer keys for physician and teacher licensing examinations were studied. The impact of variability on total errors of measurement was examined for answer keys constructed using the aggregate method. Results indicated that, in some cases, scorers contributed to a sizable reduction in measurement error. (Author/GDC)
Descriptors: Adults, Answer Keys, Error of Measurement, Evaluators
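On one common reading of aggregate keying, several experts each supply an answer key and the operational key is the modal response per item; the actual method studied may combine or weight scorers differently. The keys below are invented.

```python
import numpy as np

# Each row is one expert's proposed key for a four-item test.
expert_keys = np.array([
    ["A", "C", "B", "D"],
    ["A", "C", "C", "D"],
    ["A", "B", "B", "D"],
])

def aggregate_key(keys):
    """Modal (majority) response per item across experts."""
    return [max(set(col), key=list(col).count) for col in keys.T]

print(aggregate_key(expert_keys))  # -> ['A', 'C', 'B', 'D']
```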
Peer reviewed
Kane, Michael T.; And Others – Journal of Educational Measurement, 1989
This paper develops a multiplicative model as a means of combining ratings of criticality and frequency of various activities involved in job analyses. The model incorporates adjustments to ensure that effective weights of criticality and frequency are appropriate. An example of the model's use is presented. (TJH)
Descriptors: Critical Incidents Method, Higher Education, Job Analysis, Licensing Examinations (Professions)
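The multiplicative model described here takes an activity's weight to be the product of its criticality and frequency ratings, normalized so the effective weights sum to one. The ratings below are invented.

```python
import numpy as np

# Mean job-analysis ratings per activity (illustrative values).
criticality = np.array([4.5, 2.0, 3.5])
frequency   = np.array([1.0, 4.0, 2.5])

# Multiplicative combination: weight proportional to criticality x frequency.
raw = criticality * frequency
weights = raw / raw.sum()
print(weights.round(3))  # effective test-blueprint weights
```

A rare-but-critical activity and a routine-but-minor one can thus end up with comparable effective weights, which is the balancing behavior the model is meant to control.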