Publication Date
In 2025 | 0
Since 2024 | 0
Since 2021 (last 5 years) | 3
Since 2016 (last 10 years) | 4
Since 2006 (last 20 years) | 12
Descriptor
Licensing Examinations… | 12
Test Items | 7
Scores | 6
Statistical Analysis | 6
Comparative Analysis | 4
Item Response Theory | 4
Accuracy | 3
Prediction | 3
Robustness (Statistics) | 3
Test Theory | 3
Achievement Tests | 2
Author
Sinharay, Sandip | 12
Haberman, Shelby J. | 4
Holland, Paul W. | 2
Lee, Yi-Hsuan | 2
Blew, Edwin O. | 1
Dorans, Neil J. | 1
Grant, Mary C. | 1
Haberman, Shelby | 1
Han, Ning | 1
Johnson, Matthew S. | 1
Knorr, Colleen M. | 1
Publication Type
Journal Articles | 7
Reports - Research | 7
Reports - Evaluative | 4
Reports - Descriptive | 1
Education Level
Elementary Education | 1
Assessments and Surveys
Pre Professional Skills Tests | 1
Sinharay, Sandip – Educational Measurement: Issues and Practice, 2022
Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores, and hence to incomplete data, on credentialing tests such as the United States Medical Licensing Examination. Feinberg compared four approaches for reporting pass-fail decisions to the examinees with incomplete data on credentialing…
Descriptors: Testing Problems, High Stakes Tests, Credentials, Test Items
Sinharay, Sandip – Grantee Submission, 2021
Drasgow, Levine, and Zickar (1996) suggested a statistic based on the Neyman-Pearson lemma (e.g., Lehmann & Romano, 2005, p. 60) for detecting preknowledge on a known set of items. The statistic is a special case of the optimal appropriateness indices of Levine and Drasgow (1988) and is the most powerful statistic for detecting item…
Descriptors: Robustness (Statistics), Hypothesis Testing, Statistics, Test Items
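The likelihood-ratio idea behind such detection statistics can be sketched briefly. Below, an examinee's responses on a suspected item set are compared under a no-preknowledge hypothesis and under an assumed ability boost standing in for preknowledge; the 2PL model, the boost parameter delta, and all function names are illustrative assumptions, not the Drasgow-Levine-Zickar statistic itself.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def log_lik(resp, theta, a, b):
    """Bernoulli log-likelihood of item responses given ability theta."""
    p = p_correct(theta, a, b)
    return np.sum(resp * np.log(p) + (1 - resp) * np.log(1 - p))

def preknowledge_lr(resp, a, b, compromised, theta_hat, delta=1.5):
    """Log-likelihood-ratio statistic for the suspected compromised items.

    H0: responses on the compromised items follow the 2PL at theta_hat.
    H1: responses on the compromised items follow the 2PL at theta_hat + delta
        (an assumed ability 'boost' standing in for preknowledge).
    Larger values point toward preknowledge.
    """
    r_c = resp[compromised]
    a_c, b_c = a[compromised], b[compromised]
    return log_lik(r_c, theta_hat + delta, a_c, b_c) - log_lik(r_c, theta_hat, a_c, b_c)

# Toy usage: 20 items, the last 5 flagged as possibly disclosed.
rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, 20)
b = rng.normal(0.0, 1.0, 20)
resp = rng.binomial(1, p_correct(0.2, a, b))
resp[15:] = 1                              # suspiciously perfect on the flagged items
compromised = np.arange(20) >= 15
print(preknowledge_lr(resp, a, b, compromised, theta_hat=0.2))
```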
Sinharay, Sandip – Educational and Psychological Measurement, 2022
Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores and hence to incomplete data on mastery tests such as the AP and U.S. Medical Licensing examinations. Investigators are often interested in estimating the probability of passing for examinees with incomplete data on mastery tests.…
Descriptors: Mastery Tests, Computer Assisted Testing, Probability, Test Wiseness
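A minimal sketch of how a passing probability might be computed when some item scores are missing, assuming a 2PL model, a normal ability prior, and a number-correct cut score. This illustrates the general idea only, not the estimators studied in the article; every parameter and function name here is hypothetical.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def score_dist(p):
    """Distribution of the number-correct score over independent items with
    success probabilities p (simple convolution, Lord-Wingersky style)."""
    dist = np.array([1.0])
    for pj in p:
        dist = np.convolve(dist, [1.0 - pj, pj])
    return dist

def prob_pass(resp, observed, a, b, cut, theta_grid=np.linspace(-4, 4, 81)):
    """Posterior probability that the full-test number-correct score reaches
    the cut, given scores on the observed items only; the missing items are
    handled probabilistically at each ability level."""
    prior = np.exp(-0.5 * theta_grid**2)      # standard normal prior (unnormalized)
    obs_correct = int(resp[observed].sum())
    post = np.zeros_like(theta_grid)
    p_pass = np.zeros_like(theta_grid)
    for i, th in enumerate(theta_grid):
        p_obs = p_correct(th, a[observed], b[observed])
        post[i] = prior[i] * np.prod(np.where(resp[observed] == 1, p_obs, 1.0 - p_obs))
        dist_missing = score_dist(p_correct(th, a[~observed], b[~observed]))
        need = max(cut - obs_correct, 0)
        p_pass[i] = dist_missing[need:].sum()
    post /= post.sum()
    return float(np.sum(post * p_pass))

# Toy usage: a 30-item test, the last 6 items lost to an administration problem,
# and an assumed passing cut of 20 correct answers.
rng = np.random.default_rng(1)
a, b = rng.uniform(0.8, 2.0, 30), rng.normal(0.0, 1.0, 30)
resp = rng.binomial(1, p_correct(1.0, a, b))
observed = np.ones(30, dtype=bool)
observed[24:] = False
print(prob_pass(resp, observed, a, b, cut=20))
```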
Sinharay, Sandip; Johnson, Matthew S. – Grantee Submission, 2019
According to Wollack and Schoenig (2018), benefitting from item preknowledge is one of the three broad types of test fraud that occur in educational assessments. We use tools from constrained statistical inference to suggest a new statistic that is based on item scores and response times and can be used to detect the examinees who may have…
Descriptors: Scores, Test Items, Reaction Time, Cheating
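As a heavily simplified stand-in for a score-and-time statistic, the sketch below flags an examinee whose responses on the suspected items are both unusually accurate and unusually fast relative to the rest of the test. The two z-type contrasts and the cutoff are assumptions for illustration and are not the constrained-inference statistic the authors propose.

```python
import numpy as np

def flag_preknowledge(resp, log_times, compromised, z_cut=2.0):
    """Crude two-part screen: compared with the rest of the test, are the
    suspected items answered (a) unusually accurately and (b) unusually
    quickly?  Returns the two z-like statistics and a combined flag."""
    comp, rest = compromised, ~compromised
    # Accuracy contrast (pooled z for a difference in proportions).
    p1, p0 = resp[comp].mean(), resp[rest].mean()
    p = resp.mean()
    n1, n0 = comp.sum(), rest.sum()
    z_acc = (p1 - p0) / np.sqrt(p * (1 - p) * (1 / n1 + 1 / n0) + 1e-12)
    # Speed contrast on the log-time scale (faster responses => positive value).
    t1, t0 = log_times[comp].mean(), log_times[rest].mean()
    s = log_times.std(ddof=1)
    z_fast = (t0 - t1) / (s * np.sqrt(1 / n1 + 1 / n0) + 1e-12)
    return z_acc, z_fast, bool(z_acc > z_cut and z_fast > z_cut)

# Toy usage: 40 items; the last 8 are suspected of having been disclosed.
rng = np.random.default_rng(2)
resp = rng.binomial(1, 0.6, 40)
log_times = rng.normal(4.0, 0.5, 40)
resp[32:] = 1                      # near-perfect and fast on the suspected items
log_times[32:] -= 1.0
compromised = np.arange(40) >= 32
print(flag_preknowledge(resp, log_times, compromised))
```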
Sinharay, Sandip; Haberman, Shelby J. – International Journal of Testing, 2014
Recently, there has been increasing interest in subtest scores, or subscores, for their potential diagnostic value. Haberman (2008) suggested a method to determine if a subscore has added value over the total score. Researchers have often been interested in the performance of subgroups, for example those based on gender or…
Descriptors: Scores, Achievement Tests, Language Tests, English (Second Language)
Sinharay, Sandip; Haberman, Shelby J.; Lee, Yi-Hsuan – Journal of Educational Measurement, 2011
Providing information to test takers and test score users about the abilities of test takers at different score levels has been a persistent problem in educational and psychological measurement. Scale anchoring, a technique which describes what students at different points on a score scale know and can do, is a tool to provide such information.…
Descriptors: Scores, Test Items, Statistical Analysis, Licensing Examinations (Professions)
Haberman, Shelby J.; Sinharay, Sandip; Lee, Yi-Hsuan – Educational Testing Service, 2011
Providing information to test takers and test score users about the abilities of test takers at different score levels has been a persistent problem in educational and psychological measurement (Carroll, 1993). Scale anchoring (Beaton & Allen, 1992), a technique that describes what students at different points on a score scale know and can do,…
Descriptors: Statistical Analysis, Scores, Regression (Statistics), Item Response Theory
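A common heuristic version of scale anchoring can be sketched in a few lines: an item anchors a score level if examinees at that level answer it correctly at a high rate and that rate jumps clearly from the next lower level. The thresholds below (0.65 and a 0.30 jump) follow NAEP-style conventions and are assumptions; the reports above examine more refined procedures than this simple rule.

```python
import numpy as np

def anchor_items(p_correct_by_level, levels, p_min=0.65, jump_min=0.30):
    """Pick anchor items for each score level.

    p_correct_by_level: array of shape (n_levels, n_items) giving the proportion
    correct on each item among examinees at each anchor level (lowest level first).
    An item anchors a level if examinees there succeed at a rate >= p_min and the
    rate exceeds the next lower level's by at least jump_min.
    """
    anchors = {}
    for k in range(1, len(levels)):
        here, below = p_correct_by_level[k], p_correct_by_level[k - 1]
        keep = (here >= p_min) & (here - below >= jump_min)
        anchors[levels[k]] = np.flatnonzero(keep)
    return anchors

# Toy usage: 3 anchor levels, 6 items.
p = np.array([[0.20, 0.35, 0.10, 0.55, 0.15, 0.30],
              [0.70, 0.50, 0.30, 0.80, 0.40, 0.45],
              [0.90, 0.85, 0.75, 0.90, 0.80, 0.55]])
print(anchor_items(p, levels=["basic", "proficient", "advanced"]))
```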
Haberman, Shelby J.; Sinharay, Sandip – Educational Testing Service, 2011
Subscores are reported for several operational assessments. Haberman (2008) suggested a method based on classical test theory to determine if the true subscore is predicted better by the corresponding observed subscore or by the total score. Researchers are often interested in learning how different subgroups perform on subtests. Stricker (1993) and…
Descriptors: True Scores, Test Theory, Prediction, Group Membership
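The classical-test-theory comparison described here can be sketched as follows: the true subscore is approximated either by the observed subscore (with PRMSE equal to the subscore reliability) or by the total score, and the subscore is said to add value when the former PRMSE is larger. The reliability estimate, the covariance identities, and the simulated data below are assumptions written from standard CTT algebra, not taken from the report itself.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

def subscore_added_value(sub_items, all_items):
    """Compare the PRMSE of approximating the true subscore by the observed
    subscore (its reliability) with the PRMSE of approximating it by the
    total score, using standard classical-test-theory identities."""
    s = sub_items.sum(axis=1)            # observed subscore
    x = all_items.sum(axis=1)            # observed total score
    rel_s = cronbach_alpha(sub_items)    # reliability of the subscore
    var_s, var_x = s.var(ddof=1), x.var(ddof=1)
    cov_sx = np.cov(s, x, ddof=1)[0, 1]
    # cov(X, true subscore) = cov(X, S) minus the subscore's error variance,
    # because the subscore's measurement error is part of the total-score error.
    cov_x_ts = cov_sx - (1.0 - rel_s) * var_s
    var_ts = rel_s * var_s               # variance of the true subscore
    prmse_total = cov_x_ts**2 / (var_x * var_ts)
    return {"prmse_from_subscore": rel_s,
            "prmse_from_total": prmse_total,
            "added_value": bool(rel_s > prmse_total)}

# Toy usage: a 10-item subtest inside a 40-item test (here all items measure a
# single ability, so the subscore would typically show no added value).
rng = np.random.default_rng(3)
theta = rng.normal(size=(500, 1))
b = rng.normal(size=40)
p = 1.0 / (1.0 + np.exp(-(theta - b)))
all_items = (rng.random((500, 40)) < p).astype(float)
print(subscore_added_value(all_items[:, :10], all_items))
```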
Puhan, Gautam; Sinharay, Sandip; Haberman, Shelby; Larkin, Kevin – Applied Measurement in Education, 2010
Will subscores provide additional information beyond what is provided by the total score? Is there a method that can estimate more trustworthy subscores than observed subscores? To answer the first question, this study evaluated whether the true subscore was more accurately predicted by the observed subscore or by the total score. To answer the second…
Descriptors: Licensing Examinations (Professions), Scores, Computation, Methods
Sinharay, Sandip; Holland, Paul W. – Educational Testing Service, 2008
The nonequivalent groups with anchor test (NEAT) design involves data that are missing by design. Three popular equating methods that can be used with a NEAT design are the poststratification equating method, the chain equipercentile equating method, and the item response theory observed-score equating method. These three methods each…
Descriptors: Equated Scores, Test Items, Item Response Theory, Data
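The chaining idea can be illustrated with a much simpler linear version than the equipercentile and IRT methods compared in the report: link form X to the anchor in the population that took X, link the anchor to form Y in the population that took Y, and compose the two links. The simulated data and function names below are assumptions for illustration only.

```python
import numpy as np

def linear_link(from_scores, to_scores):
    """Linear linking that matches means and standard deviations, mapping a
    score on the first scale onto the second."""
    mu_f, sd_f = from_scores.mean(), from_scores.std(ddof=1)
    mu_t, sd_t = to_scores.mean(), to_scores.std(ddof=1)
    return lambda score: mu_t + (sd_t / sd_f) * (score - mu_f)

def chained_linear_equate(x_p, a_p, a_q, y_q):
    """Chained linear equating under a NEAT design: form X is linked to the
    anchor A in population P (which took X), A is linked to form Y in
    population Q (which took Y), and the two links are composed."""
    x_to_a = linear_link(x_p, a_p)   # X -> A, estimated in P
    a_to_y = linear_link(a_q, y_q)   # A -> Y, estimated in Q
    return lambda x: a_to_y(x_to_a(x))

# Toy usage: population Q is slightly more able than P, so equated scores shift.
rng = np.random.default_rng(4)
theta_p, theta_q = rng.normal(0.0, 1.0, 2000), rng.normal(0.3, 1.0, 2000)
x_p = 30 + 5 * theta_p + rng.normal(0, 2.0, 2000)
a_p = 15 + 3 * theta_p + rng.normal(0, 1.5, 2000)
y_q = 32 + 5 * theta_q + rng.normal(0, 2.0, 2000)
a_q = 15 + 3 * theta_q + rng.normal(0, 1.5, 2000)
equate_x_to_y = chained_linear_equate(x_p, a_p, a_q, y_q)
print(equate_x_to_y(np.array([20.0, 30.0, 40.0])))
```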
Sinharay, Sandip; Dorans, Neil J.; Grant, Mary C.; Blew, Edwin O.; Knorr, Colleen M. – ETS Research Report Series, 2006
The application of the Mantel-Haenszel test statistic (and other popular DIF-detection methods) to determine DIF requires large samples, but test administrators often need to detect DIF with small samples. There is no universally agreed upon statistical approach for performing DIF analysis with small samples; hence there is substantial scope of…
Descriptors: Test Bias, Computation, Sample Size, Bayesian Statistics
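For reference, the Mantel-Haenszel procedure at the heart of this problem pools, across score strata, the odds of a correct response in the reference and focal groups, and the ETS delta-scale index is MH D-DIF = -2.35 ln(alpha_MH). The sketch below implements that standard large-sample statistic with made-up data; it is not the small-sample or Bayesian approach investigated in the report.

```python
import numpy as np

def mantel_haenszel_dif(correct, group, strata):
    """Mantel-Haenszel common odds ratio and ETS delta-scale D-DIF for one item.

    correct: 0/1 item scores; group: 'ref' or 'focal' labels;
    strata: matching variable (e.g., total test score) used to stratify examinees.
    Negative D-DIF values indicate the item disadvantages the focal group.
    """
    num, den = 0.0, 0.0
    for k in np.unique(strata):
        m = strata == k
        ref, foc = m & (group == "ref"), m & (group == "focal")
        A = correct[ref].sum()          # reference correct
        B = (1 - correct[ref]).sum()    # reference incorrect
        C = correct[foc].sum()          # focal correct
        D = (1 - correct[foc]).sum()    # focal incorrect
        N = m.sum()
        if N > 0:
            num += A * D / N
            den += B * C / N
    alpha_mh = num / den
    return alpha_mh, -2.35 * np.log(alpha_mh)

# Toy usage: 400 examinees stratified by a coarse total-score band.
rng = np.random.default_rng(5)
group = np.where(rng.random(400) < 0.5, "ref", "focal")
strata = rng.integers(0, 5, 400)                      # stand-in for score bands
p = 0.4 + 0.1 * strata - 0.15 * (group == "focal")    # item is harder for the focal group
correct = rng.binomial(1, np.clip(p, 0.05, 0.95))
print(mantel_haenszel_dif(correct, group, strata))
```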
Holland, Paul W.; von Davier, Alina A.; Sinharay, Sandip; Han, Ning – ETS Research Report Series, 2006
This paper focuses on the Non-Equivalent Groups with Anchor Test (NEAT) design for test equating and on two classes of observed-score equating (OSE) methods: chain equating (CE) and poststratification equating (PSE). These two classes of methods reflect two distinctly different ways of using the information provided by the anchor test for…
Descriptors: Equated Scores, Test Items, Statistical Analysis, Comparative Analysis