Publication Date
In 2025 | 0
Since 2024 | 0
Since 2021 (last 5 years) | 3
Since 2016 (last 10 years) | 4
Since 2006 (last 20 years) | 12
Descriptor
Licensing Examinations… | 12
Test Items | 7
Scores | 6
Statistical Analysis | 6
Comparative Analysis | 4
Item Response Theory | 4
Accuracy | 3
Prediction | 3
Robustness (Statistics) | 3
Test Theory | 3
Achievement Tests | 2
Author
Sinharay, Sandip | 12
Haberman, Shelby J. | 4
Holland, Paul W. | 2
Lee, Yi-Hsuan | 2
Blew, Edwin O. | 1
Dorans, Neil J. | 1
Grant, Mary C. | 1
Haberman, Shelby | 1
Han, Ning | 1
Johnson, Matthew S. | 1
Knorr, Colleen M. | 1
Publication Type
Journal Articles | 7
Reports - Research | 7
Reports - Evaluative | 4
Reports - Descriptive | 1
Education Level
Elementary Education | 1
Assessments and Surveys
Pre Professional Skills Tests | 1
Sinharay, Sandip – Educational Measurement: Issues and Practice, 2022
Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores, and hence to incomplete data, on credentialing tests such as the United States Medical Licensing Examination. Feinberg compared four approaches for reporting pass-fail decisions to the examinees with incomplete data on credentialing…
Descriptors: Testing Problems, High Stakes Tests, Credentials, Test Items
Sinharay, Sandip – Grantee Submission, 2021
Drasgow, Levine, and Zickar (1996) suggested a statistic based on the Neyman-Pearson lemma (e.g., Lehmann & Romano, 2005, p. 60) for detecting preknowledge on a known set of items. The statistic is a special case of the optimal appropriateness indices of Levine and Drasgow (1988) and is the most powerful statistic for detecting item…
Descriptors: Robustness (Statistics), Hypothesis Testing, Statistics, Test Items
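The likelihood-ratio idea behind such detection statistics can be sketched briefly. Below, an examinee's responses on a suspected item set are compared under a no-preknowledge hypothesis and under an assumed ability boost standing in for preknowledge; the 2PL model, the boost parameter delta, and all function names are illustrative assumptions, not the Drasgow-Levine-Zickar statistic itself.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def log_lik(resp, theta, a, b):
    """Bernoulli log-likelihood of item responses given ability theta."""
    p = p_correct(theta, a, b)
    return np.sum(resp * np.log(p) + (1 - resp) * np.log(1 - p))

def preknowledge_lr(resp, a, b, compromised, theta_hat, delta=1.5):
    """Log-likelihood-ratio statistic for the suspected compromised items.

    H0: responses on the compromised items follow the 2PL at theta_hat.
    H1: responses on the compromised items follow the 2PL at theta_hat + delta
        (an assumed ability 'boost' standing in for preknowledge).
    Larger values point toward preknowledge.
    """
    r_c = resp[compromised]
    a_c, b_c = a[compromised], b[compromised]
    return log_lik(r_c, theta_hat + delta, a_c, b_c) - log_lik(r_c, theta_hat, a_c, b_c)

# Toy usage: 20 items, the last 5 flagged as possibly disclosed.
rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, 20)
b = rng.normal(0.0, 1.0, 20)
resp = rng.binomial(1, p_correct(0.2, a, b))
resp[15:] = 1                              # suspiciously perfect on the flagged items
compromised = np.arange(20) >= 15
print(preknowledge_lr(resp, a, b, compromised, theta_hat=0.2))
```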
Sinharay, Sandip – Educational and Psychological Measurement, 2022
Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores and hence to incomplete data on mastery tests such as the AP and U.S. Medical Licensing examinations. Investigators are often interested in estimating the probability of passing for examinees with incomplete data on mastery tests.…
Descriptors: Mastery Tests, Computer Assisted Testing, Probability, Test Wiseness
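A minimal sketch of how a passing probability might be computed when some item scores are missing, assuming a 2PL model, a normal ability prior, and a number-correct cut score. This illustrates the general idea only, not the estimators studied in the article; every parameter and function name here is hypothetical.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def score_dist(p):
    """Distribution of the number-correct score over independent items with
    success probabilities p (simple convolution, Lord-Wingersky style)."""
    dist = np.array([1.0])
    for pj in p:
        dist = np.convolve(dist, [1.0 - pj, pj])
    return dist

def prob_pass(resp, observed, a, b, cut, theta_grid=np.linspace(-4, 4, 81)):
    """Posterior probability that the full-test number-correct score reaches
    the cut, given scores on the observed items only; the missing items are
    handled probabilistically at each ability level."""
    prior = np.exp(-0.5 * theta_grid**2)      # standard normal prior (unnormalized)
    obs_correct = int(resp[observed].sum())
    post = np.zeros_like(theta_grid)
    p_pass = np.zeros_like(theta_grid)
    for i, th in enumerate(theta_grid):
        p_obs = p_correct(th, a[observed], b[observed])
        post[i] = prior[i] * np.prod(np.where(resp[observed] == 1, p_obs, 1.0 - p_obs))
        dist_missing = score_dist(p_correct(th, a[~observed], b[~observed]))
        need = max(cut - obs_correct, 0)
        p_pass[i] = dist_missing[need:].sum()
    post /= post.sum()
    return float(np.sum(post * p_pass))

# Toy usage: a 30-item test, the last 6 items lost to an administration problem,
# and an assumed passing cut of 20 correct answers.
rng = np.random.default_rng(1)
a, b = rng.uniform(0.8, 2.0, 30), rng.normal(0.0, 1.0, 30)
resp = rng.binomial(1, p_correct(1.0, a, b))
observed = np.ones(30, dtype=bool)
observed[24:] = False
print(prob_pass(resp, observed, a, b, cut=20))
```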
Sinharay, Sandip; Johnson, Matthew S. – Grantee Submission, 2019
According to Wollack and Schoenig (2018), benefitting from item preknowledge is one of the three broad types of test fraud that occur in educational assessments. We use tools from constrained statistical inference to suggest a new statistic that is based on item scores and response times and can be used to detect the examinees who may have…
Descriptors: Scores, Test Items, Reaction Time, Cheating
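As a heavily simplified stand-in for a score-and-time statistic, the sketch below flags an examinee whose responses on the suspected items are both unusually accurate and unusually fast relative to the rest of the test. The two z-type contrasts and the cutoff are assumptions for illustration and are not the constrained-inference statistic the authors propose.

```python
import numpy as np

def flag_preknowledge(resp, log_times, compromised, z_cut=2.0):
    """Crude two-part screen: compared with the rest of the test, are the
    suspected items answered (a) unusually accurately and (b) unusually
    quickly?  Returns the two z-like statistics and a combined flag."""
    comp, rest = compromised, ~compromised
    # Accuracy contrast (pooled z for a difference in proportions).
    p1, p0 = resp[comp].mean(), resp[rest].mean()
    p = resp.mean()
    n1, n0 = comp.sum(), rest.sum()
    z_acc = (p1 - p0) / np.sqrt(p * (1 - p) * (1 / n1 + 1 / n0) + 1e-12)
    # Speed contrast on the log-time scale (faster responses => positive value).
    t1, t0 = log_times[comp].mean(), log_times[rest].mean()
    s = log_times.std(ddof=1)
    z_fast = (t0 - t1) / (s * np.sqrt(1 / n1 + 1 / n0) + 1e-12)
    return z_acc, z_fast, bool(z_acc > z_cut and z_fast > z_cut)

# Toy usage: 40 items; the last 8 are suspected of having been disclosed.
rng = np.random.default_rng(2)
resp = rng.binomial(1, 0.6, 40)
log_times = rng.normal(4.0, 0.5, 40)
resp[32:] = 1                      # near-perfect and fast on the suspected items
log_times[32:] -= 1.0
compromised = np.arange(40) >= 32
print(flag_preknowledge(resp, log_times, compromised))
```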
Sinharay, Sandip; Haberman, Shelby J. – International Journal of Testing, 2014
Recently, there has been increasing interest in subtest scores, or subscores, for their potential diagnostic value. Haberman (2008) suggested a method to determine if a subscore has added value over the total score. Researchers have often been interested in the performance of subgroups, for example those based on gender or…
Descriptors: Scores, Achievement Tests, Language Tests, English (Second Language)
Sinharay, Sandip; Haberman, Shelby J.; Lee, Yi-Hsuan – Journal of Educational Measurement, 2011
Providing information to test takers and test score users about the abilities of test takers at different score levels has been a persistent problem in educational and psychological measurement. Scale anchoring, a technique which describes what students at different points on a score scale know and can do, is a tool to provide such information.…
Descriptors: Scores, Test Items, Statistical Analysis, Licensing Examinations (Professions)
Haberman, Shelby J.; Sinharay, Sandip; Lee, Yi-Hsuan – Educational Testing Service, 2011
Providing information to test takers and test score users about the abilities of test takers at different score levels has been a persistent problem in educational and psychological measurement (Carroll, 1993). Scale anchoring (Beaton & Allen, 1992), a technique that describes what students at different points on a score scale know and can do,…
Descriptors: Statistical Analysis, Scores, Regression (Statistics), Item Response Theory
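A common heuristic version of scale anchoring can be sketched in a few lines: an item anchors a score level if examinees at that level answer it correctly at a high rate and that rate jumps clearly from the next lower level. The thresholds below (0.65 and a 0.30 jump) follow NAEP-style conventions and are assumptions; the reports above examine more refined procedures than this simple rule.

```python
import numpy as np

def anchor_items(p_correct_by_level, levels, p_min=0.65, jump_min=0.30):
    """Pick anchor items for each score level.

    p_correct_by_level: array of shape (n_levels, n_items) giving the proportion
    correct on each item among examinees at each anchor level (lowest level first).
    An item anchors a level if examinees there succeed at a rate >= p_min and the
    rate exceeds the next lower level's by at least jump_min.
    """
    anchors = {}
    for k in range(1, len(levels)):
        here, below = p_correct_by_level[k], p_correct_by_level[k - 1]
        keep = (here >= p_min) & (here - below >= jump_min)
        anchors[levels[k]] = np.flatnonzero(keep)
    return anchors

# Toy usage: 3 anchor levels, 6 items.
p = np.array([[0.20, 0.35, 0.10, 0.55, 0.15, 0.30],
              [0.70, 0.50, 0.30, 0.80, 0.40, 0.45],
              [0.90, 0.85, 0.75, 0.90, 0.80, 0.55]])
print(anchor_items(p, levels=["basic", "proficient", "advanced"]))
```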
Haberman, Shelby J.; Sinharay, Sandip – Educational Testing Service, 2011
Subscores are reported for several operational assessments. Haberman (2008) suggested a method based on classical test theory to determine if the true subscore is predicted better by the corresponding observed subscore or by the total score. Researchers are often interested in learning how different subgroups perform on subtests. Stricker (1993) and…
Descriptors: True Scores, Test Theory, Prediction, Group Membership
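The classical-test-theory comparison described here can be sketched as follows: the true subscore is approximated either by the observed subscore (with PRMSE equal to the subscore reliability) or by the total score, and the subscore is said to add value when the former PRMSE is larger. The reliability estimate, the covariance identities, and the simulated data below are assumptions written from standard CTT algebra, not taken from the report itself.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

def subscore_added_value(sub_items, all_items):
    """Compare the PRMSE of approximating the true subscore by the observed
    subscore (its reliability) with the PRMSE of approximating it by the
    total score, using standard classical-test-theory identities."""
    s = sub_items.sum(axis=1)            # observed subscore
    x = all_items.sum(axis=1)            # observed total score
    rel_s = cronbach_alpha(sub_items)    # reliability of the subscore
    var_s, var_x = s.var(ddof=1), x.var(ddof=1)
    cov_sx = np.cov(s, x, ddof=1)[0, 1]
    # cov(X, true subscore) = cov(X, S) minus the subscore's error variance,
    # because the subscore's measurement error is part of the total-score error.
    cov_x_ts = cov_sx - (1.0 - rel_s) * var_s
    var_ts = rel_s * var_s               # variance of the true subscore
    prmse_total = cov_x_ts**2 / (var_x * var_ts)
    return {"prmse_from_subscore": rel_s,
            "prmse_from_total": prmse_total,
            "added_value": bool(rel_s > prmse_total)}

# Toy usage: a 10-item subtest inside a 40-item test (here all items measure a
# single ability, so the subscore would typically show no added value).
rng = np.random.default_rng(3)
theta = rng.normal(size=(500, 1))
b = rng.normal(size=40)
p = 1.0 / (1.0 + np.exp(-(theta - b)))
all_items = (rng.random((500, 40)) < p).astype(float)
print(subscore_added_value(all_items[:, :10], all_items))
```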
Puhan, Gautam; Sinharay, Sandip; Haberman, Shelby; Larkin, Kevin – Applied Measurement in Education, 2010
Will subscores provide additional information beyond what is provided by the total score? Is there a method that can estimate more trustworthy subscores than observed subscores? To answer the first question, this study evaluated whether the true subscore was more accurately predicted by the observed subscore or by the total score. To answer the second…
Descriptors: Licensing Examinations (Professions), Scores, Computation, Methods
Sinharay, Sandip; Holland, Paul W. – Educational Testing Service, 2008
The nonequivalent groups with anchor test (NEAT) design involves data that are missing by design. Three popular equating methods that can be used with a NEAT design are the poststratification equating method, the chain equipercentile equating method, and the item response theory observed-score equating method. These three methods each…
Descriptors: Equated Scores, Test Items, Item Response Theory, Data
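The chaining idea can be illustrated with a much simpler linear version than the equipercentile and IRT methods compared in the report: link form X to the anchor in the population that took X, link the anchor to form Y in the population that took Y, and compose the two links. The simulated data and function names below are assumptions for illustration only.

```python
import numpy as np

def linear_link(from_scores, to_scores):
    """Linear linking that matches means and standard deviations, mapping a
    score on the first scale onto the second."""
    mu_f, sd_f = from_scores.mean(), from_scores.std(ddof=1)
    mu_t, sd_t = to_scores.mean(), to_scores.std(ddof=1)
    return lambda score: mu_t + (sd_t / sd_f) * (score - mu_f)

def chained_linear_equate(x_p, a_p, a_q, y_q):
    """Chained linear equating under a NEAT design: form X is linked to the
    anchor A in population P (which took X), A is linked to form Y in
    population Q (which took Y), and the two links are composed."""
    x_to_a = linear_link(x_p, a_p)   # X -> A, estimated in P
    a_to_y = linear_link(a_q, y_q)   # A -> Y, estimated in Q
    return lambda x: a_to_y(x_to_a(x))

# Toy usage: population Q is slightly more able than P, so equated scores shift.
rng = np.random.default_rng(4)
theta_p, theta_q = rng.normal(0.0, 1.0, 2000), rng.normal(0.3, 1.0, 2000)
x_p = 30 + 5 * theta_p + rng.normal(0, 2.0, 2000)
a_p = 15 + 3 * theta_p + rng.normal(0, 1.5, 2000)
y_q = 32 + 5 * theta_q + rng.normal(0, 2.0, 2000)
a_q = 15 + 3 * theta_q + rng.normal(0, 1.5, 2000)
equate_x_to_y = chained_linear_equate(x_p, a_p, a_q, y_q)
print(equate_x_to_y(np.array([20.0, 30.0, 40.0])))
```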
Sinharay, Sandip; Dorans, Neil J.; Grant, Mary C.; Blew, Edwin O.; Knorr, Colleen M. – ETS Research Report Series, 2006
The application of the Mantel-Haenszel test statistic (and other popular DIF-detection methods) to determine DIF requires large samples, but test administrators often need to detect DIF with small samples. There is no universally agreed upon statistical approach for performing DIF analysis with small samples; hence there is substantial scope of…
Descriptors: Test Bias, Computation, Sample Size, Bayesian Statistics
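For reference, the Mantel-Haenszel procedure at the heart of this problem pools, across score strata, the odds of a correct response in the reference and focal groups, and the ETS delta-scale index is MH D-DIF = -2.35 ln(alpha_MH). The sketch below implements that standard large-sample statistic with made-up data; it is not the small-sample or Bayesian approach investigated in the report.

```python
import numpy as np

def mantel_haenszel_dif(correct, group, strata):
    """Mantel-Haenszel common odds ratio and ETS delta-scale D-DIF for one item.

    correct: 0/1 item scores; group: 'ref' or 'focal' labels;
    strata: matching variable (e.g., total test score) used to stratify examinees.
    Negative D-DIF values indicate the item disadvantages the focal group.
    """
    num, den = 0.0, 0.0
    for k in np.unique(strata):
        m = strata == k
        ref, foc = m & (group == "ref"), m & (group == "focal")
        A = correct[ref].sum()          # reference correct
        B = (1 - correct[ref]).sum()    # reference incorrect
        C = correct[foc].sum()          # focal correct
        D = (1 - correct[foc]).sum()    # focal incorrect
        N = m.sum()
        if N > 0:
            num += A * D / N
            den += B * C / N
    alpha_mh = num / den
    return alpha_mh, -2.35 * np.log(alpha_mh)

# Toy usage: 400 examinees stratified by a coarse total-score band.
rng = np.random.default_rng(5)
group = np.where(rng.random(400) < 0.5, "ref", "focal")
strata = rng.integers(0, 5, 400)                      # stand-in for score bands
p = 0.4 + 0.1 * strata - 0.15 * (group == "focal")    # item is harder for the focal group
correct = rng.binomial(1, np.clip(p, 0.05, 0.95))
print(mantel_haenszel_dif(correct, group, strata))
```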
Holland, Paul W.; von Davier, Alina A.; Sinharay, Sandip; Han, Ning – ETS Research Report Series, 2006
This paper focuses on the Non-Equivalent Groups with Anchor Test (NEAT) design for test equating and on two classes of observed-score equating (OSE) methods: chain equating (CE) and poststratification equating (PSE). These two classes of methods reflect two distinctly different ways of using the information provided by the anchor test for…
Descriptors: Equated Scores, Test Items, Statistical Analysis, Comparative Analysis