ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	2
Since 2006 (last 20 years)	7

Descriptor

Simulation	12
Test Bias	12
Test Validity	12
Item Analysis	7
Test Items	6
Evaluation Methods	3
Models	3
Scores	3
Statistical Analysis	3
Test Reliability	3
Bayesian Statistics	2
Difficulty Level	2
Error of Measurement	2
Evaluation Research	2
Military Service	2
Minority Groups	2
Prediction	2
Sample Size	2
Test Construction	2
Accuracy	1
Achievement Gains	1
Adaptive Testing	1
Alternative Assessment	1
Aptitude Tests	1
Bias	1

Source

Journal of Educational and…	2
Applied Measurement in…	1
Assessment	1
Center for Education Data &…	1
ETS Research Report Series	1
National Center for Research…	1
ProQuest LLC	1

Publication Type

Reports - Research	8
Journal Articles	5
Reports - Descriptive	2
Dissertations/Theses -…	1
Reports - Evaluative	1

Education Level

Elementary Education	1
Elementary Secondary Education	1
Grade 4	1
Higher Education	1
Intermediate Grades	1

Showing all 12 results

Using Item Scores and Distractors to Detect Item Compromise and Preknowledge

Peer reviewed

Gorney, Kylie; Wollack, James A.; Sinharay, Sandip; Eckerly, Carol – Journal of Educational and Behavioral Statistics, 2023

Any time examinees have had access to items and/or answers prior to taking a test, the fairness of the test and validity of test score interpretations are threatened. Therefore, there is a high demand for procedures to detect both compromised items (CI) and examinees with preknowledge (EWP). In this article, we develop a procedure that uses item…

Descriptors: Scores, Test Validity, Test Items, Prior Learning

Impact of Item Parameter Drift on Rasch Scale Stability in Small Samples over Multiple Administrations

Peer reviewed

Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020

Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…

Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling

Screening Test Items for Differential Item Functioning

Peer reviewed

Longford, Nicholas T. – Journal of Educational and Behavioral Statistics, 2014

A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…

Descriptors: Test Items, Test Bias, Simulation, Hypothesis Testing

Differential Item Functioning Assessment in Cognitive Diagnostic Modeling: Applying the Wald Test to Investigate DIF in the Generalized DINA Model Framework

Hou, Likun – ProQuest LLC, 2013

Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing richer diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this dissertation, the model-based DIF detection method, Wald-CDM procedure is…

Descriptors: Test Bias, Models, Cognitive Processes, Diagnostic Tests

A Review of ETS Differential Item Functioning Assessment Procedures: Flagging Rules, Minimum Sample Size Requirements, and Criterion Refinement. Research Report. ETS RR-12-08

Peer reviewed

Zwick, Rebecca – ETS Research Report Series, 2012

Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. The goal of this project was to review the status of ETS DIF analysis procedures, focusing on three aspects: (a) the nature and stringency of the statistical rules used to flag items, (b) the minimum sample size…

Descriptors: Test Bias, Sample Size, Bayesian Statistics, Evaluation Methods

Assessing the "Rothstein Falsification Test": Does It Really Show Teacher Value-Added Models Are Biased? CEDR Working Paper No. 2012 1.3

Goldhaber, Dan; Chaplin, Duncan – Center for Education Data & Research, 2012

In a provocative and influential paper, Jesse Rothstein (2010) finds that standard value added models (VAMs) suggest implausible future teacher effects on past student achievement, a finding that obviously cannot be viewed as causal. This is the basis of a falsification test (the Rothstein falsification test) that appears to indicate bias in VAM…

Descriptors: School Effectiveness, Teacher Effectiveness, Achievement Gains, Statistical Bias

What Probably Works in Alternative Assessment. CRESST Report 772

Baker, Eva L. – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2010

This report provides an overview of what was known about alternative assessment at the time that the article was written in 1991. Topics include beliefs about assessment reform, overview of alternative assessment including research knowledge, evidence of assessment impact, and critical features of alternative assessment. The author notes that in…

Descriptors: Alternative Assessment, Evaluation Methods, Evaluation Research, Performance Based Assessment

The Use and Evaluation of Interest Inventories and Simulations.

Holland, John L. – 1974

This paper provides a general perspective for evaluating interest inventories and simulations and outlines some activities to stimulate the development of more useful inventories. Previous evaluations have been primarily instrument-specific; have relied generally on opinion rather than evidence; and have focused only on possible sex, age, race, or…

Descriptors: Career Guidance, Evaluation, Improvement, Interest Inventories

Reliable Digit Span is Unaffected by Laboratory-Induced Pain

Peer reviewed

Etherton, Joseph L.; Bianchini, Kevin J.; Ciota, Megan A.; Greve, Kevin W. – Assessment, 2005

Reliable Digit Span (RDS) is an indicator used to assess the validity of cognitive test performance. Scores of 7 or lower suggest poor effort or negative response bias. The possibility that RDS scores are also affected by pain has not been addressed, thus potentially threatening RDS specificity. The current study used cold pressor-induced pain to…

Descriptors: Response Style (Tests), Simulation, Intelligence Tests, Pain

A Comparison of the Fairness of Adaptive and Conventional Testing Strategies. Research Report 78-1.

Pine, Steven M.; Weiss, David J. – 1978

This report examines how selection fairness is influenced by the characteristics of a selection instrument in terms of its distribution of item difficulties, level of item discrimination, degree of item bias, and testing strategy. Computer simulation was used in the administration of either a conventional or Bayesian adaptive ability test to a…

Descriptors: Adaptive Testing, Bayesian Statistics, Comparative Testing, Computer Assisted Testing

An Empirical Investigation of Six Methods for Examining Test Item Bias. Final Report.

Merz, William R.; Grossen, Neal E. – 1978

Six approaches to assessing test item bias were examined: transformed item difficulty, point biserial correlations, chi-square, factor analysis, one parameter item characteristic curve, and three parameter item characteristic curve. Data sets for analysis were generated by a Monte Carlo technique based on the three parameter model; thus, four…

Descriptors: Difficulty Level, Evaluation Methods, Factor Analysis, Item Analysis

Effects of Item Characteristics on Test Fairness. Research Report 76-5.

Pine, Steven M.; Weiss, David J. – 1976

This report examines how selection fairness is influenced by the item characteristics of a selection instrument in terms of its distribution of item difficulties, level of item discrimination, and degree of item bias. Computer simulation was used in the administration of conventional ability tests to a hypothetical target population consisting of…

Descriptors: Aptitude Tests, Bias, Computer Programs, Culture Fair Tests

Author

Pine, Steven M.	2
Weiss, David J.	2
Baker, Eva L.	1
Bianchini, Kevin J.	1
Chaplin, Duncan	1
Ciota, Megan A.	1
Eckerly, Carol	1
Etherton, Joseph L.	1
Goldhaber, Dan	1
Gorney, Kylie	1
Greve, Kevin W.	1
Grossen, Neal E.	1
Holland, John L.	1
Hou, Likun	1
Jones, Andrew T.	1
Kopp, Jason P.	1
Longford, Nicholas T.	1
Merz, William R.	1
Sinharay, Sandip	1
Wollack, James A.	1
Zwick, Rebecca	1