Publication Date
In 2025 | 0 |
Since 2024 | 4 |
Since 2021 (last 5 years) | 4 |
Since 2016 (last 10 years) | 5 |
Since 2006 (last 20 years) | 11 |
Descriptor
Comparative Testing | 49 |
Test Bias | 49 |
Racial Differences | 12 |
Sex Differences | 12 |
Test Items | 12 |
Test Validity | 12 |
White Students | 12 |
Higher Education | 11 |
Mathematics Tests | 10 |
Black Students | 9 |
Test Format | 9 |
Author
Whitworth, Randolph H. | 2 |
Allen, Nancy | 1 |
Armstrong, Anne-Marie | 1 |
Barclay, Allan G. | 1 |
Bauer, Daniel | 1 |
Bennett, Randy Elliott | 1 |
Bolger, Niall | 1 |
Boughton, Keith | 1 |
Breland, Hunter M. | 1 |
Buhr, Dianne C. | 1 |
Carey, Jill | 1 |
Publication Type
Reports - Research | 49 |
Journal Articles | 26 |
Speeches/Meeting Papers | 5 |
Collected Works - General | 1 |
Numerical/Quantitative Data | 1 |
Tests/Questionnaires | 1 |
Education Level
Elementary Education | 4 |
Higher Education | 4 |
Postsecondary Education | 4 |
Elementary Secondary Education | 3 |
Grade 8 | 2 |
Early Childhood Education | 1 |
Grade 3 | 1 |
Grade 4 | 1 |
Primary Education | 1 |
Audience
Researchers | 3 |
Counselors | 1 |
Practitioners | 1 |
Location
South Africa | 2 |
Georgia (Atlanta) | 1 |
Illinois | 1 |
Ireland | 1 |
Israel | 1 |
Surinam | 1 |
Sweden | 1 |
Thailand | 1 |
United States | 1 |
Laws, Policies, & Programs
Elementary and Secondary… | 1 |
No Child Left Behind Act 2001 | 1 |
Peter F. Halpin – Society for Research on Educational Effectiveness, 2024
Background: Meta-analyses of educational interventions have consistently documented the importance of methodological factors related to the choice of outcome measures. In particular, when interventions are evaluated using measures developed by researchers involved with the intervention or its evaluation, the effect sizes tend to be larger than…
Descriptors: College Students, College Faculty, STEM Education, Item Response Theory
Catherine Mata; Katharine Meyer; Lindsay Page – Annenberg Institute for School Reform at Brown University, 2024
This article examines the risk of crossover contamination in individual-level randomization, a common concern in experimental research, in the context of a large-enrollment college course. While individual-level randomization is more efficient for assessing program effectiveness, it also increases the potential for control group students to cross…
Descriptors: Chemistry, Science Instruction, Undergraduate Students, Large Group Instruction
Xuelan Qiu; Jimmy de la Torre; You-Gan Wang; Jinran Wu – Educational Measurement: Issues and Practice, 2024
Multidimensional forced-choice (MFC) items have been found to be useful to reduce response biases in personality assessments. However, conventional scoring methods for the MFC items result in ipsative data, hindering the wider applications of the MFC format. In the last decade, a number of item response theory (IRT) models have been developed,…
Descriptors: Item Response Theory, Personality Traits, Personality Measures, Personality Assessment
Sebastian Kiguel; Sarah Cashdollar; Meg Bates – Illinois Workforce and Education Research Collaborative, Discovery Partners Institute, 2024
In this report, we perform an analysis of kindergarten readiness in Illinois and relate it to students' third grade academic achievement. We study two cohorts of Illinois kindergarteners and follow them into third grade using data provided by the Illinois State Board of Education (ISBE). We summarize our key findings below: (1) Disparities appear…
Descriptors: School Readiness, Early Childhood Education, Test Bias, Culture Fair Tests
Lahner, Felicitas-Maria; Lörwald, Andrea Carolin; Bauer, Daniel; Nouns, Zineb Miriam; Krebs, René; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören – Advances in Health Sciences Education, 2018
Multiple true-false (MTF) items are a widely used supplement to the commonly used single-best answer (Type A) multiple choice format. However, an optimal scoring algorithm for MTF items has not yet been established, as existing studies yielded conflicting results. Therefore, this study analyzes two questions: What is the optimal scoring algorithm…
Descriptors: Scoring Formulas, Scoring Rubrics, Objective Tests, Multiple Choice Tests
Liu, Jinghua; Zu, Jiyun; Curley, Edward; Carey, Jill – ETS Research Report Series, 2014
The purpose of this study is to investigate the impact of discrete anchor items versus passage-based anchor items on observed score equating using empirical data. This study compares an "SAT"® critical reading anchor that contains more discrete items proportionally, compared to the total tests to be equated, to another anchor that…
Descriptors: Equated Scores, Test Items, College Entrance Examinations, Comparative Analysis
Young, John W.; Holtzman, Steven; Steinberg, Jonathan – Educational Testing Service, 2011
In this research investigation of score comparability for language minority students (English language learners [ELLs] and former English language learners), we examined 3 indicators of score comparability (reliability, internal test structure, and differential item functioning) for 4th and 8th grade students who took the NCLB-mandated content…
Descriptors: Language Minorities, Second Language Learning, Grade 8, Minority Group Students
Puhan, Gautam; Boughton, Keith; Kim, Sooyeon – Journal of Technology, Learning, and Assessment, 2007
The study evaluated the comparability of two versions of a certification test: a paper-and-pencil test (PPT) and computer-based test (CBT). An effect size measure known as Cohen's d and differential item functioning (DIF) analyses were used as measures of comparability at the test and item levels, respectively. Results indicated that the effect…
Descriptors: Computer Assisted Testing, Effect Size, Test Bias, Mathematics Tests
Interpreter and Spanish Administration Effects on the WISC Performance of Mexican-American Children.

Swanson, Elinor N.; Deblassie, Richard R. – Journal of School Psychology, 1979
A study was conducted to ascertain whether use of an interpreter and/or a regular examiner in administering the WISC would affect test results of a group of Mexican-American children. Spanish administration of some scales of the performance test is likely to elicit optimum performance. (Author)
Descriptors: Comparative Testing, Elementary Education, Mexican Americans, Psychological Testing

Hambleton, Ronald K.; Rogers, H. Jane – Applied Measurement in Education, 1989
Item Response Theory and Mantel-Haenszel approaches for investigating differential item performance were compared to assess the level of agreement of the approaches in identifying potentially biased items. Subjects were 2,000 White and 2,000 Native American high school students. The Mantel-Haenszel method provides an acceptable approximation of…
Descriptors: American Indians, Comparative Testing, High School Students, High Schools

Hanley, Jerome H.; Barclay, Allan G. – Journal of Black Psychology, 1979
The Revised Wechsler Intelligence Scale for Children appears to widen significantly the gap between Black and White performance, increasing the likelihood of unjustified negative social and educational consequences. (Author/EB)
Descriptors: Black Students, Comparative Testing, Elementary Secondary Education, Intelligence Differences

Ilai, Doron; Willerman, Lee – Intelligence, 1989
Items showing sex differences on the revised Wechsler Adult Intelligence Scale (WAIS-R) were studied. In a sample of 206 young adults (110 males and 96 females), 15 items demonstrated significant sex differences, but there was no relationship of item-specific gender content to sex differences in item performance. (SLD)
Descriptors: Comparative Testing, Females, Intelligence Tests, Item Analysis

Drasgow, Fritz; And Others – Applied Psychological Measurement, 1991
Extensions of unidimensional appropriateness indices are developed for multiunidimensional tests (multidimensional tests composed of unidimensional subtests). Simulated and real data (scores of 2,978 students on the Armed Services Vocational Aptitude Battery) were used to evaluate the indices' effectiveness in determining individuals who are…
Descriptors: Comparative Testing, Computer Simulation, Equations (Mathematics), Graphs

Crino, Michael D.; And Others – Educational and Psychological Measurement, 1985
The random response technique was compared to a direct questionnaire, administered to college students, to investigate whether or not the responses predicted the social desirability of the item. Results suggest support for the hypothesis. A 33-item version of the Marlowe-Crowne Social Desirability Scale which was used is included. (GDC)
Descriptors: Comparative Testing, Confidentiality, Higher Education, Item Analysis

Whitworth, Randolph H.; Gibbons, Ruth T. – Educational and Psychological Measurement, 1986
A cross-racial comparison of the Wechsler Adult Intelligence Scale (WAIS) was made with the revised version, the WAIS-R. Three groups of Anglo, Black, and Mexican-American male college students were administered both versions of the WAIS on the same day. Significant differences were found among the racial groups. (Author/LMO)
Descriptors: Analysis of Variance, Anglo Americans, Blacks, Comparative Testing