Andrew P. Jaciw – American Journal of Evaluation, 2025
By design, randomized experiments (XPs) rule out bias from confounded selection of participants into conditions. Quasi-experiments (QEs) are often considered second-best because they do not share this benefit. However, when results from XPs are used to generalize causal impacts, the benefit from unconfounded selection into conditions may be offset…
Descriptors: Elementary School Students, Elementary School Teachers, Generalization, Test Bias
Xuelan Qiu; Jimmy de la Torre; You-Gan Wang; Jinran Wu – Educational Measurement: Issues and Practice, 2024
Multidimensional forced-choice (MFC) items have been found to be useful to reduce response biases in personality assessments. However, conventional scoring methods for the MFC items result in ipsative data, hindering the wider applications of the MFC format. In the last decade, a number of item response theory (IRT) models have been developed,…
Descriptors: Item Response Theory, Personality Traits, Personality Measures, Personality Assessment
Lahner, Felicitas-Maria; Lörwald, Andrea Carolin; Bauer, Daniel; Nouns, Zineb Miriam; Krebs, René; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören – Advances in Health Sciences Education, 2018
Multiple true-false (MTF) items are a widely used supplement to the commonly used single-best answer (Type A) multiple choice format. However, an optimal scoring algorithm for MTF items has not yet been established, as existing studies yielded conflicting results. Therefore, this study analyzes two questions: What is the optimal scoring algorithm…
Descriptors: Scoring Formulas, Scoring Rubrics, Objective Tests, Multiple Choice Tests
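The Lahner et al. abstract compares scoring algorithms for multiple true-false items without spelling them out. A minimal sketch of three commonly discussed MTF scoring rules (illustrative only; the function name and the exact rules are assumptions, not necessarily the algorithms tested in the study):

```python
def score_mtf(key, response, method="partial"):
    """Score one multiple true-false item.
    key, response: sequences of booleans, one per statement.
    'dichotomous' : 1 point only if every statement is marked correctly.
    'partial'     : fraction of statements marked correctly.
    'half_credit' : 1 if all correct, 0.5 if exactly one error, else 0."""
    correct = sum(k == r for k, r in zip(key, response))
    n = len(key)
    if method == "dichotomous":
        return 1.0 if correct == n else 0.0
    if method == "partial":
        return correct / n
    if method == "half_credit":
        return {n: 1.0, n - 1: 0.5}.get(correct, 0.0)
    raise ValueError(f"unknown method: {method}")

key = [True, False, True, True]
resp = [True, False, False, True]   # one statement marked wrongly
for m in ("dichotomous", "partial", "half_credit"):
    print(m, score_mtf(key, resp, m))
```

The conflicting results the abstract mentions stem from exactly this choice: the same response pattern earns 0, 0.75, or 0.5 points depending on the rule.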
Liu, Jinghua; Zu, Jiyun; Curley, Edward; Carey, Jill – ETS Research Report Series, 2014
The purpose of this study is to investigate the impact of discrete anchor items versus passage-based anchor items on observed score equating using empirical data. This study compares an "SAT"® critical reading anchor that contains proportionally more discrete items, relative to the total tests to be equated, to another anchor that…
Descriptors: Equated Scores, Test Items, College Entrance Examinations, Comparative Analysis
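The Liu et al. abstract concerns observed-score equating through an anchor. As a hedged illustration of the general idea (chained linear equating with a mean-sigma link; the data are made up and the function names are mine, not the report's method):

```python
import math

def mean_sd(xs):
    """Population mean and standard deviation."""
    m = sum(xs) / len(xs)
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def linear_link(from_scores, to_scores):
    """Mean-sigma linear function mapping the first scale onto the second."""
    mf, sf = mean_sd(from_scores)
    mt, st = mean_sd(to_scores)
    return lambda x: mt + (st / sf) * (x - mf)

def chained_linear_equate(x, x_grp1, v_grp1, v_grp2, y_grp2):
    """Chained linear equating in a NEAT design:
    link form X to anchor V in group 1, then V to form Y in group 2."""
    x_to_v = linear_link(x_grp1, v_grp1)
    v_to_y = linear_link(v_grp2, y_grp2)
    return v_to_y(x_to_v(x))

# Illustrative (made-up) score vectors for two examinee groups
x1, v1 = [10, 20, 30, 40], [5, 10, 15, 20]    # group 1: form X + anchor
v2, y2 = [6, 11, 16, 21], [30, 40, 50, 60]    # group 2: anchor + form Y
print(chained_linear_equate(35, x1, v1, v2, y2))
```

Whether the anchor is discrete or passage-based changes the anchor scores fed into such a chain, which is the comparison the study investigates.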
Wiliam, Dylan – Assessment in Education: Principles, Policy & Practice, 2008
While international comparisons such as those provided by PISA may be meaningful in terms of overall judgements about the performance of educational systems, caution is needed for more fine-grained judgements. In particular, it is argued that using the results of PISA to draw conclusions about the quality of instruction in different systems is…
Descriptors: Test Bias, Test Construction, Comparative Testing, Evaluation
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of…
Descriptors: Measures (Individuals), Scoring, Equated Scores, Test Bias
Kato, Kentaro; Moen, Ross E.; Thurlow, Martha L. – Educational Measurement: Issues and Practice, 2009
Large data sets from a state reading assessment for third and fifth graders were analyzed to examine differential item functioning (DIF), differential distractor functioning (DDF), and differential omission frequency (DOF) between students with particular categories of disabilities (speech/language impairments, learning disabilities, and emotional…
Descriptors: Learning Disabilities, Language Impairments, Behavior Disorders, Affective Behavior
Coe, Robert – Oxford Review of Education, 2008
The comparability of examinations in different subjects has been a controversial topic for many years and a number of criticisms have been made of statistical approaches to estimating the "difficulties" of achieving particular grades in different subjects. This paper argues that if comparability is understood in terms of a linking…
Descriptors: Test Items, Grades (Scholastic), Foreign Countries, Test Bias
Puhan, Gautam; Boughton, Keith; Kim, Sooyeon – Journal of Technology, Learning, and Assessment, 2007
The study evaluated the comparability of two versions of a certification test: a paper-and-pencil test (PPT) and computer-based test (CBT). An effect size measure known as Cohen's d and differential item functioning (DIF) analyses were used as measures of comparability at the test and item levels, respectively. Results indicated that the effect…
Descriptors: Computer Assisted Testing, Effect Size, Test Bias, Mathematics Tests
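The Puhan, Boughton, and Kim abstract uses Cohen's d as the test-level comparability measure. A minimal sketch of the pooled-standard-deviation formulation (the score vectors below are made up for illustration, not the certification-test data):

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d: (mean_a - mean_b) / pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    ma = sum(group_a) / na
    mb = sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    s_pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / s_pooled

# Hypothetical total scores under the two administration modes
ppt_scores = [52, 55, 49, 61, 58, 50, 57, 53]   # paper-and-pencil
cbt_scores = [51, 54, 50, 60, 56, 49, 55, 52]   # computer-based
print(round(cohens_d(ppt_scores, cbt_scores), 3))
```

By conventional rules of thumb, |d| below about 0.2 is a small effect, which is the kind of test-level evidence of comparability such a study looks for.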
Interpreter and Spanish Administration Effects on the WISC Performance of Mexican-American Children.

Swanson, Elinor N.; Deblassie, Richard R. – Journal of School Psychology, 1979
A study was conducted to ascertain whether use of an interpreter and/or a regular examiner in administering the WISC would affect test results of a group of Mexican-American children. Spanish administration of some scales of the performance test is likely to elicit optimum performance. (Author)
Descriptors: Comparative Testing, Elementary Education, Mexican Americans, Psychological Testing

Hambleton, Ronald K.; Rogers, H. Jane – Applied Measurement in Education, 1989
Item Response Theory and Mantel-Haenszel approaches for investigating differential item performance were compared to assess the level of agreement of the approaches in identifying potentially biased items. Subjects were 2,000 White and 2,000 Native American high school students. The Mantel-Haenszel method provides an acceptable approximation of…
Descriptors: American Indians, Comparative Testing, High School Students, High Schools
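The Hambleton and Rogers abstract names the Mantel-Haenszel approach to flagging differential item functioning. A sketch of its core computation, the common odds ratio across ability strata and the ETS delta-scale statistic built on it (the counts are invented for illustration, not the study's data):

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one item.
    Each stratum is a tuple (A, B, C, D):
      A = reference group correct, B = reference group incorrect,
      C = focal group correct,     D = focal group incorrect."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def mh_d_dif(strata):
    """ETS delta-scale statistic: -2.35 * ln(alpha_MH).
    Values near 0 indicate negligible DIF; |MH D-DIF| >= 1.5
    is the conventional cutoff for large ('C'-level) DIF."""
    return -2.35 * math.log(mh_odds_ratio(strata))

# Hypothetical right/wrong counts for one item in three score strata
strata = [
    (40, 10, 30, 20),   # low-score stratum
    (60, 15, 50, 25),   # middle stratum
    (80, 5, 70, 15),    # high-score stratum
]
print(round(mh_d_dif(strata), 3))
```

Matching on total score via strata like these is what lets the MH method approximate the IRT-based comparison the abstract describes.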

Rafferty, Eileen A.; Treff, August V. – ERS Spectrum, 1994
Addresses issues faced by institutions attempting to design school profiles to meet accountability standards. Reports of high-stakes test results can be skewed by choice of statistic type (percent of students passing versus mean scores), sample bias, geographical transients, and omission errors. Administrators must look beyond "common…
Descriptors: Accountability, Achievement Tests, Comparative Testing, Elementary Secondary Education
Bolt, Sara E.; Ysseldyke, James E. – Applied Measurement in Education, 2006
Although testing accommodations are commonly provided to students with disabilities within large-scale testing programs, research findings on how well accommodations allow for comparable measurement of student knowledge and skill remain inconclusive. The purpose of this study was to examine the extent to which 1 commonly held belief about testing…
Descriptors: Oral Reading, Testing Accommodations, Disabilities, Special Needs Students

Hanley, Jerome H.; Barclay, Allan G. – Journal of Black Psychology, 1979
The Revised Wechsler Intelligence Scale for Children appears to significantly widen the gap between Black and White performance, increasing the likelihood of unjustified negative social and educational consequences. (Author/EB)
Descriptors: Black Students, Comparative Testing, Elementary Secondary Education, Intelligence Differences

Ilai, Doron; Willerman, Lee – Intelligence, 1989
Items showing sex differences on the revised Wechsler Adult Intelligence Scale (WAIS-R) were studied. In a sample of 206 young adults (110 males and 96 females), 15 items demonstrated significant sex differences, but there was no relationship of item-specific gender content to sex differences in item performance. (SLD)
Descriptors: Comparative Testing, Females, Intelligence Tests, Item Analysis
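The Ilai and Willerman abstract reports item-level significance tests for sex differences on the WAIS-R. One standard way to run such a per-item test is a Pearson chi-square on the 2x2 table of sex by right/wrong; the counts below are invented for illustration and are not the study's data:

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table (no continuity correction).
    Rows: group (e.g. male/female); columns: item right/wrong."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# Hypothetical counts for one item: males 70 right / 40 wrong,
# females 50 right / 46 wrong
stat = chi_square_2x2(70, 40, 50, 46)
print(round(stat, 3), stat > 3.841)   # 3.841 = .05 critical value, df = 1
```

Repeating this over every item and counting those exceeding the critical value gives the kind of tally (15 significant items) the abstract summarizes.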