Andrew P. Jaciw – American Journal of Evaluation, 2025
By design, randomized experiments (XPs) rule out bias from confounded selection of participants into conditions. Quasi-experiments (QEs) are often considered second-best because they do not share this benefit. However, when results from XPs are used to generalize causal impacts, the benefit from unconfounded selection into conditions may be offset…
Descriptors: Elementary School Students, Elementary School Teachers, Generalization, Test Bias
Xuelan Qiu; Jimmy de la Torre; You-Gan Wang; Jinran Wu – Educational Measurement: Issues and Practice, 2024
Multidimensional forced-choice (MFC) items have been found to be useful to reduce response biases in personality assessments. However, conventional scoring methods for the MFC items result in ipsative data, hindering the wider applications of the MFC format. In the last decade, a number of item response theory (IRT) models have been developed,…
Descriptors: Item Response Theory, Personality Traits, Personality Measures, Personality Assessment
Lahner, Felicitas-Maria; Lörwald, Andrea Carolin; Bauer, Daniel; Nouns, Zineb Miriam; Krebs, René; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören – Advances in Health Sciences Education, 2018
Multiple true-false (MTF) items are a widely used supplement to the commonly used single-best answer (Type A) multiple choice format. However, an optimal scoring algorithm for MTF items has not yet been established, as existing studies yielded conflicting results. Therefore, this study analyzes two questions: What is the optimal scoring algorithm…
Descriptors: Scoring Formulas, Scoring Rubrics, Objective Tests, Multiple Choice Tests
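The Lahner et al. abstract compares scoring algorithms for multiple true-false items without spelling them out. A minimal sketch of three commonly discussed MTF scoring rules (illustrative only; the function name and the exact rules are assumptions, not necessarily the algorithms tested in the study):

```python
def score_mtf(key, response, method="partial"):
    """Score one multiple true-false item.
    key, response: sequences of booleans, one per statement.
    'dichotomous' : 1 point only if every statement is marked correctly.
    'partial'     : fraction of statements marked correctly.
    'half_credit' : 1 if all correct, 0.5 if exactly one error, else 0."""
    correct = sum(k == r for k, r in zip(key, response))
    n = len(key)
    if method == "dichotomous":
        return 1.0 if correct == n else 0.0
    if method == "partial":
        return correct / n
    if method == "half_credit":
        return {n: 1.0, n - 1: 0.5}.get(correct, 0.0)
    raise ValueError(f"unknown method: {method}")

key = [True, False, True, True]
resp = [True, False, False, True]   # one statement marked wrongly
for m in ("dichotomous", "partial", "half_credit"):
    print(m, score_mtf(key, resp, m))
```

The conflicting results the abstract mentions stem from exactly this choice: the same response pattern earns 0, 0.75, or 0.5 points depending on the rule.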
Liu, Jinghua; Zu, Jiyun; Curley, Edward; Carey, Jill – ETS Research Report Series, 2014
The purpose of this study is to investigate the impact of discrete anchor items versus passage-based anchor items on observed score equating using empirical data. This study compares an "SAT"® critical reading anchor that contains proportionally more discrete items, relative to the total tests to be equated, to another anchor that…
Descriptors: Equated Scores, Test Items, College Entrance Examinations, Comparative Analysis
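The Liu et al. abstract concerns observed-score equating through an anchor. As a hedged illustration of the general idea (chained linear equating with a mean-sigma link; the data are made up and the function names are mine, not the report's method):

```python
import math

def mean_sd(xs):
    """Population mean and standard deviation."""
    m = sum(xs) / len(xs)
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def linear_link(from_scores, to_scores):
    """Mean-sigma linear function mapping the first scale onto the second."""
    mf, sf = mean_sd(from_scores)
    mt, st = mean_sd(to_scores)
    return lambda x: mt + (st / sf) * (x - mf)

def chained_linear_equate(x, x_grp1, v_grp1, v_grp2, y_grp2):
    """Chained linear equating in a NEAT design:
    link form X to anchor V in group 1, then V to form Y in group 2."""
    x_to_v = linear_link(x_grp1, v_grp1)
    v_to_y = linear_link(v_grp2, y_grp2)
    return v_to_y(x_to_v(x))

# Illustrative (made-up) score vectors for two examinee groups
x1, v1 = [10, 20, 30, 40], [5, 10, 15, 20]    # group 1: form X + anchor
v2, y2 = [6, 11, 16, 21], [30, 40, 50, 60]    # group 2: anchor + form Y
print(chained_linear_equate(35, x1, v1, v2, y2))
```

Whether the anchor is discrete or passage-based changes the anchor scores fed into such a chain, which is the comparison the study investigates.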
Wiliam, Dylan – Assessment in Education: Principles, Policy & Practice, 2008
While international comparisons such as those provided by PISA may be meaningful in terms of overall judgements about the performance of educational systems, caution is needed for more fine-grained judgements. In particular, it is argued that using the results of PISA to draw conclusions about the quality of instruction in different systems is…
Descriptors: Test Bias, Test Construction, Comparative Testing, Evaluation
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of…
Descriptors: Measures (Individuals), Scoring, Equated Scores, Test Bias
Kato, Kentaro; Moen, Ross E.; Thurlow, Martha L. – Educational Measurement: Issues and Practice, 2009
Large data sets from a state reading assessment for third and fifth graders were analyzed to examine differential item functioning (DIF), differential distractor functioning (DDF), and differential omission frequency (DOF) between students with particular categories of disabilities (speech/language impairments, learning disabilities, and emotional…
Descriptors: Learning Disabilities, Language Impairments, Behavior Disorders, Affective Behavior
Coe, Robert – Oxford Review of Education, 2008
The comparability of examinations in different subjects has been a controversial topic for many years and a number of criticisms have been made of statistical approaches to estimating the "difficulties" of achieving particular grades in different subjects. This paper argues that if comparability is understood in terms of a linking…
Descriptors: Test Items, Grades (Scholastic), Foreign Countries, Test Bias
Puhan, Gautam; Boughton, Keith; Kim, Sooyeon – Journal of Technology, Learning, and Assessment, 2007
The study evaluated the comparability of two versions of a certification test: a paper-and-pencil test (PPT) and computer-based test (CBT). An effect size measure known as Cohen's d and differential item functioning (DIF) analyses were used as measures of comparability at the test and item levels, respectively. Results indicated that the effect…
Descriptors: Computer Assisted Testing, Effect Size, Test Bias, Mathematics Tests
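The Puhan, Boughton, and Kim abstract uses Cohen's d as the test-level comparability measure. A minimal sketch of the pooled-standard-deviation formulation (the score vectors below are made up for illustration, not the certification-test data):

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d: (mean_a - mean_b) / pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    ma = sum(group_a) / na
    mb = sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    s_pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / s_pooled

# Hypothetical total scores under the two administration modes
ppt_scores = [52, 55, 49, 61, 58, 50, 57, 53]   # paper-and-pencil
cbt_scores = [51, 54, 50, 60, 56, 49, 55, 52]   # computer-based
print(round(cohens_d(ppt_scores, cbt_scores), 3))
```

By conventional rules of thumb, |d| below about 0.2 is a small effect, which is the kind of test-level evidence of comparability such a study looks for.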
Interpreter and Spanish Administration Effects on the WISC Performance of Mexican-American Children.

Swanson, Elinor N.; Deblassie, Richard R. – Journal of School Psychology, 1979
A study was conducted to ascertain whether use of an interpreter and/or a regular examiner in administering the WISC would affect test results of a group of Mexican-American children. Spanish administration of some scales of the performance test is likely to elicit optimum performance. (Author)
Descriptors: Comparative Testing, Elementary Education, Mexican Americans, Psychological Testing

Hambleton, Ronald K.; Rogers, H. Jane – Applied Measurement in Education, 1989
Item Response Theory and Mantel-Haenszel approaches for investigating differential item performance were compared to assess the level of agreement of the approaches in identifying potentially biased items. Subjects were 2,000 White and 2,000 Native American high school students. The Mantel-Haenszel method provides an acceptable approximation of…
Descriptors: American Indians, Comparative Testing, High School Students, High Schools
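The Hambleton and Rogers abstract names the Mantel-Haenszel approach to flagging differential item functioning. A sketch of its core computation, the common odds ratio across ability strata and the ETS delta-scale statistic built on it (the counts are invented for illustration, not the study's data):

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one item.
    Each stratum is a tuple (A, B, C, D):
      A = reference group correct, B = reference group incorrect,
      C = focal group correct,     D = focal group incorrect."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def mh_d_dif(strata):
    """ETS delta-scale statistic: -2.35 * ln(alpha_MH).
    Values near 0 indicate negligible DIF; |MH D-DIF| >= 1.5
    is the conventional cutoff for large ('C'-level) DIF."""
    return -2.35 * math.log(mh_odds_ratio(strata))

# Hypothetical right/wrong counts for one item in three score strata
strata = [
    (40, 10, 30, 20),   # low-score stratum
    (60, 15, 50, 25),   # middle stratum
    (80, 5, 70, 15),    # high-score stratum
]
print(round(mh_d_dif(strata), 3))
```

Matching on total score via strata like these is what lets the MH method approximate the IRT-based comparison the abstract describes.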

Rafferty, Eileen A.; Treff, August V. – ERS Spectrum, 1994
Addresses issues faced by institutions attempting to design school profiles to meet accountability standards. Reports of high-stakes test results can be skewed by choice of statistic type (percent of students passing versus mean scores), sample bias, geographical transients, and omission errors. Administrators must look beyond "common…
Descriptors: Accountability, Achievement Tests, Comparative Testing, Elementary Secondary Education
Bolt, Sara E.; Ysseldyke, James E. – Applied Measurement in Education, 2006
Although testing accommodations are commonly provided to students with disabilities within large-scale testing programs, research findings on how well accommodations allow for comparable measurement of student knowledge and skill remain inconclusive. The purpose of this study was to examine the extent to which 1 commonly held belief about testing…
Descriptors: Oral Reading, Testing Accommodations, Disabilities, Special Needs Students

Hanley, Jerome H.; Barclay, Allan G. – Journal of Black Psychology, 1979
The Revised Wechsler Intelligence Scale for Children appears to significantly widen the gap between Black and White performance, increasing the likelihood of unjustified negative social and educational consequences. (Author/EB)
Descriptors: Black Students, Comparative Testing, Elementary Secondary Education, Intelligence Differences

Ilai, Doron; Willerman, Lee – Intelligence, 1989
Items showing sex differences on the revised Wechsler Adult Intelligence Scale (WAIS-R) were studied. In a sample of 206 young adults (110 males and 96 females), 15 items demonstrated significant sex differences, but there was no relationship of item-specific gender content to sex differences in item performance. (SLD)
Descriptors: Comparative Testing, Females, Intelligence Tests, Item Analysis
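The Ilai and Willerman abstract reports item-level significance tests for sex differences on the WAIS-R. One standard way to run such a per-item test is a Pearson chi-square on the 2x2 table of sex by right/wrong; the counts below are invented for illustration and are not the study's data:

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table (no continuity correction).
    Rows: group (e.g. male/female); columns: item right/wrong."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# Hypothetical counts for one item: males 70 right / 40 wrong,
# females 50 right / 46 wrong
stat = chi_square_2x2(70, 40, 50, 46)
print(round(stat, 3), stat > 3.841)   # 3.841 = .05 critical value, df = 1
```

Repeating this over every item and counting those exceeding the critical value gives the kind of tally (15 significant items) the abstract summarizes.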