Andrew P. Jaciw – American Journal of Evaluation, 2025
By design, randomized experiments (XPs) rule out bias from confounded selection of participants into conditions. Quasi-experiments (QEs) are often considered second-best because they do not share this benefit. However, when results from XPs are used to generalize causal impacts, the benefit from unconfounded selection into conditions may be offset…
Descriptors: Elementary School Students, Elementary School Teachers, Generalization, Test Bias
Xuelan Qiu; Jimmy de la Torre; You-Gan Wang; Jinran Wu – Educational Measurement: Issues and Practice, 2024
Multidimensional forced-choice (MFC) items have been found to be useful to reduce response biases in personality assessments. However, conventional scoring methods for the MFC items result in ipsative data, hindering the wider applications of the MFC format. In the last decade, a number of item response theory (IRT) models have been developed,…
Descriptors: Item Response Theory, Personality Traits, Personality Measures, Personality Assessment
Wiliam, Dylan – Assessment in Education: Principles, Policy & Practice, 2008
While international comparisons such as those provided by PISA may be meaningful in terms of overall judgements about the performance of educational systems, caution is needed in terms of more fine-grained judgements. In particular it is argued that using the results of PISA to draw conclusions about the quality of instruction in different systems is…
Descriptors: Test Bias, Test Construction, Comparative Testing, Evaluation
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of…
Descriptors: Measures (Individuals), Scoring, Equated Scores, Test Bias
Pine, Steven M.; Weiss, David J. – 1978
This report examines how selection fairness is influenced by the characteristics of a selection instrument in terms of its distribution of item difficulties, level of item discrimination, degree of item bias, and testing strategy. Computer simulation was used in the administration of either a conventional or Bayesian adaptive ability test to a…
Descriptors: Adaptive Testing, Bayesian Statistics, Comparative Testing, Computer Assisted Testing
McManus, Barbara Luger – 1992
This paper discusses whether or not revisions of the Scholastic Aptitude Test (SAT) and the American College Test (ACT) have created such significant differences between the two tests that a student could conceivably score significantly higher on one than the other. The SAT has been revised to meet the needs of an increasingly diverse student…
Descriptors: Ability, Achievement Tests, Aptitude Tests, College Entrance Examinations
Steele, D. Joyce – 1985
This paper contains a comparison of descriptive information based on analyses of pilot and live administrations of the Alabama High School Graduation Examination (AHSGE). The test is composed of three subject tests: Reading, Mathematics, and Language. The study was intended to validate the test development procedure by comparing difficulty levels…
Descriptors: Achievement Tests, Comparative Testing, Difficulty Level, Graduation Requirements

Armstrong, Anne-Marie – Educational Measurement: Issues and Practice, 1993
The effects on test performance of differentially written multiple-choice tests and test takers' cognitive style were studied for 47 graduate students and 35 public school and college teachers. Adhering to item-writing guidelines resulted in essentially the same mean scores for two groups of differing cognitive style. (SLD)
Descriptors: Cognitive Style, College Faculty, Comparative Testing, Graduate Students
Macpherson, Colin R.; Rowley, Glenn L. – 1986
Teacher-made mastery tests were administered in a classroom-sized sample to study their decision consistency. Decision consistency of criterion-referenced tests is usually defined in terms of the proportion of examinees who are classified in the same way after two test administrations. Single-administration estimates of decision consistency were…
Descriptors: Classroom Research, Comparative Testing, Criterion Referenced Tests, Cutting Scores
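The two-administration definition of decision consistency quoted in the abstract above can be illustrated with a minimal sketch. It is not taken from the Macpherson and Rowley report: the function name, the scores, and the cut score of 6 are all hypothetical, and treating a score at or above the cut as "master" is an assumption.

```python
from typing import Sequence


def decision_consistency(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    cut_score: float,
) -> float:
    """Proportion of examinees classified the same way on both administrations.

    Assumption: an examinee is a "master" when the score is >= cut_score.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("Both administrations must cover the same examinees.")
    if not scores_a:
        raise ValueError("At least one examinee is required.")
    # Count examinees whose master/non-master decision matches across administrations.
    same = sum(
        (a >= cut_score) == (b >= cut_score)
        for a, b in zip(scores_a, scores_b)
    )
    return same / len(scores_a)


# Hypothetical scores for five examinees on two administrations of one test.
first_administration = [8, 5, 9, 6, 4]
second_administration = [7, 6, 9, 5, 3]
p0 = decision_consistency(first_administration, second_administration, cut_score=6)
print(p0)
```

In this made-up sample, three of the five examinees receive the same decision both times. The single-administration estimates mentioned in the abstract address the case where a second administration is unavailable; they are not sketched here.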
Coffman, William E. – 1978
The Iowa Tests of Basic Skills were administered to over 600 black and white students in grades six through nine, to determine if the test showed bias against minorities. Outliers were identified from test results. Outliers are items which differ from the central core of test items because they fall outside the range expected from a random…
Descriptors: Achievement Tests, Basic Skills, Black Students, Comparative Testing