Ole J. Kemi – Advances in Physiology Education, 2025
Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session handling and analysis by a board of examiners. This occurs annually and is the basis for evaluating not only students but also the wider learning and teaching efficiency of an academic institution.…
Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards
Wiliam, Dylan – Assessment in Education: Principles, Policy & Practice, 2008
While international comparisons such as those provided by PISA may be meaningful in terms of overall judgements about the performance of educational systems, caution is needed in terms of more fine-grained judgements. In particular, it is argued that using the results of PISA to draw conclusions about the quality of instruction in different systems is…
Descriptors: Test Bias, Test Construction, Comparative Testing, Evaluation
Barkhi, Reza; Williams, Paul – Assessment & Evaluation in Higher Education, 2010
With the proliferation of computer networks and the increased use of Internet-based applications, many forms of social interactions now take place in an on-line context through "Computer-Mediated Communication" (CMC). Many universities are now reaping the benefits of using CMC applications to collect data on student evaluations of…
Descriptors: Computer Mediated Communication, Faculty Evaluation, Foreign Countries, Student Evaluation of Teacher Performance
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of…
Descriptors: Measures (Individuals), Scoring, Equated Scores, Test Bias
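The design named here is the nonequivalent-groups-with-anchor-test (NEAT) design. As a rough illustration of how scores travel across forms in such a design, the sketch below applies chained linear equating through an anchor test; the function names and all summary statistics are invented for illustration and are not taken from the study.

```python
# Minimal sketch of chained linear equating in a nonequivalent-groups (NEAT)
# design: form X and form Y are taken by different groups, and an anchor
# test V taken by both provides the link. All statistics are placeholders.

def linear_link(mean_from, sd_from, mean_to, sd_to):
    """Return a function mapping scores via a linear (mean/SD) link."""
    return lambda score: mean_to + sd_to * (score - mean_from) / sd_from

# Group P took form X and anchor V; group Q took form Y and anchor V.
x_to_v = linear_link(mean_from=30.0, sd_from=8.0,   # X in group P
                     mean_to=15.0, sd_to=4.0)       # V in group P
v_to_y = linear_link(mean_from=14.0, sd_from=4.5,   # V in group Q
                     mean_to=28.0, sd_to=7.5)       # Y in group Q

def equate_x_to_y(x_score):
    """Chain the two links: X -> V (through P), then V -> Y (through Q)."""
    return v_to_y(x_to_v(x_score))

print(equate_x_to_y(34.0))  # form-Y equivalent of an X score of 34
```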
Kato, Kentaro; Moen, Ross E.; Thurlow, Martha L. – Educational Measurement: Issues and Practice, 2009
Large data sets from a state reading assessment for third and fifth graders were analyzed to examine differential item functioning (DIF), differential distractor functioning (DDF), and differential omission frequency (DOF) between students with particular categories of disabilities (speech/language impairments, learning disabilities, and emotional…
Descriptors: Learning Disabilities, Language Impairments, Behavior Disorders, Affective Behavior
Coe, Robert – Oxford Review of Education, 2008
The comparability of examinations in different subjects has been a controversial topic for many years and a number of criticisms have been made of statistical approaches to estimating the "difficulties" of achieving particular grades in different subjects. This paper argues that if comparability is understood in terms of a linking…
Descriptors: Test Items, Grades (Scholastic), Foreign Countries, Test Bias
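For readers unfamiliar with this family of methods, one common statistical formulation of subject comparability links subjects through a latent-trait (Rasch-type) model, sketched below; this is an illustration of the general approach, not necessarily the paper's exact specification.

```latex
% Treat the probability that candidate j achieves at least grade g in
% subject s as a logistic function of candidate ability \theta_j and a
% subject-grade difficulty \delta_{sg}; subjects are then linked through
% candidates who take several of them.
\[
  P(X_{js} \ge g) = \frac{\exp(\theta_j - \delta_{sg})}{1 + \exp(\theta_j - \delta_{sg})}
\]
```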

Kim, Seock-Ho; Cohen, Allan S. – Applied Psychological Measurement, 1991
The exact and closed-interval area measures for detecting differential item functioning are compared for actual data from 1,000 African-American and 1,000 white college students taking a vocabulary test with items intentionally constructed to favor 1 set of examinees. No real differences in detection of biased items were found. (SLD)
Descriptors: Black Students, College Students, Comparative Testing, Equations (Mathematics)
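For context, the two measures compared here quantify the gap between groups' item characteristic curves. Below is a minimal sketch of the closed-interval variant, assuming a 2PL model and illustrative parameter values; the exact-area counterpart has a closed form due to Raju (1988).

```python
import numpy as np

def icc_2pl(theta, a, b, D=1.7):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))

def closed_interval_area(a1, b1, a2, b2, lo=-3.0, hi=3.0, n=2000):
    """Unsigned area between two groups' ICCs over [lo, hi] (Riemann sum)."""
    theta = np.linspace(lo, hi, n)
    gap = np.abs(icc_2pl(theta, a1, b1) - icc_2pl(theta, a2, b2))
    return float(np.sum(gap) * (hi - lo) / n)

# Hypothetical reference- vs. focal-group parameters for one item:
print(closed_interval_area(a1=1.0, b1=0.0, a2=1.0, b2=0.5))
```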

Dorans, Neil J.; And Others – Journal of Educational Measurement, 1992
The standardization approach to comprehensive differential item functioning is described and contrasted with the log-linear approach to differential distractor functioning and the item-response-theory-based approach to differential alternative functioning. Data from an edition of the Scholastic Aptitude Test illustrate application of the approach…
Descriptors: Black Students, College Entrance Examinations, Comparative Testing, Distractors (Tests)
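As background on the method named here, the standardization index is usually written as a weighted average of between-group differences in proportion correct at each matching-score level; the statement below is a standard textbook form, not a quotation from the article.

```latex
% Core quantity in the standardization approach: compare focal- and
% reference-group proportions correct, E_{fs} and E_{rs}, at each level s
% of a matching criterion (usually total score), and average the
% differences with weights w_s (often the focal-group counts at level s):
\[
  \mathrm{STD\;P\text{-}DIF} \;=\; \frac{\sum_{s} w_s \,\bigl(E_{fs} - E_{rs}\bigr)}{\sum_{s} w_s}
\]
% The same machinery extends to distractors: replace "proportion correct"
% with the proportion choosing each response option.
```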

Skaggs, Gary; Lissitz, Robert W. – Journal of Educational Measurement, 1992
The consistency of several item bias detection methods was studied across different test administrations of the same items using data from a mathematics test given to approximately 6,600 eighth-grade students in all. The Mantel-Haenszel and item-response-theory-based sum-of-squares methods were the most consistent. (SLD)
Descriptors: Comparative Testing, Grade 8, Item Bias, Item Response Theory
Miron, Gary; Applegate, Brooks – Education and the Public Interest Center, 2009
The Center for Research on Education Outcomes (CREDO) at Stanford University conducted a large-scale analysis of the impact of charter schools on student performance. The center's data covered 65-70% of the nation's charter schools. Although results varied by state, 17% of the charter schools had significantly higher math results than …
Descriptors: Evidence, Traditional Schools, Charter Schools, Program Effectiveness
Spray, Judith A.; Miller, Timothy R. – 1992
A popular method of analyzing test items for differential item functioning (DIF) is to compute a statistic that conditions samples of examinees from different populations on an estimate of ability. This conditioning or matching by ability is intended to produce an appropriate statistic that is sensitive to true differences in item functioning,…
Descriptors: Blacks, College Entrance Examinations, Comparative Testing, Computer Simulation
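The conditioning-on-ability idea described here underlies the Mantel-Haenszel statistic mentioned throughout these results. A minimal sketch, with hypothetical variable names (`responses`, `totals`, `group`) standing in for real data:

```python
import numpy as np

def mantel_haenszel_odds_ratio(responses, totals, group):
    """Common odds ratio for one studied item, stratified by total score.

    responses: 0/1 answers to the studied item; totals: matching score;
    group: 0 = reference, 1 = focal.
    """
    num = 0.0
    den = 0.0
    for s in np.unique(totals):
        in_stratum = totals == s
        ref = in_stratum & (group == 0)
        foc = in_stratum & (group == 1)
        A = np.sum(responses[ref] == 1)   # reference correct
        B = np.sum(responses[ref] == 0)   # reference incorrect
        C = np.sum(responses[foc] == 1)   # focal correct
        D = np.sum(responses[foc] == 0)   # focal incorrect
        n = A + B + C + D
        if n == 0:
            continue
        num += A * D / n
        den += B * C / n
    return num / den  # > 1 suggests the item favors the reference group
```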
Reardon, Sean F. – Education and the Public Interest Center, 2009
"How New York City's Charter Schools Affect Achievement" estimates the effects on student achievement of attending a New York City charter school rather than a traditional public school and investigates the characteristics of charter schools associated with the most positive effects on achievement. Because the report relies on an…
Descriptors: Charter Schools, Academic Achievement, Achievement Gains, Achievement Rating
Clauser, Brian E.; And Others – 1991
Item bias has been a major concern for test developers during recent years. The Mantel-Haenszel statistic has been among the preferred methods for identifying biased items. The statistic's performance in identifying uniform bias in simulated data modeled by producing various levels of difference in the (item difficulty) b-parameter for reference…
Descriptors: Comparative Testing, Difficulty Level, Item Bias, Item Response Theory
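To make the simulation design concrete, the sketch below generates responses under a 3PL model with the focal group's difficulty (b) parameter shifted to induce uniform DIF; the parameter values and DIF size are illustrative, not those used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_responses(theta, a, b, c, D=1.7):
    """Draw 0/1 responses from a three-parameter logistic (3PL) model."""
    p = c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))
    return (rng.random(theta.shape) < p).astype(int)

n = 1000
theta_ref = rng.normal(0.0, 1.0, n)   # reference-group abilities
theta_foc = rng.normal(0.0, 1.0, n)   # focal-group abilities

a, b, c = 1.0, 0.0, 0.2
dif_shift = 0.5  # uniform DIF: the focal group sees a harder item

ref_resp = simulate_responses(theta_ref, a, b, c)
foc_resp = simulate_responses(theta_foc, a, b + dif_shift, c)
```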
Mazor, Kathleen M.; And Others – 1993
The Mantel-Haenszel (MH) procedure has become one of the most popular procedures for detecting differential item functioning (DIF). One of the most troublesome criticisms of this procedure is that while detection rates for uniform DIF are very good, the procedure is not sensitive to non-uniform DIF. In this study, examinee responses were generated…
Descriptors: Comparative Testing, Computer Simulation, Item Bias, Item Response Theory
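One widely used response to this criticism (not necessarily the remedy examined in this study) is the logistic-regression DIF procedure of Swaminathan and Rogers (1990), in which a group-by-ability interaction captures non-uniform DIF. A minimal sketch, assuming 0/1 item responses and a total-score matching variable:

```python
import numpy as np
import statsmodels.api as sm

def logistic_dif(item, total, group):
    """Fit item ~ total + group + total*group; return the DIF coefficients.

    item: 0/1 responses; total: matching score; group: 0/1 indicator.
    """
    X = sm.add_constant(np.column_stack([total, group, total * group]))
    fit = sm.Logit(item, X).fit(disp=0)
    # params[2] (group main effect) reflects uniform DIF;
    # params[3] (interaction) reflects non-uniform DIF.
    return fit.params[2], fit.params[3]
```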

Rafferty, Eileen A.; Treff, August V. – ERS Spectrum, 1994
Addresses issues faced by institutions attempting to design school profiles to meet accountability standards. Reports of high-stakes test results can be skewed by choice of statistic type (percent of students passing versus mean scores), sample bias, geographical transients, and omission errors. Administrators must look beyond "common…
Descriptors: Accountability, Achievement Tests, Comparative Testing, Elementary Secondary Education
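A tiny worked example (invented numbers) shows how the choice of statistic alone can flip a comparison: below, school A has the higher mean score while school B has the higher percent passing.

```python
# School A: mean 70.0, pass rate 40%; school B: mean 59.4, pass rate 80%.
school_a = [95, 90, 55, 55, 55]
school_b = [65, 65, 65, 62, 40]

cut = 60  # hypothetical passing score
for name, scores in [("A", school_a), ("B", school_b)]:
    mean = sum(scores) / len(scores)
    pass_rate = sum(s >= cut for s in scores) / len(scores)
    print(name, round(mean, 1), f"{pass_rate:.0%}")
```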