Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 0 |
Since 2006 (last 20 years) | 16 |
Descriptor
Educational Testing | 72 |
Evaluation Methods | 72 |
Testing Problems | 72 |
Educational Assessment | 29 |
Elementary Secondary Education | 24 |
Student Evaluation | 22 |
Test Construction | 15 |
Testing Programs | 15 |
Program Evaluation | 14 |
Psychometrics | 14 |
Test Validity | 14 |
More ▼ |
Source
Author
Thurlow, Martha | 8 |
Bielinski, John | 3 |
Minnema, Jane | 3 |
Evers, Arne | 2 |
Novak, Carl D. | 2 |
Scott, Dorene | 2 |
Scott, Jim | 2 |
Abelow, David | 1 |
Almond, Patricia | 1 |
Archer, Mary | 1 |
Ascher, Carol | 1 |
More ▼ |
Publication Type
Education Level
Elementary Secondary Education | 9 |
Higher Education | 2 |
Postsecondary Education | 2 |
Elementary Education | 1 |
Grade 3 | 1 |
Grade 4 | 1 |
Grade 5 | 1 |
Secondary Education | 1 |
Audience
Practitioners | 3 |
Location
United Kingdom (England) | 3 |
United States | 3 |
Netherlands | 2 |
United Kingdom | 2 |
United Kingdom (Wales) | 2 |
Australia | 1 |
California | 1 |
Canada | 1 |
Florida | 1 |
Minnesota | 1 |
Nebraska | 1 |
More ▼ |
Laws, Policies, & Programs
Elementary and Secondary… | 4 |
Individuals with Disabilities… | 1 |
No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
Advanced Placement… | 4 |
National Assessment of… | 2 |
SAT (College Admission Test) | 2 |
Stanford Achievement Tests | 2 |
What Works Clearinghouse Rating
Popham, W. James – Phi Delta Kappan, 2014
The tests we use to evaluate student achievement may well be sound measures of what students know, but they are faulty indicators at best of how well they have been taught. A remedy to this this situation of judging teachers by the performance of their students on high-stakes tests may be in hand already. We should look to the methods successfully…
Descriptors: High Stakes Tests, Academic Achievement, Teacher Evaluation, Evaluation Methods
Emenogu, Barnabas C.; Falenchuk, Olesya; Childs, Ruth A. – Alberta Journal of Educational Research, 2010
Most implementations of the Mantel-Haenszel differential item functioning procedure delete records with missing responses or replace missing responses with scores of 0. These treatments of missing data make strong assumptions about the causes of the missing data. Such assumptions may be particularly problematic when groups differ in their patterns…
Descriptors: Foreign Countries, Test Bias, Test Items, Educational Testing
Wright, Robert E. – College Student Journal, 2010
The use of standardized tests for outcome assessment has grown dramatically in recent years. Two driving factors have been the No Child Left Behind legislation, and the increase in outcome assessment measures by accrediting agencies such as AACSB, the international accrediting body for business schools. Despite the growth in usage, little effort…
Descriptors: College Outcomes Assessment, Educational Testing, Standardized Tests, Accreditation (Institutions)
Ventouras, Errikos; Triantis, Dimos; Tsiakas, Panagiotis; Stergiopoulos, Charalampos – Computers & Education, 2010
The aim of the present research was to compare the use of multiple-choice questions (MCQs) as an examination method, to the examination based on constructed-response questions (CRQs). Despite that MCQs have an advantage concerning objectivity in the grading process and speed in production of results, they also introduce an error in the final…
Descriptors: Computer Assisted Instruction, Scoring, Grading, Comparative Analysis
Cresswell, Mike – Measurement: Interdisciplinary Research and Perspectives, 2010
Paul Newton (2010), with his characteristic concern about theory, has set out two different ways of thinking about the basis upon which equivalences of one sort or another are established between test score scales. His reason for doing this is a desire to establish "the defensibility of linkages lower on the continuum than concordance."…
Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis
Papay, John P. – American Educational Research Journal, 2011
Recently, educational researchers and practitioners have turned to value-added models to evaluate teacher performance. Although value-added estimates depend on the assessment used to measure student achievement, the importance of outcome selection has received scant attention in the literature. Using data from a large, urban school district, I…
Descriptors: Urban Schools, Teacher Effectiveness, Reading Achievement, Achievement Tests
Newton, Paul E. – Measurement: Interdisciplinary Research and Perspectives, 2010
This article presents the author's rejoinder to thinking about linking from issue 8(1). Particularly within the more embracing linking frameworks, e.g., Holland & Dorans (2006) and Holland (2007), there appears to be a major disjunction between (1) classification discourse: the supposed basis for classification, that is, the underlying theory…
Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis
Walker, Michael E. – Measurement: Interdisciplinary Research and Perspectives, 2010
"Linking" is a term given to a general class of procedures by which one represents scores X on one test or measure in terms of scores Y on another test or measure. A recent taxonomy by Holland and Dorans (2006; Holland, 2007) organizes the various types of links into three broad categories: prediction, scale aligning, and equating. In…
Descriptors: Foreign Countries, Test Construction, Test Validity, Measurement Techniques
Baird, Jo-Anne – Measurement: Interdisciplinary Research and Perspectives, 2010
Newton's article (2010) makes three main contributions to the literature. First, it is transatlantic, bringing together literatures that have been dealing with similar problems, using sometimes different methods and certainly with distinctive educational, cultural perspectives. He points out that neither of these literatures has all of the…
Descriptors: Foreign Countries, Predictive Validity, Standards, Ethics
von Davier, Alina A. – Measurement: Interdisciplinary Research and Perspectives, 2010
The article "Thinking About Linking" by Newton (2010) presents a novel philosophical perspective on the way that educational assessments should be linked. Newton starts by describing the linking framework as it was characterized in various publications and identifies a cross-cultural dimension in the definitions and uses of test…
Descriptors: Foreign Countries, Educational Assessment, Student Evaluation, Evaluation Criteria
Myford, Carol M.; Wolfe, Edward W. – Journal of Educational Measurement, 2009
In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition…
Descriptors: English Literature, Advanced Placement, Measures (Individuals), Writing (Composition)
Clauser, Brian E.; Mee, Janet; Baldwin, Su G.; Margolis, Melissa J.; Dillon, Gerard F. – Journal of Educational Measurement, 2009
Although the Angoff procedure is among the most widely used standard setting procedures for tests comprising multiple-choice items, research has shown that subject matter experts have considerable difficulty accurately making the required judgments in the absence of examinee performance data. Some authors have viewed the need to provide…
Descriptors: Standard Setting (Scoring), Program Effectiveness, Expertise, Health Personnel
Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…
Descriptors: Test Length, Simulation, Correlation, Research Methodology
Austin, Dean A.; Novak, Carl D. – Health Education (Washington D.C.), 1976
This study demonstrates that multiple matrix sampling procedures can be used to collect assessment data efficiently, unabstrusively, and reliably. (MB)
Descriptors: Data Collection, Educational Testing, Evaluation Methods, Item Sampling

Evers, Arne – International Journal of Testing, 2001
Describes the Dutch rating system for test quality, which evaluates a test for seven criteria, and analyses the results of test ratings from the past 18 years. Results show a steady increase in test quality in the Netherlands that can be attributed to use of better tests and declining use of tests of less quality after evaluation. (SLD)
Descriptors: Criteria, Educational Testing, Evaluation Methods, Foreign Countries