ERIC - Search Results

Showing 1 to 15 of 72 results

The Right Test for the Wrong Reason

Popham, W. James – Phi Delta Kappan, 2014

The tests we use to evaluate student achievement may well be sound measures of what students know, but they are faulty indicators at best of how well they have been taught. A remedy to this situation of judging teachers by the performance of their students on high-stakes tests may be in hand already. We should look to the methods successfully…

Descriptors: High Stakes Tests, Academic Achievement, Teacher Evaluation, Evaluation Methods

The Effect of Missing Data Treatment on Mantel-Haenszel DIF Detection

Peer reviewed

Emenogu, Barnabas C.; Falenchuk, Olesya; Childs, Ruth A. – Alberta Journal of Educational Research, 2010

Most implementations of the Mantel-Haenszel differential item functioning procedure delete records with missing responses or replace missing responses with scores of 0. These treatments of missing data make strong assumptions about the causes of the missing data. Such assumptions may be particularly problematic when groups differ in their patterns…

Descriptors: Foreign Countries, Test Bias, Test Items, Educational Testing

Standardized Testing for Outcome Assessment: Analysis of the Educational Testing Systems MBA Tests

Peer reviewed

Wright, Robert E. – College Student Journal, 2010

The use of standardized tests for outcome assessment has grown dramatically in recent years. Two driving factors have been the No Child Left Behind legislation, and the increase in outcome assessment measures by accrediting agencies such as AACSB, the international accrediting body for business schools. Despite the growth in usage, little effort…

Descriptors: College Outcomes Assessment, Educational Testing, Standardized Tests, Accreditation (Institutions)

Comparison of Examination Methods Based on Multiple-Choice Questions and Constructed-Response Questions Using Personal Computers

Peer reviewed

Ventouras, Errikos; Triantis, Dimos; Tsiakas, Panagiotis; Stergiopoulos, Charalampos – Computers & Education, 2010

The aim of the present research was to compare examination by multiple-choice questions (MCQs) with examination by constructed-response questions (CRQs). Although MCQs have an advantage in grading objectivity and speed of results, they also introduce an error in the final…

Descriptors: Computer Assisted Instruction, Scoring, Grading, Comparative Analysis

Defending the Quality of Links between Scores from Different Tests and Exams

Peer reviewed

Cresswell, Mike – Measurement: Interdisciplinary Research and Perspectives, 2010

Paul Newton (2010), with his characteristic concern about theory, has set out two different ways of thinking about the basis upon which equivalences of one sort or another are established between test score scales. His reason for doing this is a desire to establish "the defensibility of linkages lower on the continuum than concordance."…

Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis

Different Tests, Different Answers: The Stability of Teacher Value-Added Estimates across Outcome Measures

Peer reviewed

Papay, John P. – American Educational Research Journal, 2011

Recently, educational researchers and practitioners have turned to value-added models to evaluate teacher performance. Although value-added estimates depend on the assessment used to measure student achievement, the importance of outcome selection has received scant attention in the literature. Using data from a large, urban school district, I…

Descriptors: Urban Schools, Teacher Effectiveness, Reading Achievement, Achievement Tests

Conceptualizing Comparability

Peer reviewed

Newton, Paul E. – Measurement: Interdisciplinary Research and Perspectives, 2010

This article presents the author's rejoinder to thinking about linking from issue 8(1). Particularly within the more embracing linking frameworks, e.g., Holland & Dorans (2006) and Holland (2007), there appears to be a major disjunction between (1) classification discourse: the supposed basis for classification, that is, the underlying theory…

Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis

Linking through Improved Design, Not Redefinition: Commentary on Newton

Peer reviewed

Walker, Michael E. – Measurement: Interdisciplinary Research and Perspectives, 2010

"Linking" is a term given to a general class of procedures by which one represents scores X on one test or measure in terms of scores Y on another test or measure. A recent taxonomy by Holland and Dorans (2006; Holland, 2007) organizes the various types of links into three broad categories: prediction, scale aligning, and equating. In…

Descriptors: Foreign Countries, Test Construction, Test Validity, Measurement Techniques

What Constitutes Legitimate Causal Linking?

Peer reviewed

Baird, Jo-Anne – Measurement: Interdisciplinary Research and Perspectives, 2010

Newton's article (2010) makes three main contributions to the literature. First, it is transatlantic, bringing together literatures that have been dealing with similar problems, using sometimes different methods and certainly with distinctive educational, cultural perspectives. He points out that neither of these literatures has all of the…

Descriptors: Foreign Countries, Predictive Validity, Standards, Ethics

What Dictates the Meaning of Test Linking? A Reaction to "Thinking about Linking"

Peer reviewed

von Davier, Alina A. – Measurement: Interdisciplinary Research and Perspectives, 2010

The article "Thinking About Linking" by Newton (2010) presents a novel philosophical perspective on the way that educational assessments should be linked. Newton starts by describing the linking framework as it was characterized in various publications and identifies a cross-cultural dimension in the definitions and uses of test…

Descriptors: Foreign Countries, Educational Assessment, Student Evaluation, Evaluation Criteria

Monitoring Rater Performance over Time: A Framework for Detecting Differential Accuracy and Differential Scale Category Use

Peer reviewed

Myford, Carol M.; Wolfe, Edward W. – Journal of Educational Measurement, 2009

In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition…

Descriptors: English Literature, Advanced Placement, Measures (Individuals), Writing (Composition)

Judges' Use of Examinee Performance Data in an Angoff Standard-Setting Exercise for a Medical Licensing Examination: An Experimental Study

Peer reviewed

Clauser, Brian E.; Mee, Janet; Baldwin, Su G.; Margolis, Melissa J.; Dillon, Gerard F. – Journal of Educational Measurement, 2009

Although the Angoff procedure is among the most widely used standard setting procedures for tests comprising multiple-choice items, research has shown that subject matter experts have considerable difficulty accurately making the required judgments in the absence of examinee performance data. Some authors have viewed the need to provide…

Descriptors: Standard Setting (Scoring), Program Effectiveness, Expertise, Health Personnel

The Hierarchy Consistency Index: Evaluating Person Fit for Cognitive Diagnostic Assessment

Peer reviewed

Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009

In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…

Descriptors: Test Length, Simulation, Correlation, Research Methodology

Using Multiple Matrix Sampling to Assess Health Education Knowledge

Austin, Dean A.; Novak, Carl D. – Health Education (Washington D.C.), 1976

This study demonstrates that multiple matrix sampling procedures can be used to collect assessment data efficiently, unobtrusively, and reliably. (MB)

Descriptors: Data Collection, Educational Testing, Evaluation Methods, Item Sampling

Improving Test Quality in the Netherlands: Results of 18 Years of Test Ratings.

Peer reviewed

Evers, Arne – International Journal of Testing, 2001

Describes the Dutch rating system for test quality, which evaluates a test for seven criteria, and analyses the results of test ratings from the past 18 years. Results show a steady increase in test quality in the Netherlands that can be attributed to use of better tests and declining use of tests of less quality after evaluation. (SLD)

Descriptors: Criteria, Educational Testing, Evaluation Methods, Foreign Countries
