Showing 1 to 15 of 26 results
Little, Jeri Lynn – ProQuest LLC, 2011
Although generally used for assessment, tests can also serve as tools for learning--but different test formats may not be equally beneficial. Specifically, research has shown multiple-choice tests to be less effective than cued-recall tests in improving the later retention of the tested information (e.g., see meta-analysis by Hamaker, 1986),…
Descriptors: Recall (Psychology), Multiple Choice Tests, Learning Processes, Educational Testing
Hadfield, Timothy E.; Hutchison-Lupardus, Tammy R.; Snyder, Jennifer E. – ProQuest LLC, 2012
This problem-based learning project addressed the need to improve the construction and implementation of value-added teacher evaluation policies and instruments. State officials are constructing value-added teacher evaluation models due to accountability initiatives, while ignoring the holes and problems in their implementation. The team's…
Descriptors: Scores, Educational Testing, Problem Based Learning, Teacher Evaluation
Hutchison-Lupardus, Tammy R.; Hadfield, Timothy E.; Snyder, Jennifer E. – ProQuest LLC, 2012
This problem-based learning project addressed the need to improve the construction and implementation of value-added teacher evaluation policies and instruments. State officials are constructing value-added teacher evaluation models due to accountability initiatives, while ignoring the holes and problems in their implementation. The team's…
Descriptors: Scores, Educational Testing, Problem Based Learning, Teacher Evaluation
Snyder, Jennifer E.; Hadfield, Timothy E.; Hutchison-Lupardus, Tammy R. – ProQuest LLC, 2012
This problem-based learning project addressed the need to improve the construction and implementation of value-added teacher evaluation policies and instruments. State officials are constructing value-added teacher evaluation models due to accountability initiatives, while ignoring the holes and problems in their implementation. The team's…
Descriptors: Scores, Educational Testing, Problem Based Learning, Teacher Evaluation
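For context on the subject of the three records above, the core of a value-added score can be sketched in a few lines, although operational policy models are far more elaborate; the data here are simulated and the single-predictor regression is a deliberate simplification:

```python
import numpy as np

# Minimal sketch of a value-added estimate (simulated data): regress current
# achievement on prior achievement, then average each teacher's student
# residuals. A positive mean residual suggests students gained more than
# their prior scores would predict.
rng = np.random.default_rng(2)
prior = rng.normal(50, 10, size=200)
current = 0.8 * prior + rng.normal(10, 5, size=200)
teacher = rng.integers(0, 10, size=200)           # 10 hypothetical teachers

slope, intercept = np.polyfit(prior, current, 1)  # simple linear growth model
residuals = current - (slope * prior + intercept)

value_added = {t: residuals[teacher == t].mean() for t in range(10)}
print(value_added[0])
```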
Peer reviewed
Livingston, Samuel A.; Antal, Judit – Applied Measurement in Education, 2010
A simultaneous equating of four new test forms to each other and to one previous form was accomplished through a complex design incorporating seven separate equating links. Each new form was linked to the reference form by four different paths, and each path produced a different score conversion. The procedure used to resolve these inconsistencies…
Descriptors: Measurement Techniques, Measurement, Educational Assessment, Educational Testing
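The abstract does not say how the four competing conversions were reconciled; as a minimal sketch, assuming hypothetical linear conversions, one naive reconciliation (plain averaging across paths, not necessarily the authors' procedure) looks like this:

```python
import numpy as np

# Four equating paths, each yielding a different raw-to-reference score
# conversion for the same new form (slopes and intercepts are made up).
raw_scores = np.arange(0, 51)
paths = [(1.02, -0.8), (0.99, 0.3), (1.05, -1.5), (1.01, -0.2)]

# One conversion per path, then average them at each raw-score point.
conversions = np.array([[m * x + b for x in raw_scores] for m, b in paths])
resolved = conversions.mean(axis=0)

print(resolved[:5])   # reconciled reference-scale equivalents
```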
Peer reviewed
Lee, Won-Chan; Ban, Jae-Chun – Applied Measurement in Education, 2010
Various applications of item response theory often require linking to achieve a common scale for item parameter estimates obtained from different groups. This article used a simulation to examine the relative performance of four different item response theory (IRT) linking procedures in a random groups equating design: concurrent calibration with…
Descriptors: Item Response Theory, Simulation, Comparative Analysis, Measurement Techniques
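For readers unfamiliar with IRT linking, here is a minimal sketch of the classical mean/sigma procedure, one of the separate-calibration methods typically compared in such studies; the anchor-item parameter values are invented:

```python
import numpy as np

# Mean/sigma linking for a 2PL model: place the new group's item parameters
# on the reference scale via b* = A*b + B and a* = a / A, where A and B are
# chosen so the anchor items' difficulty mean and SD match across scales.
b_ref = np.array([-1.2, -0.4, 0.1, 0.9, 1.5])   # anchor difficulties, reference scale
b_new = np.array([-1.0, -0.2, 0.3, 1.1, 1.7])   # same anchors, new-group scale
a_new = np.array([1.1, 0.8, 1.3, 0.9, 1.0])     # new-group discriminations

A = b_ref.std() / b_new.std()
B = b_ref.mean() - A * b_new.mean()

b_linked = A * b_new + B
a_linked = a_new / A
print(f"A = {A:.3f}, B = {B:.3f}")
```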
Peer reviewed
Armstrong, Ronald D.; Shi, Min – Journal of Educational Measurement, 2009
This article demonstrates the use of a new class of model-free cumulative sum (CUSUM) statistics to detect person fit given the responses to a linear test. The fundamental statistic being accumulated is the likelihood ratio of two probabilities. The detection performance of this CUSUM scheme is compared to other model-free person-fit statistics…
Descriptors: Probability, Simulation, Models, Psychometrics
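A minimal sketch of the general CUSUM idea described here, accumulating a log-likelihood ratio item by item; the null and aberrant response probabilities are invented and do not reproduce the authors' statistic:

```python
import numpy as np

# p_normal: P(correct) per item under ordinary responding (the null);
# p_aberrant: P(correct) under an aberrance model, here flat guessing.
rng = np.random.default_rng(0)
p_normal = rng.uniform(0.4, 0.9, size=40)
p_aberrant = np.full(40, 0.25)
responses = rng.binomial(1, p_normal)            # simulated 0/1 item scores

def loglik(p, x):
    return x * np.log(p) + (1 - x) * np.log(1 - p)

lr = loglik(p_aberrant, responses) - loglik(p_normal, responses)

# One-sided upper CUSUM: reset at zero, flag if the sum ever exceeds h.
cusum, h, flagged = 0.0, 3.0, False
for step in lr:
    cusum = max(0.0, cusum + step)
    flagged = flagged or cusum > h
print("flagged:", flagged)
```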
Peer reviewed
Bramley, Tom; Gill, Tim – Research Papers in Education, 2010
The rank-ordering method for standard maintaining was designed for the purpose of mapping a known cut-score (e.g. a grade boundary mark) on one test to an equivalent point on the test score scale of another test, using holistic expert judgements about the quality of exemplars of examinees' work (scripts). It is a novel application of an old…
Descriptors: Scores, Psychometrics, Measurement Techniques, Foreign Countries
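Holistic paired judgements of this kind are often scaled with a Bradley-Terry-style model; the sketch below uses invented judgement counts and a simple fixed-point fit, and may differ from the paper's actual rank-ordering model:

```python
import numpy as np

# wins[i][j] = times judges preferred script i over script j (made-up data).
# Fitting places all scripts, from either test, on one latent quality scale.
wins = np.array([[0, 3, 4],
                 [1, 0, 3],
                 [0, 1, 0]], dtype=float)

n = wins.shape[0]
theta = np.zeros(n)
for _ in range(200):                              # minorization-style updates
    for i in range(n):
        w_i = wins[i].sum()
        denom = sum((wins[i, j] + wins[j, i]) /
                    (np.exp(theta[i]) + np.exp(theta[j]))
                    for j in range(n) if j != i)
        if denom > 0:
            theta[i] = np.log(w_i / denom)
    theta -= theta.mean()                         # fix the scale's origin

print(theta)   # latent quality estimates on a common scale
```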
Peer reviewed
Thompson, Nathan A. – Journal of Applied Testing Technology, 2008
The widespread application of personal computers to educational and psychological testing has substantially increased the number of test administration methodologies available to testing programs. Many of these methods are referred to by their acronyms, such as CAT, CBT, CCT, and LOFT. The similarities between the acronyms and the methods…
Descriptors: Testing Programs, Psychological Testing, Classification, Educational Testing
Peer reviewed
Wendt, Heike; Bos, Wilfried; Goy, Martin – Educational Research and Evaluation, 2011
Several current international comparative large-scale assessments of educational achievement (ICLSA) make use of "Rasch models", to address functions essential for valid cross-cultural comparisons. From a historical perspective, ICLSA and Georg Rasch's "models for measurement" emerged at about the same time, half a century ago. However, the…
Descriptors: Measures (Individuals), Test Theory, Group Testing, Educational Testing
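As a reminder of what a Rasch model is, the dichotomous item response function fits in one line: the probability of a correct response depends only on the gap between person ability and item difficulty.

```python
import numpy as np

# Rasch model: P(X = 1 | theta, b) = 1 / (1 + exp(-(theta - b))).
def rasch_prob(theta: float, b: float) -> float:
    return 1.0 / (1.0 + np.exp(-(theta - b)))

print(rasch_prob(0.5, -0.3))   # able person, easy item -> p ~ 0.69
```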
Johnson, Jeff, Ed. – Educational Testing Service, 2009
In four articles adapted from the Educational Testing Service (ETS) Research Report Series, Issue 2 of ETS Research Spotlight provides a small taste of the range of assessment-related research capabilities of the ETS Research and Development Division. Those articles cover assessment-related research aimed at developing models of student learning,…
Descriptors: Basic Writing, Educational Testing, Research Reports, Measures (Individuals)
Peer reviewed
Myford, Carol M.; Wolfe, Edward W. – Journal of Educational Measurement, 2009
In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition…
Descriptors: English Literature, Advanced Placement, Measures (Individuals), Writing (Composition)
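A minimal sketch of one possible drift check, not the authors' indices: compare a rater's mean awarded score across an early and a late scoring window with a two-sample z statistic (scores simulated):

```python
import numpy as np

# Simulated essay scores from the same rater in two scoring windows.
rng = np.random.default_rng(1)
early = rng.normal(3.0, 0.8, size=120)            # week 1
late = rng.normal(2.7, 0.8, size=130)             # week 4

# Two-sample z statistic for the difference in mean severity.
z = (early.mean() - late.mean()) / np.sqrt(early.var(ddof=1) / early.size +
                                           late.var(ddof=1) / late.size)
print(f"z = {z:.2f}")                              # |z| > 2 would merit review
```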
Peer reviewed
Kunina-Habenicht, Olga; Rupp, Andre A.; Wilhelm, Oliver – Studies in Educational Evaluation, 2009
In recent years there has been an increasing international interest in fine-grained diagnostic inferences on multiple skills for formative purposes. A successful provision of such inferences that support meaningful instructional decision-making requires (a) careful diagnostic assessment design coupled with (b) empirical support for the structure…
Descriptors: Educational Testing, Diagnostic Tests, Multidimensional Scaling, Factor Analysis
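To make "diagnostic inferences on multiple skills" concrete, here is a sketch of the DINA model, one common diagnostic classification model; the article is not limited to this model, and the values below are invented:

```python
import numpy as np

# DINA: an examinee answers correctly with probability 1 - slip if they have
# mastered every attribute the item requires, and with probability guess
# otherwise.
q_row = np.array([1, 0, 1])          # attributes required by one item
alpha = np.array([1, 1, 0])          # examinee's mastered attributes
slip, guess = 0.1, 0.2

eta = int(np.all(alpha >= q_row))    # 1 iff all required attributes mastered
p_correct = (1 - slip) if eta else guess
print(p_correct)                     # 0.2: third attribute required, not mastered
```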
Peer reviewed
Clauser, Brian E.; Mee, Janet; Baldwin, Su G.; Margolis, Melissa J.; Dillon, Gerard F. – Journal of Educational Measurement, 2009
Although the Angoff procedure is among the most widely used standard setting procedures for tests comprising multiple-choice items, research has shown that subject matter experts have considerable difficulty accurately making the required judgments in the absence of examinee performance data. Some authors have viewed the need to provide…
Descriptors: Standard Setting (Scoring), Program Effectiveness, Expertise, Health Personnel
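The arithmetic behind an Angoff cut score is simple to illustrate; the judge ratings below are made up:

```python
import numpy as np

# Each judge estimates, per item, the probability that a minimally competent
# examinee answers correctly; a judge's implied cut score is the sum of those
# probabilities, and the panel recommendation averages over judges.
ratings = np.array([                # judges x items
    [0.6, 0.7, 0.4, 0.8, 0.5],
    [0.5, 0.8, 0.5, 0.7, 0.6],
    [0.7, 0.6, 0.4, 0.9, 0.5],
])

cut_scores = ratings.sum(axis=1)    # one implied cut score per judge
print(cut_scores.mean())            # panel-recommended cut score
```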
Peer reviewed
Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…
Descriptors: Test Length, Simulation, Correlation, Research Methodology
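A simplified sketch in the spirit of the HCI follows; the published index is defined more precisely over attribute-based item comparisons, and the prerequisite structure below is invented:

```python
import numpy as np

# For every correctly answered item, count prerequisite items (per the
# cognitive hierarchy) that were answered wrong; rescale so that no misfits
# gives 1.0 and all misfits gives -1.0.
responses = np.array([1, 1, 0, 1])             # one examinee's item scores
prereqs = {0: [], 1: [0], 2: [0, 1], 3: [0]}   # item -> prerequisite items

comparisons = misfits = 0
for item, score in enumerate(responses):
    if score == 1:
        for p in prereqs[item]:
            comparisons += 1
            misfits += int(responses[p] == 0)

hci = 1 - 2 * misfits / comparisons if comparisons else 1.0
print(hci)    # 1.0 here: no correct item has a missed prerequisite
```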