Showing 1 to 15 of 26 results
Little, Jeri Lynn – ProQuest LLC, 2011
Although generally used for assessment, tests can also serve as tools for learning--but different test formats may not be equally beneficial. Specifically, research has shown multiple-choice tests to be less effective than cued-recall tests in improving the later retention of the tested information (e.g., see meta-analysis by Hamaker, 1986),…
Descriptors: Recall (Psychology), Multiple Choice Tests, Learning Processes, Educational Testing
Hadfield, Timothy E.; Hutchison-Lupardus, Tammy R.; Snyder, Jennifer E. – ProQuest LLC, 2012
This problem-based learning project addressed the need to improve the construction and implementation of value-added teacher evaluation policies and instruments. State officials are constructing value-added teacher evaluation models due to accountability initiatives, while ignoring the holes and problems in their implementation. The team's…
Descriptors: Scores, Educational Testing, Problem Based Learning, Teacher Evaluation
Hutchison-Lupardus, Tammy R.; Hadfield, Timothy E.; Snyder, Jennifer E. – ProQuest LLC, 2012
This problem-based learning project addressed the need to improve the construction and implementation of value-added teacher evaluation policies and instruments. State officials are constructing value-added teacher evaluation models due to accountability initiatives, while ignoring the holes and problems in their implementation. The team's…
Descriptors: Scores, Educational Testing, Problem Based Learning, Teacher Evaluation
Snyder, Jennifer E.; Hadfield, Timothy E.; Hutchison-Lupardus, Tammy R. – ProQuest LLC, 2012
This problem-based learning project addressed the need to improve the construction and implementation of value-added teacher evaluation policies and instruments. State officials are constructing value-added teacher evaluation models due to accountability initiatives, while ignoring the holes and problems in their implementation. The team's…
Descriptors: Scores, Educational Testing, Problem Based Learning, Teacher Evaluation
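For context on the subject of the three records above, the core of a value-added score can be sketched in a few lines, although operational policy models are far more elaborate; the data here are simulated and the single-predictor regression is a deliberate simplification:

```python
import numpy as np

# Minimal sketch of a value-added estimate (simulated data): regress current
# achievement on prior achievement, then average each teacher's student
# residuals. A positive mean residual suggests students gained more than
# their prior scores would predict.
rng = np.random.default_rng(2)
prior = rng.normal(50, 10, size=200)
current = 0.8 * prior + rng.normal(10, 5, size=200)
teacher = rng.integers(0, 10, size=200)           # 10 hypothetical teachers

slope, intercept = np.polyfit(prior, current, 1)  # simple linear growth model
residuals = current - (slope * prior + intercept)

value_added = {t: residuals[teacher == t].mean() for t in range(10)}
print(value_added[0])
```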
Peer reviewed
Livingston, Samuel A.; Antal, Judit – Applied Measurement in Education, 2010
A simultaneous equating of four new test forms to each other and to one previous form was accomplished through a complex design incorporating seven separate equating links. Each new form was linked to the reference form by four different paths, and each path produced a different score conversion. The procedure used to resolve these inconsistencies…
Descriptors: Measurement Techniques, Measurement, Educational Assessment, Educational Testing
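The abstract does not say how the four competing conversions were reconciled; as a minimal sketch, assuming hypothetical linear conversions, one naive reconciliation (plain averaging across paths, not necessarily the authors' procedure) looks like this:

```python
import numpy as np

# Four equating paths, each yielding a different raw-to-reference score
# conversion for the same new form (slopes and intercepts are made up).
raw_scores = np.arange(0, 51)
paths = [(1.02, -0.8), (0.99, 0.3), (1.05, -1.5), (1.01, -0.2)]

# One conversion per path, then average them at each raw-score point.
conversions = np.array([[m * x + b for x in raw_scores] for m, b in paths])
resolved = conversions.mean(axis=0)

print(resolved[:5])   # reconciled reference-scale equivalents
```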
Peer reviewed
Lee, Won-Chan; Ban, Jae-Chun – Applied Measurement in Education, 2010
Various applications of item response theory often require linking to achieve a common scale for item parameter estimates obtained from different groups. This article used a simulation to examine the relative performance of four different item response theory (IRT) linking procedures in a random groups equating design: concurrent calibration with…
Descriptors: Item Response Theory, Simulation, Comparative Analysis, Measurement Techniques
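For readers unfamiliar with IRT linking, here is a minimal sketch of the classical mean/sigma procedure, one of the separate-calibration methods typically compared in such studies; the anchor-item parameter values are invented:

```python
import numpy as np

# Mean/sigma linking for a 2PL model: place the new group's item parameters
# on the reference scale via b* = A*b + B and a* = a / A, where A and B are
# chosen so the anchor items' difficulty mean and SD match across scales.
b_ref = np.array([-1.2, -0.4, 0.1, 0.9, 1.5])   # anchor difficulties, reference scale
b_new = np.array([-1.0, -0.2, 0.3, 1.1, 1.7])   # same anchors, new-group scale
a_new = np.array([1.1, 0.8, 1.3, 0.9, 1.0])     # new-group discriminations

A = b_ref.std() / b_new.std()
B = b_ref.mean() - A * b_new.mean()

b_linked = A * b_new + B
a_linked = a_new / A
print(f"A = {A:.3f}, B = {B:.3f}")
```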
Peer reviewed
Armstrong, Ronald D.; Shi, Min – Journal of Educational Measurement, 2009
This article demonstrates the use of a new class of model-free cumulative sum (CUSUM) statistics to detect person fit given the responses to a linear test. The fundamental statistic being accumulated is the likelihood ratio of two probabilities. The detection performance of this CUSUM scheme is compared to other model-free person-fit statistics…
Descriptors: Probability, Simulation, Models, Psychometrics
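A minimal sketch of the general CUSUM idea described here, accumulating a log-likelihood ratio item by item; the null and aberrant response probabilities are invented and do not reproduce the authors' statistic:

```python
import numpy as np

# p_normal: P(correct) per item under ordinary responding (the null);
# p_aberrant: P(correct) under an aberrance model, here flat guessing.
rng = np.random.default_rng(0)
p_normal = rng.uniform(0.4, 0.9, size=40)
p_aberrant = np.full(40, 0.25)
responses = rng.binomial(1, p_normal)            # simulated 0/1 item scores

def loglik(p, x):
    return x * np.log(p) + (1 - x) * np.log(1 - p)

lr = loglik(p_aberrant, responses) - loglik(p_normal, responses)

# One-sided upper CUSUM: reset at zero, flag if the sum ever exceeds h.
cusum, h, flagged = 0.0, 3.0, False
for step in lr:
    cusum = max(0.0, cusum + step)
    flagged = flagged or cusum > h
print("flagged:", flagged)
```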
Peer reviewed
Bramley, Tom; Gill, Tim – Research Papers in Education, 2010
The rank-ordering method for standard maintaining was designed for the purpose of mapping a known cut-score (e.g. a grade boundary mark) on one test to an equivalent point on the test score scale of another test, using holistic expert judgements about the quality of exemplars of examinees' work (scripts). It is a novel application of an old…
Descriptors: Scores, Psychometrics, Measurement Techniques, Foreign Countries
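Holistic paired judgements of this kind are often scaled with a Bradley-Terry-style model; the sketch below uses invented judgement counts and a simple fixed-point fit, and may differ from the paper's actual rank-ordering model:

```python
import numpy as np

# wins[i][j] = times judges preferred script i over script j (made-up data).
# Fitting places all scripts, from either test, on one latent quality scale.
wins = np.array([[0, 3, 4],
                 [1, 0, 3],
                 [0, 1, 0]], dtype=float)

n = wins.shape[0]
theta = np.zeros(n)
for _ in range(200):                              # minorization-style updates
    for i in range(n):
        w_i = wins[i].sum()
        denom = sum((wins[i, j] + wins[j, i]) /
                    (np.exp(theta[i]) + np.exp(theta[j]))
                    for j in range(n) if j != i)
        if denom > 0:
            theta[i] = np.log(w_i / denom)
    theta -= theta.mean()                         # fix the scale's origin

print(theta)   # latent quality estimates on a common scale
```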
Peer reviewed
Thompson, Nathan A. – Journal of Applied Testing Technology, 2008
The widespread application of personal computers to educational and psychological testing has substantially increased the number of test administration methodologies available to testing programs. Many of these methods are referred to by their acronyms, such as CAT, CBT, CCT, and LOFT. The similarities between the acronyms and the methods…
Descriptors: Testing Programs, Psychological Testing, Classification, Educational Testing
Peer reviewed
Wendt, Heike; Bos, Wilfried; Goy, Martin – Educational Research and Evaluation, 2011
Several current international comparative large-scale assessments of educational achievement (ICLSA) make use of "Rasch models", to address functions essential for valid cross-cultural comparisons. From a historical perspective, ICLSA and Georg Rasch's "models for measurement" emerged at about the same time, half a century ago. However, the…
Descriptors: Measures (Individuals), Test Theory, Group Testing, Educational Testing
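As a reminder of what a Rasch model is, the dichotomous item response function fits in one line: the probability of a correct response depends only on the gap between person ability and item difficulty.

```python
import numpy as np

# Rasch model: P(X = 1 | theta, b) = 1 / (1 + exp(-(theta - b))).
def rasch_prob(theta: float, b: float) -> float:
    return 1.0 / (1.0 + np.exp(-(theta - b)))

print(rasch_prob(0.5, -0.3))   # able person, easy item -> p ~ 0.69
```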
Johnson, Jeff, Ed. – Educational Testing Service, 2009
In four articles adapted from the Educational Testing Service (ETS) Research Report Series, Issue 2 of ETS Research Spotlight provides a small taste of the range of assessment-related research capabilities of the ETS Research and Development Division. Those articles cover assessment-related research aimed at developing models of student learning,…
Descriptors: Basic Writing, Educational Testing, Research Reports, Measures (Individuals)
Peer reviewed
Myford, Carol M.; Wolfe, Edward W. – Journal of Educational Measurement, 2009
In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition…
Descriptors: English Literature, Advanced Placement, Measures (Individuals), Writing (Composition)
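A minimal sketch of one possible drift check, not the authors' indices: compare a rater's mean awarded score across an early and a late scoring window with a two-sample z statistic (scores simulated):

```python
import numpy as np

# Simulated essay scores from the same rater in two scoring windows.
rng = np.random.default_rng(1)
early = rng.normal(3.0, 0.8, size=120)            # week 1
late = rng.normal(2.7, 0.8, size=130)             # week 4

# Two-sample z statistic for the difference in mean severity.
z = (early.mean() - late.mean()) / np.sqrt(early.var(ddof=1) / early.size +
                                           late.var(ddof=1) / late.size)
print(f"z = {z:.2f}")                              # |z| > 2 would merit review
```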
Peer reviewed
Kunina-Habenicht, Olga; Rupp, Andre A.; Wilhelm, Oliver – Studies in Educational Evaluation, 2009
In recent years there has been an increasing international interest in fine-grained diagnostic inferences on multiple skills for formative purposes. A successful provision of such inferences that support meaningful instructional decision-making requires (a) careful diagnostic assessment design coupled with (b) empirical support for the structure…
Descriptors: Educational Testing, Diagnostic Tests, Multidimensional Scaling, Factor Analysis
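To make "diagnostic inferences on multiple skills" concrete, here is a sketch of the DINA model, one common diagnostic classification model; the article is not limited to this model, and the values below are invented:

```python
import numpy as np

# DINA: an examinee answers correctly with probability 1 - slip if they have
# mastered every attribute the item requires, and with probability guess
# otherwise.
q_row = np.array([1, 0, 1])          # attributes required by one item
alpha = np.array([1, 1, 0])          # examinee's mastered attributes
slip, guess = 0.1, 0.2

eta = int(np.all(alpha >= q_row))    # 1 iff all required attributes mastered
p_correct = (1 - slip) if eta else guess
print(p_correct)                     # 0.2: third attribute required, not mastered
```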
Peer reviewed
Clauser, Brian E.; Mee, Janet; Baldwin, Su G.; Margolis, Melissa J.; Dillon, Gerard F. – Journal of Educational Measurement, 2009
Although the Angoff procedure is among the most widely used standard setting procedures for tests comprising multiple-choice items, research has shown that subject matter experts have considerable difficulty accurately making the required judgments in the absence of examinee performance data. Some authors have viewed the need to provide…
Descriptors: Standard Setting (Scoring), Program Effectiveness, Expertise, Health Personnel
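The arithmetic behind an Angoff cut score is simple to illustrate; the judge ratings below are made up:

```python
import numpy as np

# Each judge estimates, per item, the probability that a minimally competent
# examinee answers correctly; a judge's implied cut score is the sum of those
# probabilities, and the panel recommendation averages over judges.
ratings = np.array([                # judges x items
    [0.6, 0.7, 0.4, 0.8, 0.5],
    [0.5, 0.8, 0.5, 0.7, 0.6],
    [0.7, 0.6, 0.4, 0.9, 0.5],
])

cut_scores = ratings.sum(axis=1)    # one implied cut score per judge
print(cut_scores.mean())            # panel-recommended cut score
```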
Peer reviewed
Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…
Descriptors: Test Length, Simulation, Correlation, Research Methodology
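A simplified sketch in the spirit of the HCI follows; the published index is defined more precisely over attribute-based item comparisons, and the prerequisite structure below is invented:

```python
import numpy as np

# For every correctly answered item, count prerequisite items (per the
# cognitive hierarchy) that were answered wrong; rescale so that no misfits
# gives 1.0 and all misfits gives -1.0.
responses = np.array([1, 1, 0, 1])             # one examinee's item scores
prereqs = {0: [], 1: [0], 2: [0, 1], 3: [0]}   # item -> prerequisite items

comparisons = misfits = 0
for item, score in enumerate(responses):
    if score == 1:
        for p in prereqs[item]:
            comparisons += 1
            misfits += int(responses[p] == 0)

hci = 1 - 2 * misfits / comparisons if comparisons else 1.0
print(hci)    # 1.0 here: no correct item has a missed prerequisite
```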