Dorsey, David W.; Michaels, Hillary R. – Journal of Educational Measurement, 2022
Advances in technology have dramatically expanded our ability to create rich, complex, and effective assessments across a range of uses. Artificial Intelligence (AI) enabled assessments represent one such advance, one that has captured our collective interest and imagination. Scientists and practitioners within the domains…
Descriptors: Validity, Ethics, Artificial Intelligence, Evaluation Methods
Wilson, Mark; Gochyyev, Perman; Scalise, Kathleen – Journal of Educational Measurement, 2017
This article summarizes the assessment of cognitive skills through collaborative tasks, using field test results from the Assessment and Teaching of 21st Century Skills (ATC21S) project. The project, sponsored by Cisco, Intel, and Microsoft, aims to help educators around the world equip students with the skills to succeed in future career and…
Descriptors: Cognitive Ability, Thinking Skills, Evaluation Methods, Educational Assessment
Tendeiro, Jorge N.; Meijer, Rob R. – Journal of Educational Measurement, 2014
Recent guidelines for fair educational testing advise checking the validity of individual test scores with person-fit statistics. The existing literature, however, leaves practitioners unclear about which statistic to use. An overview of relatively simple existing nonparametric approaches to identifying atypical response…
Descriptors: Educational Assessment, Test Validity, Scores, Statistical Analysis
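Among the relatively simple nonparametric statistics in this family, a classic building block is the count of Guttman errors: with items ordered from easiest to hardest, every pair in which the harder item is answered correctly while the easier one is missed counts as an error. A minimal sketch, with illustrative names not taken from the article:

```python
def guttman_errors(responses, difficulty_order):
    """Count Guttman errors in one examinee's 0/1 response vector.

    responses: list of 0/1 item scores.
    difficulty_order: item indices sorted from easiest to hardest,
    e.g., by descending proportion correct in the norming sample.
    """
    r = [responses[i] for i in difficulty_order]
    # An error is any (easier, harder) pair with the easier item wrong
    # and the harder item right.
    return sum(
        1
        for i in range(len(r))
        for j in range(i + 1, len(r))
        if r[i] == 0 and r[j] == 1
    )

# Item 0 easiest ... item 4 hardest; two inverted pairs here.
print(guttman_errors([1, 0, 1, 1, 0], difficulty_order=[0, 1, 2, 3, 4]))  # 2
```

Larger counts flag more atypical response vectors; in practice the raw count is usually normalized before flagging.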
Armstrong, Ronald D.; Shi, Min – Journal of Educational Measurement, 2009
This article demonstrates the use of a new class of model-free cumulative sum (CUSUM) statistics to detect person fit given the responses to a linear test. The fundamental statistic being accumulated is the likelihood ratio of two probabilities. The detection performance of this CUSUM scheme is compared to other model-free person-fit statistics…
Descriptors: Probability, Simulation, Models, Psychometrics
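The accumulation Armstrong and Shi describe can be pictured as a standard one-sided CUSUM over per-item log-likelihood ratios. The sketch below assumes Bernoulli response probabilities under a null (fitting) and an alternative (aberrant) model; the function and parameter names are illustrative, not the article's:

```python
import math

def cusum_flag(responses, p_null, p_alt, threshold):
    """One-sided CUSUM of log-likelihood ratios for two Bernoulli models.

    responses: 0/1 item scores in administration order.
    p_null[t]: probability of a correct response to item t under normal behavior.
    p_alt[t]: the same probability under the aberrance being screened for.
    Returns the first item index where the statistic exceeds `threshold`,
    or None if it never does.
    """
    s = 0.0
    for t, (x, q0, q1) in enumerate(zip(responses, p_null, p_alt)):
        # Log-likelihood ratio of the alternative versus the null for item t.
        llr = math.log(q1 if x else 1 - q1) - math.log(q0 if x else 1 - q0)
        s = max(0.0, s + llr)  # reset at zero, the usual CUSUM recursion
        if s > threshold:
            return t
    return None
```

The threshold trades false alarms against detection speed and would be set by simulation in practice.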
Myford, Carol M.; Wolfe, Edward W. – Journal of Educational Measurement, 2009
In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition…
Descriptors: English Literature, Advanced Placement, Measures (Individuals), Writing (Composition)
Clauser, Brian E.; Mee, Janet; Baldwin, Su G.; Margolis, Melissa J.; Dillon, Gerard F. – Journal of Educational Measurement, 2009
Although the Angoff procedure is among the most widely used standard setting procedures for tests comprising multiple-choice items, research has shown that subject matter experts have considerable difficulty accurately making the required judgments in the absence of examinee performance data. Some authors have viewed the need to provide…
Descriptors: Standard Setting (Scoring), Program Effectiveness, Expertise, Health Personnel
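For context, the Angoff procedure asks each judge to estimate, item by item, the probability that a minimally competent examinee answers correctly; a judge's cut score is the sum of those probabilities, and the panel's cut score is typically their average. A toy worked example with invented numbers:

```python
# Each row holds one judge's probability estimates for a 5-item test.
judges = [
    [0.60, 0.75, 0.50, 0.80, 0.65],
    [0.55, 0.70, 0.45, 0.85, 0.70],
]

# A judge's cut score is the sum of his or her item probabilities.
judge_cuts = [sum(row) for row in judges]      # ≈ [3.30, 3.25]
panel_cut = sum(judge_cuts) / len(judge_cuts)  # ≈ 3.275 out of 5 points
print(judge_cuts, panel_cut)
```

The examinee performance data discussed in the article would be shown to judges between rounds to recalibrate these estimates.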
Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…
Descriptors: Test Length, Simulation, Correlation, Research Methodology
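A toy illustration of a consistency index of this general shape, under assumed prerequisite relations among items (the published HCI's exact definition may differ in detail):

```python
def toy_consistency_index(responses, prereq):
    """Toy hierarchy-consistency-style index on [-1.0, 1.0].

    responses: list of 0/1 item scores.
    prereq[j]: items whose required attributes are a subset of item j's,
    so answering j correctly 'expects' those items to be correct too.
    """
    misfits = comparisons = 0
    for j, r_j in enumerate(responses):
        if r_j == 1:
            for k in prereq[j]:
                comparisons += 1
                if responses[k] == 0:
                    misfits += 1  # unexpected: prerequisite item missed
    if comparisons == 0:
        return 1.0
    return 1.0 - 2.0 * misfits / comparisons

# Item 2 presupposes the attributes measured by items 0 and 1.
print(toy_consistency_index([1, 0, 1], prereq={0: [], 1: [], 2: [0, 1]}))  # 0.0
```

A value near 1.0 means responses honor the assumed attribute hierarchy; values near -1.0 signal the unexpected patterns the abstract describes.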

Frisbie, David A.; Cantor, Nancy K. – Journal of Educational Measurement, 1995
Studied the validity of alternative methods for assessing the spelling achievement of students in grades 2 through 7. Results from 760 third graders, 721 fifth graders, and 639 seventh graders indicate that no single objective format stood out above the others, although some demonstrated superiority to the dictation format on several dimensions…
Descriptors: Dictation, Educational Assessment, Elementary Education, Elementary School Students

Anderson, Ronald E.; And Others – Journal of Educational Measurement, 1982
Findings on alternative procedures for evaluating measures of achievement in the individual data packages of the National Assessment of Educational Progress are presented, along with their methodological implications. The need for secondary analysts to be aware of how the data are organized is discussed, as are the positive and negative features of the packages. (Author/CM)
Descriptors: Achievement, Databases, Educational Assessment, Elementary Secondary Education

Baxter, Gail P.; And Others – Journal of Educational Measurement, 1992
A procedure-based observational scoring system and a notebook completed by students were evaluated as science assessments for 41 fifth grade students experienced in hands-on science and 55 fifth grade students inexperienced in hands-on science. Results suggest that notebooks may be a reasonable, although less reliable, surrogate for observed…
Descriptors: Classroom Observation Techniques, Comparative Analysis, Educational Assessment, Elementary School Students

Beaton, Albert E.; Johnson, Eugene G. – Journal of Educational Measurement, 1992
The National Assessment of Educational Progress (NAEP) uses scaling methods based on item response theory (IRT) to summarize information in complex data sets. The necessity of global scores versus more detailed subscores, the creation of developmental scales for different ages, and the use of scale anchoring for scale interpretation are discussed. (SLD)
Descriptors: Age Differences, Educational Assessment, Elementary Secondary Education, Evaluation Methods
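NAEP-style IRT scaling rests on item response functions such as the three-parameter logistic model, where the probability of a correct response depends on proficiency theta and item parameters for discrimination (a), difficulty (b), and guessing (c). A minimal sketch, not NAEP's actual implementation:

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Three-parameter logistic item response function:
    P(theta) = c + (1 - c) / (1 + exp(-1.7 * a * (theta - b))).
    The 1.7 is the conventional scaling constant D."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

# An average examinee (theta = 0) on a moderately hard item.
print(round(p_correct_3pl(theta=0.0, a=1.0, b=0.5, c=0.2), 2))  # ~0.44
```

Scale anchoring, as discussed in the article, then attaches behavioral descriptions to selected points on the resulting proficiency scale.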

Engelhard, George, Jr. – Journal of Educational Measurement, 1994
Rater errors (rater severity, halo effect, central tendency, and restriction of range) are described, and criteria are presented for evaluating rating quality based on a many-faceted Rasch (FACETS) model. Ratings of 264 compositions from the Eighth Grade Writing Test in Georgia by 15 raters illustrate the discussion. (SLD)
Descriptors: Criteria, Educational Assessment, Elementary Education, Elementary School Students
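The many-faceted Rasch (FACETS) model referenced here is conventionally written as a rating-scale decomposition (the notation below is the common textbook form, not necessarily the article's):

\log \frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \lambda_j - \tau_k

where \theta_n is the ability of examinee n, \delta_i the difficulty of task i, \lambda_j the severity of rater j, and \tau_k the threshold of rating category k. The rater-severity term \lambda_j is what indices of severity and restriction of range are probing.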

Mullis, Ina V. S. – Journal of Educational Measurement, 1992
An overview is given of the consensus process for development of the frameworks underlying the National Assessment of Educational Progress (NAEP) assessments, with emphasis on those for the 1990 and 1992 mathematics assessments, the 1992 reading assessment, and the 1994 science assessments. Innovative techniques for 1992 are described. (SLD)
Descriptors: Academic Standards, Content Validity, Educational Assessment, Elementary Secondary Education

Clauser, Brian E.; And Others – Journal of Educational Measurement, 1995
A scoring algorithm for performance assessments is described that is based on expert judgments but requires the rating of only a sample of performances. A regression-based policy capturing procedure was implemented for clinicians evaluating skills of 280 medical students. Results demonstrate the usefulness of the algorithm. (SLD)
Descriptors: Algorithms, Clinical Diagnosis, Computer Simulation, Educational Assessment
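Policy capturing of this kind fits a regression of expert ratings on scoreable performance features for the rated sample, then applies the fitted weights to score the performances no expert rated. A minimal sketch with invented features, not the article's actual variables:

```python
import numpy as np

# Rated sample: rows are performances, columns are scoreable features
# (invented here: correct actions, errors, minutes elapsed).
X_rated = np.array([[8, 1, 30.0],
                    [5, 3, 45.0],
                    [9, 0, 25.0],
                    [4, 4, 50.0]])
expert_ratings = np.array([8.5, 5.0, 9.0, 4.0])

# Capture the experts' implicit scoring policy as least-squares weights,
# with a leading column of ones for the intercept.
A = np.column_stack([np.ones(len(X_rated)), X_rated])
weights, *_ = np.linalg.lstsq(A, expert_ratings, rcond=None)

# Score a new, unrated performance with the captured policy.
new_perf = np.array([1.0, 7, 2, 35.0])  # leading 1.0 multiplies the intercept
print(float(new_perf @ weights))
```

Only the sampled performances need human ratings; the regression generalizes the experts' policy to the rest.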

Stiggins, Richard J.; And Others – Journal of Educational Measurement, 1989
Classroom assessment procedures of 36 teachers in grades 2 to 12 were studied to determine the extent to which they measure students' higher order thinking skills in mathematics, science, social studies, and language arts. A striking finding was the absence of evaluation of comparative and evaluative thinking. (SLD)
Descriptors: Classroom Techniques, Cognitive Processes, Educational Assessment, Elementary Secondary Education