Publication Date
| Period | Results |
|---|---|
| In 2026 | 0 |
| Since 2025 | 186 |
| Since 2022 (last 5 years) | 1065 |
| Since 2017 (last 10 years) | 2887 |
| Since 2007 (last 20 years) | 6172 |
Audience
| Group | Results |
|---|---|
| Teachers | 480 |
| Practitioners | 358 |
| Researchers | 152 |
| Administrators | 122 |
| Policymakers | 51 |
| Students | 44 |
| Parents | 32 |
| Counselors | 25 |
| Community | 15 |
| Media Staff | 5 |
| Support Staff | 3 |
Location
| Place | Results |
|---|---|
| Australia | 183 |
| Turkey | 157 |
| California | 133 |
| Canada | 124 |
| New York | 118 |
| United States | 112 |
| Florida | 107 |
| China | 103 |
| Texas | 72 |
| United Kingdom | 72 |
| Japan | 70 |
What Works Clearinghouse Rating
| Rating | Results |
|---|---|
| Meets WWC Standards without Reservations | 5 |
| Meets WWC Standards with or without Reservations | 11 |
| Does not meet standards | 8 |
Peer reviewed: Park, Hyun-Sook; And Others – Journal of Experimental Education, 1990
The reliability of visual inspection in single-case research was investigated by determining agreement among 5 judges visually inspecting 44 graphs depicting behavior from baseline to intervention. Agreement between visual inspection and statistical procedures was determined. Implications for single-case research are discussed. (SLD)
Descriptors: Behavior Patterns, Evaluation Methods, Evaluators, Graphs
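The abstract does not say which agreement index was used for the five judges. As one illustration only, Fleiss' kappa summarizes agreement among several judges who each assign every graph to a category (for example, a hypothetical "effect" / "no effect" coding). The function name and the input layout are assumptions for this sketch, not the study's procedure.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for several judges who each assign every graph to one category.

    ratings[i] holds one category label per judge for graph i (hypothetical layout).
    """
    n_items = len(ratings)
    n_judges = len(ratings[0])
    if n_judges < 2:
        raise ValueError("need at least two judges")

    item_agreement = []          # P_i: share of agreeing judge pairs on graph i
    category_totals = Counter()  # how often each label was used overall
    for row in ratings:
        if len(row) != n_judges:
            raise ValueError("every graph needs the same number of judges")
        counts = Counter(row)
        category_totals.update(counts)
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        item_agreement.append(agreeing_pairs / (n_judges * (n_judges - 1)))

    observed = sum(item_agreement) / n_items
    # Chance agreement from the overall proportion of each label.
    total_ratings = n_items * n_judges
    expected = sum((n / total_ratings) ** 2 for n in category_totals.values())
    if expected == 1.0:
        return float("nan")  # one label used for every graph: kappa is undefined
    return (observed - expected) / (1.0 - expected)
```

With 5 judges and 44 graphs, `ratings` would be 44 rows of 5 labels each.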
Peer reviewed: Karpati, Andrea; Zempleni, Andra; Verhelst, Norman V.; Veldhuijzen, Niels H.; Schonau, Diederik W. – Studies in Educational Evaluation, 1998
How a jury of art evaluators can increase the reliability of its judgments was studied with portfolios from 58 art students in Hungary evaluated by 15 art teacher jurors. Results cast doubt on the reliability of juror assessments. Merits of vertical and horizontal scoring approaches are discussed. (SLD)
Descriptors: Art Education, Art Products, Art Teachers, Foreign Countries
Peer reviewed: Conway, Kathleen D. – Mathematics Teaching in the Middle School, 1999
Describes the use of the measures of fluency, flexibility, and originality to assess students' responses to open-ended problems in mathematics classrooms. Contains 12 references. (ASK)
Descriptors: Elementary Education, Evaluation Criteria, Junior High Schools, Mathematics Instruction
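The abstract does not define the three measures. A common convention, assumed here and borrowed from creativity-test scoring, is: fluency = number of distinct responses, flexibility = number of distinct strategy categories, originality = number of responses that only a small share of the class produced. A minimal sketch under those assumptions; the solution ids, teacher-assigned categories, and the `rarity_threshold` cut-off are all hypothetical.

```python
from collections import Counter


def score_open_ended(responses, category_of, rarity_threshold=0.1):
    """Return {student: (fluency, flexibility, originality)}.

    responses:        {student: [solution ids]} (hypothetical coding of each answer)
    category_of:      {solution id: strategy category}, assigned by the teacher
    rarity_threshold: a solution counts as original if fewer than this share
                      of students produced it (hypothetical cut-off)
    """
    n_students = len(responses)
    # Number of students who produced each distinct solution.
    produced_by = Counter(sol for sols in responses.values() for sol in set(sols))

    scores = {}
    for student, sols in responses.items():
        distinct = set(sols)
        fluency = len(distinct)
        flexibility = len({category_of[s] for s in distinct})
        originality = sum(1 for s in distinct if produced_by[s] / n_students < rarity_threshold)
        scores[student] = (fluency, flexibility, originality)
    return scores
```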
Peer reviewed: Chang, Lei – Applied Measurement in Education, 1999
Compared the Nedelsky (L. Nedelsky, 1954) and Angoff (W. Angoff, 1971) standard-setting methods in three studies involving 80 graduate students as judges. Nedelsky cutscores were significantly lower than Angoff cutscores. Suggests that combining the strong features of both methods would make a better standard-setting procedure. (SLD)
Descriptors: Comparative Analysis, Cutting Scores, Graduate Students, Graduate Study
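For readers unfamiliar with the two methods, the sketch below computes cutscores in their textbook form: in Angoff, each judge estimates the probability that a minimally competent examinee answers each item correctly; in Nedelsky, each judge marks the multiple-choice options such an examinee could not eliminate, and the item value is the chance of guessing correctly among them. Chang's studies may have used variants; the function names and input layouts are assumptions.

```python
from statistics import mean


def angoff_cutscore(probabilities: list[list[float]]) -> float:
    """Angoff: probabilities[j][i] is judge j's estimate that a minimally
    competent examinee answers item i correctly."""
    for judge in probabilities:
        if any(not 0.0 <= p <= 1.0 for p in judge):
            raise ValueError("probabilities must lie in [0, 1]")
    # Each judge's cutscore is the sum of their item estimates; average over judges.
    return mean(sum(judge) for judge in probabilities)


def nedelsky_cutscore(options_left: list[list[int]]) -> float:
    """Nedelsky: options_left[j][i] is the number of options on item i (key
    included) that judge j thinks a minimally competent examinee could NOT
    eliminate."""
    for judge in options_left:
        if any(k < 1 for k in judge):
            raise ValueError("at least the keyed option must remain")
    # Expected item score = chance of guessing right among the remaining options.
    return mean(sum(1 / k for k in judge) for judge in options_left)
```

Note that a Nedelsky item value can only be 1, 1/2, 1/3, and so on, whereas an Angoff estimate can be any probability between 0 and 1.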
Peer reviewed: Schirmer, Barbara R.; Bailey, Jill – TEACHING Exceptional Children, 2000
This article discusses the use of a writing assessment rubric to structure writing instruction with examples from two classes of children who are deaf. It considers strategy modifications, using rubrics for instruction, and creating and using rubrics. Tables provide detail for a universal-type rubric, a modified universal-type rubric, a writing…
Descriptors: Deafness, Elementary Secondary Education, Scoring Rubrics, Teaching Methods
Peer reviewed: Engelhard, George, Jr.; Stone, Gregory E. – Educational and Psychological Measurement, 1998
A new approach based on Rasch measurement theory is described for examining the quality of ratings from standard-setting judges. Ratings of nine judges for 213 items on a nursing examination show that judges vary in their views of the essential items for nursing certification, with statistically significant variability in the judged essentiality…
Descriptors: Certification, Evaluation Methods, Item Response Theory, Judges
Peer reviewed: Fitzpatrick, Anne R.; Ercikan, Kadriye; Yen, Wendy M.; Ferrara, Steven – Applied Measurement in Education, 1998
The consistency between raters over three years of a high-stakes performance assessment was examined in two studies involving a total of approximately 3,000 students in grades three, five, and eight. Results show that raters in different years differ in severity, with raters in mathematics most consistent, and those in language arts least…
Descriptors: Elementary Education, Elementary School Students, High Stakes Tests, Interrater Reliability
Peer reviewed: Salend, Spencer J. – TEACHING Exceptional Children, 1998
Among six guidelines for portfolio assessment are (1) identify the goals of the portfolio; (2) determine type of portfolio to be used; and (3) establish procedures for organizing the portfolio. Insets explain performance-based assessment and use of portfolio assessment to comply with the Individuals with Disabilities Education Act Amendments of…
Descriptors: Compliance (Legal), Disabilities, Elementary Secondary Education, Guidelines
Peer reviewed: Gentile, J. Ronald – Teaching of Psychology, 2000
Describes a classroom activity, listing step-by-step directions, that demonstrates the unreliability of essay scoring. Explains that after the exercise the class discussion should address the problematic factors in scoring essays. Lists recommendations for improving reliability and validity of essay scoring. (CMK)
Descriptors: Class Activities, Discussion (Teaching Technique), Educational Strategies, Essays
Peer reviewed: Reise, Steven P. – Applied Psychological Measurement, 1995
Psychometric issues pertinent to the application of an item-response-theory-based person-fit (response aberrancy) detection statistic in the personality measurement domain were explored using Monte Carlo methods. Recommendations are made about proper implementation of person-fit statistics in personality measurement. (SLD)
Descriptors: Goodness of Fit, Item Response Theory, Measurement Techniques, Monte Carlo Methods
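The abstract does not name the statistic studied. The standardized log-likelihood index l_z is one widely used IRT person-fit statistic; the sketch below computes it assuming a two-parameter logistic model with known item parameters and a known trait level `theta`. Personality items are often polytomous, so this dichotomous version is a simplification, and the function names are hypothetical.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic model (no 1.7 scaling constant; an assumption)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lz_statistic(responses, theta, a_params, b_params) -> float:
    """Standardized log-likelihood person-fit statistic l_z for 0/1 responses.

    Large negative values flag response patterns that are unlikely given theta.
    """
    if not (len(responses) == len(a_params) == len(b_params)):
        raise ValueError("responses and item parameters must have equal length")
    loglik = expected = variance = 0.0
    for u, a, b in zip(responses, a_params, b_params):
        p = p_correct_2pl(theta, a, b)
        q = 1.0 - p
        loglik += u * math.log(p) + (1 - u) * math.log(q)  # observed l_0
        expected += p * math.log(p) + q * math.log(q)      # E(l_0)
        variance += p * q * math.log(p / q) ** 2           # Var(l_0)
    if variance == 0.0:
        raise ValueError("l_z is undefined when every item has p = 0.5")
    return (loglik - expected) / math.sqrt(variance)
```

For items ordered from easy to hard, a pattern that endorses the easy items and rejects the hard ones yields a positive l_z, while the reversed pattern yields a strongly negative one.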
Peer reviewed: Willard-Traub, Margaret; Decker, Emily; Reed, Rebecca; Johnston, Jerome – Assessing Writing, 1999
Examines the mechanics of a large-scale writing portfolio assessment at the University of Michigan, including its impact on matriculation and placement; students' reactions to the requirement; and instructors' evaluation of the efficacy of placements under the new system. Examines the scoring process used by readers to assess portfolios, and…
Descriptors: Curriculum, Higher Education, Portfolio Assessment, Program Evaluation
Goldberg, Gail Lynn; Roswell, Barbara Sherr – Educational Assessment, 2000
Studied the impact of experience scoring the Maryland School Performance Assessment tasks on teachers' instructional and classroom assessment practice. Interview data, questionnaires, classroom observation, and classroom artifacts from approximately 5 teacher-scorers demonstrated that teachers' appropriation of performance-based instruction may be…
Descriptors: Educational Practices, Elementary Education, Elementary School Teachers, Experience
Peer reviewed: Reise, Steven P. – Applied Psychological Measurement, 2001
This book contains a series of research articles about computerized adaptive testing (CAT) written for advanced psychometricians. The book is divided into sections on: (1) item selection and examinee scoring in CAT; (2) examples of CAT applications; (3) item banks; (4) determining model fit; and (5) using testlets in CAT. (SLD)
Descriptors: Adaptive Testing, Computer Assisted Testing, Goodness of Fit, Item Banks
Peer reviewed: Chernyshenko, Oleksandr S.; Stark, Stephen; Chan, Kim-Yin; Drasgow, Fritz; Williams, Bruce – Multivariate Behavioral Research, 2001
Compared the fit of several Item Response Theory (IRT) models to two personality assessment instruments using data from 13,059 individuals responding to one instrument and 1,770 individuals responding to the other. Two- and three-parameter logistic models fit some scales reasonably well, but not others, and the graded response model generally did…
Descriptors: Adults, Comparative Analysis, Goodness of Fit, Item Response Theory
Peer reviewed: Anderson, John O. – Alberta Journal of Educational Research, 1999
Explores the consequences of using complex test-and-item analysis in a large-scale testing situation that historically has used simple number-right scoring. When the two types of scoring were used with high school graduation exams in British Columbia, results were similar in terms of mean, standard deviation, error of estimation, and correlation…
Descriptors: Academic Achievement, Achievement Tests, Evaluation Research, Foreign Countries