Susan K. Johnsen – Gifted Child Today, 2025
The author provides information about reliability: the areas educators should examine in determining whether an assessment is consistent and trustworthy for use, and how it should be interpreted in making decisions about students. Reliability areas discussed in the column include internal consistency, test-retest or stability, inter-scorer…
Descriptors: Test Reliability, Academically Gifted, Student Evaluation, Error of Measurement
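A minimal sketch of one reliability area the column names, internal consistency, estimated here with Cronbach's alpha; the item scores below are hypothetical, not from the article.

```python
def cronbach_alpha(items):
    """Cronbach's alpha; `items` is one list of scores per item."""
    k = len(items)                      # number of items
    n = len(items[0])                   # number of examinees

    def var(xs):                        # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_var_sum = sum(var(col) for col in items)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return (k / (k - 1)) * (1 - item_var_sum / var(totals))

# Five examinees, three items (hypothetical scores)
items = [[4, 5, 3, 5, 2], [4, 4, 3, 5, 1], [5, 5, 2, 4, 2]]
print(round(cronbach_alpha(items), 3))  # → 0.934
```

Values near 1 indicate that the items covary strongly, i.e., they appear to measure the same construct consistently.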
Lichtenstein, Robert – Communique, 2020
Appropriate interpretation of assessment data requires an appreciation that tools are subject to measurement error. School psychologists recognize, at least on an intellectual level, that measures are imperfect--that test scores and other quantitative measures (e.g., rating scales, systematic behavioral observations) are best estimates of…
Descriptors: Error of Measurement, Test Reliability, Pretests Posttests, Standardized Tests
Bardhoshi, Gerta; Erford, Bradley T. – Measurement and Evaluation in Counseling and Development, 2017
Precision is a key facet of test development, with score reliability determined primarily according to the types of error one wants to approximate and demonstrate. This article identifies and discusses several primary forms of reliability estimation: internal consistency (i.e., split-half, KR-20, α), test-retest, alternate forms, interscorer, and…
Descriptors: Scores, Test Reliability, Accuracy, Pretests Posttests
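One of the estimation forms the article lists, split-half reliability, can be sketched as follows: correlate odd-item and even-item half-test scores, then step the result up to full test length with the Spearman-Brown formula. The dichotomous item scores below are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """`item_scores`: one row per examinee, one column per item."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r = pearson(odd, even)
    return 2 * r / (1 + r)   # Spearman-Brown correction to full length

scores = [[1, 1, 0, 1], [1, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
print(round(split_half_reliability(scores), 3))  # → 0.748
```

The correction is needed because each half is only half as long as the full test, and shorter tests are less reliable.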
Sykes, Robert C.; Ito, Kyoko; Wang, Zhen – Educational Measurement: Issues and Practice, 2008
Student responses to a large number of constructed response items in three Math and three Reading tests were scored on two occasions using three ways of assigning raters: single reader scoring, a different reader for each response (item-specific), and three readers each scoring a rater item block (RIB) containing approximately one-third of a…
Descriptors: Test Items, Mathematics Tests, Reading Tests, Scoring
Schumacker, Randall E.; Smith, Everett V., Jr. – Educational and Psychological Measurement, 2007
Measurement error is a common theme in classical measurement models used in testing and assessment. In classical measurement models, the definition of measurement error and the subsequent reliability coefficients differ on the basis of the test administration design. Internal consistency reliability specifies error due primarily to poor item…
Descriptors: Measurement Techniques, Error of Measurement, Item Sampling, Item Response Theory
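The classical measurement model the abstract refers to decomposes observed-score variance into true-score and error components; one common summary of that error is the standard error of measurement, SEM = SD · √(1 − reliability). The numbers below are hypothetical, not taken from the article.

```python
def standard_error_of_measurement(sd, reliability):
    """Classical-test-theory SEM: SD * sqrt(1 - reliability)."""
    return sd * (1 - reliability) ** 0.5

# A hypothetical test with SD = 15 and reliability .91
sem = standard_error_of_measurement(15, 0.91)
print(round(sem, 2))  # → 4.5
```

The SEM gives the rough width of the band around an observed score within which the true score is likely to fall, which is why different reliability coefficients (reflecting different error sources) imply different SEMs for the same test.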

Lindell, Michael K.; Brandt, Christina J.; Whitney, David J. – Applied Psychological Measurement, 1999
Proposes a revised index of interrater agreement for multi-item ratings of a single target. This index is an inverse linear function of the ratio of the average obtained variance to the variance of the uniformly distributed random error. Discusses the importance of sample size for the index. (SLD)
Descriptors: Error of Measurement, Interrater Reliability, Sample Size
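The abstract describes the revised index as an inverse linear function of the ratio of the average obtained variance to the variance of uniformly distributed random error. A minimal sketch under the assumption of a discrete A-point rating scale (so the uniform-error variance is (A² − 1)/12); the ratings below are hypothetical.

```python
def rwg_star(ratings, n_options):
    """Multi-item agreement index for a single target.

    `ratings`: one row per rater, one column per item.
    `n_options`: number of scale points (A), assumed discrete uniform
    under random responding.
    """
    k = len(ratings[0])                 # number of items

    def var(xs):                        # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    mean_item_var = sum(var([row[j] for row in ratings]) for j in range(k)) / k
    uniform_var = (n_options ** 2 - 1) / 12.0   # variance of random responding
    return 1 - mean_item_var / uniform_var

ratings = [[4, 5, 4], [5, 5, 4], [4, 4, 5], [5, 4, 4]]  # 4 raters, 3 items
print(round(rwg_star(ratings, 5), 3))  # → 0.847
```

When raters agree perfectly the obtained variance is zero and the index is 1; when their ratings look like uniform random noise it falls to 0.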

Borich, Gary; Klinzing, Gerhard – Journal of Classroom Interaction, 1984
Problems in studying teacher effectiveness through the use of classroom observation are discussed. Four assumptions in the observation of classroom process are offered and ways in which these assumptions can be dealt with in designing an observation study are suggested. (DF)
Descriptors: Classroom Observation Techniques, Error of Measurement, Experimenter Characteristics, Interrater Reliability

Shrivastav, Rahul; Sapienza, Christine M.; Nandur, Vuday – Journal of Speech, Language, and Hearing Research, 2005
Rating scales are commonly used to study voice quality. However, recent research has demonstrated that perceptual measures of voice quality obtained using rating scales suffer from poor interjudge agreement and reliability, especially in the midrange of the scale. These findings, along with those obtained using multidimensional scaling (MDS), have…
Descriptors: Psychometrics, Probability, Rating Scales, Interrater Reliability
Chang, Lei; Van Der Linden, Wim J.; Vos, Hans J. – Educational and Psychological Measurement, 2004
This article introduces a new test-centered standard-setting method as well as a procedure to detect intrajudge inconsistency of the method. The standard-setting method that is based on interdependent evaluations of alternative responses has judges closely evaluate the process that examinees use to solve multiple-choice items. The new method is…
Descriptors: Standard Setting (Scoring), Interrater Reliability, Foreign Countries, Evaluation Methods
New Mexico Public Education Department, 2007
The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of and technical characteristics of the 2007 NMSBA. The 2007 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Summary of student performance; (4) Statistical analyses of item and…
Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring
Griph, Gerald W. – New Mexico Public Education Department, 2006
The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of and technical characteristics of the 2006 NMSBA. The 2006 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Calibration, scaling, and equating procedures; (4) Standard setting;…
Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring