Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 3 |
Since 2016 (last 10 years) | 7 |
Since 2006 (last 20 years) | 11 |
Descriptor
Error of Measurement | 15 |
Interrater Reliability | 15 |
Psychometrics | 15 |
Correlation | 6 |
Scoring | 6 |
Evaluation Methods | 5 |
Goodness of Fit | 5 |
Test Reliability | 5 |
Test Validity | 5 |
Children | 4 |
Measures (Individuals) | 4 |
Author
Anna-Maria Fall | 2 |
Beula M. Magimairaj | 2 |
Greg Roberts | 2 |
Philip Capin | 2 |
Ronald B. Gillam | 2 |
Sandra L. Gillam | 2 |
Sharon Vaughn | 2 |
Anderson, Michele A. | 1 |
Applegate, E. Brooks | 1 |
Brennan, Robert L. | 1 |
De Cat, Jos | 1 |
Publication Type
Journal Articles | 12 |
Reports - Research | 10 |
Reports - Descriptive | 3 |
Numerical/Quantitative Data | 2 |
Reports - Evaluative | 2 |
Speeches/Meeting Papers | 1 |
Tests/Questionnaires | 1 |
Education Level
Elementary Secondary Education | 2 |
Adult Education | 1 |
Elementary Education | 1 |
Higher Education | 1 |
Postsecondary Education | 1 |
Audience
Researchers | 1 |
Assessments and Surveys
Work Keys (ACT) | 1 |
Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024
We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…
Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners
Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Grantee Submission, 2022
Purpose: Our aim was to evaluate the psychometric properties of the online-administered format of the Test of Narrative Language--Second Edition (TNL-2; Gillam & Pearson, 2017), given the importance of assessing children's narrative ability and the considerable absence of psychometric studies of spoken language assessments administered online.…
Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments
Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Language, Speech, and Hearing Services in Schools, 2022
Purpose: Our aim was to evaluate the psychometric properties of the online-administered format of the Test of Narrative Language--Second Edition (TNL-2; Gillam & Pearson, 2017), given the importance of assessing children's narrative ability and the considerable absence of psychometric studies of spoken language assessments administered online.…
Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments
Gingerich, Andrea; Ramlo, Susan E.; van der Vleuten, Cees P. M.; Eva, Kevin W.; Regehr, Glenn – Advances in Health Sciences Education, 2017
Whenever multiple observers provide ratings, even of the same performance, inter-rater variation is prevalent. The resulting "idiosyncratic rater variance" is considered to be unusable error of measurement in psychometric models and is a threat to the defensibility of our assessments. Prior studies of inter-rater variation in clinical…
Descriptors: Interrater Reliability, Error of Measurement, Psychometrics, Q Methodology
van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M. – Measurement in Physical Education and Exercise Science, 2018
In the Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess the gross motor skills of elementary school children. Little is known about its reliability. Therefore, this study determined its test-retest and inter-rater reliability. A total of 624 and 557 Dutch 6- to 12-year-old children, respectively, were analyzed for…
Descriptors: Foreign Countries, Interrater Reliability, Pretests Posttests, Psychomotor Skills
Mailend, Marja-Liisa; Plante, Elena; Anderson, Michele A.; Applegate, E. Brooks; Nelson, Nickola W. – International Journal of Language & Communication Disorders, 2016
Background: As new standardized tests become commercially available, it is critical that clinicians have access to information about a test's psychometric properties, including aspects of reliability. Aims: The purpose of the three studies reported in this article was to investigate the reliability of a new test, the Test of Integrated…
Descriptors: Standardized Tests, Psychometrics, Reliability, Language Skills
Han, Chao – Language Assessment Quarterly, 2016
As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…
Descriptors: Foreign Countries, Scores, English, Chinese
Heyrman, Lieve; Molenaers, Guy; Desloovere, Kaat; Verheyden, Geert; De Cat, Jos; Monbaliu, Elegast; Feys, Hilde – Research in Developmental Disabilities: A Multidisciplinary Journal, 2011
In this study the psychometric properties of the Trunk Control Measurement Scale (TCMS) in children with cerebral palsy (CP) were examined. Twenty-six children with spastic CP (mean age 11 years 3 months, range 8-15 years; Gross Motor Function Classification System level I n = 11, level II n = 5, level III n = 10) were included in this study. To…
Descriptors: Construct Validity, Cerebral Palsy, Test Validity, Interrater Reliability
McKenzie, Robert G. – Learning Disability Quarterly, 2009
The assessment procedures within Response to Intervention (RTI) models have begun to supplant the use of traditional, discrepancy-based frameworks for identifying students with specific learning disabilities (SLD). Many RTI proponents applaud this shift because of perceived shortcomings in utilizing discrepancy as an indicator of SLD. However,…
Descriptors: Intervention, Learning Disabilities, Error of Measurement, Psychometrics
Tsui, Anne S. – 1983
The quality of performance data yielded by subjective judgment is of major concern to researchers in performance appraisal. However, some confusion exists in analyzing the quality of ratings obtained from different rating scale formats and from different raters. To clarify this confusion, a study was conducted to assess the quality of judgmental…
Descriptors: Administrator Evaluation, Administrators, Error of Measurement, Evaluation Methods
Shrivastav, Rahul; Sapienza, Christine M.; Nandur, Vuday – Journal of Speech, Language, and Hearing Research, 2005
Rating scales are commonly used to study voice quality. However, recent research has demonstrated that perceptual measures of voice quality obtained using rating scales suffer from poor interjudge agreement and reliability, especially in the midrange of the scale. These findings, along with those obtained using multidimensional scaling (MDS), have…
Descriptors: Psychometrics, Probability, Rating Scales, Interrater Reliability
Gross, Leon J. – Evaluation and the Health Professions, 1994
Whether adequate levels of interrater reliability could be obtained on a national, standardized examination using one examiner per observation was studied with 101 paired candidate observations on an examination for optometry. Results indicate that psychometrically sound judgments can be obtained with one examiner. (SLD)
Descriptors: Educational Assessment, Error of Measurement, Evaluation Methods, Evaluators
New Mexico Public Education Department, 2007
The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview and the technical characteristics of the 2007 NMSBA. The 2007 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Summary of student performance; (4) Statistical analyses of item and…
Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring
Brennan, Robert L.; And Others – Educational and Psychological Measurement, 1995
Generalizability theory is used to examine the psychometric characteristics of the Listening and Writing Tests developed by American College Testing for its Work Keys program. Results with samples of 50 suggest the desirability of a minimum number of the tests' tape-recorded messages and the use of at least 2 raters. (SLD)
Descriptors: Audiotape Recordings, Error of Measurement, Generalizability Theory, Interaction
Griph, Gerald W. – New Mexico Public Education Department, 2006
The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview and the technical characteristics of the 2006 NMSBA. The 2006 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Calibration, scaling, and equating procedures; (4) Standard setting;…
Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring