ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	3
Since 2016 (last 10 years)	7
Since 2006 (last 20 years)	11

Descriptor

Error of Measurement	15
Interrater Reliability	15
Psychometrics	15
Correlation	6
Scoring	6
Evaluation Methods	5
Goodness of Fit	5
Test Reliability	5
Test Validity	5
Children	4
Measures (Individuals)	4
Diagnostic Tests	3
English	3
Item Response Theory	3
Language Tests	3
Performance Based Assessment	3
Rating Scales	3
Academic Standards	2
At Risk Students	2
Computation	2
Computer Assisted Testing	2
Cutting Scores	2
Examiners	2
Foreign Countries	2
Generalizability Theory	2

Source

New Mexico Public Education…	2
Advances in Health Sciences…	1
Educational Assessment	1
Educational and Psychological…	1
Evaluation and the Health…	1
Grantee Submission	1
International Journal of…	1
Journal of Speech, Language,…	1
Language Assessment Quarterly	1
Language, Speech, and Hearing…	1
Learning Disability Quarterly	1
Measurement in Physical…	1
Research in Developmental…	1

Publication Type

Journal Articles	12
Reports - Research	10
Reports - Descriptive	3
Numerical/Quantitative Data	2
Reports - Evaluative	2
Speeches/Meeting Papers	1
Tests/Questionnaires	1

Education Level

Elementary Secondary Education	2
Adult Education	1
Elementary Education	1
Higher Education	1
Postsecondary Education	1

Audience

Researchers

Location

New Mexico	2
China (Beijing)	1
Netherlands (Amsterdam)	1

Assessments and Surveys

Work Keys (ACT)

Showing all 15 results

Resolving and Re-Scoring Constructed Response Items in Mixed-Format Assessments: An Exploration of Three Approaches

Peer reviewed

Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024

We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…

Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners

Online Administration of the Test of Narrative Language--Second Edition: Psychometrics and Considerations for Remote Assessment

Peer reviewed

Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Grantee Submission, 2022

Purpose: Our aim was to evaluate the psychometric properties of the online administered format of the Test of Narrative Language--Second Edition (TNL-2; Gillam & Pearson, 2017), given the importance of assessing children's narrative ability and considerable absence of psychometric studies of spoken language assessments administered online.…

Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments

Online Administration of the Test of Narrative Language--Second Edition: Psychometrics and Considerations for Remote Assessment

Peer reviewed

Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Language, Speech, and Hearing Services in Schools, 2022

Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments

Inter-Rater Variability as Mutual Disagreement: Identifying Raters' Divergent Points of View

Peer reviewed

Gingerich, Andrea; Ramlo, Susan E.; van der Vleuten, Cees P. M.; Eva, Kevin W.; Regehr, Glenn – Advances in Health Sciences Education, 2017

Whenever multiple observers provide ratings, even of the same performance, inter-rater variation is prevalent. The resulting "idiosyncratic rater variance" is considered to be unusable error of measurement in psychometric models and is a threat to the defensibility of our assessments. Prior studies of inter-rater variation in clinical…

Descriptors: Interrater Reliability, Error of Measurement, Psychometrics, Q Methodology

Inter-Rater and Test-Retest (Between-Sessions) Reliability of the 4-Skills Scan for Dutch Elementary School Children

Peer reviewed

van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M. – Measurement in Physical Education and Exercise Science, 2018

In The Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, in this study the test-retest and inter-rater reliability was determined. Respectively, 624 and 557 Dutch 6- to 12-year-old children were analyzed for…

Descriptors: Foreign Countries, Interrater Reliability, Pretests Posttests, Psychomotor Skills

Reliability of the Test of Integrated Language and Literacy Skills (TILLS)

Peer reviewed

Mailend, Marja-Liisa; Plante, Elena; Anderson, Michele A.; Applegate, E. Brooks; Nelson, Nickola W. – International Journal of Language & Communication Disorders, 2016

Background: As new standardized tests become commercially available, it is critical that clinicians have access to the information about a test's psychometric properties, including aspects of reliability. Aims: The purpose of the three studies reported in this article was to investigate the reliability of a new test, the Test of Integrated…

Descriptors: Standardized Tests, Psychometrics, Reliability, Language Skills

Investigating Score Dependability in English/Chinese Interpreter Certification Performance Testing: A Generalizability Theory Approach

Peer reviewed

Han, Chao – Language Assessment Quarterly, 2016

As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…

Descriptors: Foreign Countries, Scores, English, Chinese

A Clinical Tool to Measure Trunk Control in Children with Cerebral Palsy: The Trunk Control Measurement Scale

Peer reviewed

Heyrman, Lieve; Molenaers, Guy; Desloovere, Kaat; Verheyden, Geert; De Cat, Jos; Monbaliu, Elegast; Feys, Hilde – Research in Developmental Disabilities: A Multidisciplinary Journal, 2011

In this study the psychometric properties of the Trunk Control Measurement Scale (TCMS) in children with cerebral palsy (CP) were examined. Twenty-six children with spastic CP (mean age 11 years 3 months, range 8-15 years; Gross Motor Function Classification System level I n = 11, level II n = 5, level III n = 10) were included in this study. To…

Descriptors: Construct Validity, Cerebral Palsy, Test Validity, Interrater Reliability

Obscuring Vital Distinctions: The Oversimplification of Learning Disabilities within RTI

Peer reviewed

McKenzie, Robert G. – Learning Disability Quarterly, 2009

The assessment procedures within Response to Intervention (RTI) models have begun to supplant the use of traditional, discrepancy-based frameworks for identifying students with specific learning disabilities (SLD). Many RTI proponents applaud this shift because of perceived shortcomings in utilizing discrepancy as an indicator of SLD. However,…

Descriptors: Intervention, Learning Disabilities, Error of Measurement, Psychometrics

Qualities of Judgmental Ratings by Four Rater Sources.

Tsui, Anne S. – 1983

Quality of performance data yielded by subjective judgment is of major concern to researchers in performance appraisal. However, some confusion exists in the analysis of quality on ratings obtained from different rating scale formats and from different raters. To clarify this confusion, a study was conducted to assess the quality of judgmental…

Descriptors: Administrator Evaluation, Administrators, Error of Measurement, Evaluation Methods

Application of Psychometric Theory to the Measurement of Voice Quality Using Rating Scales

Peer reviewed

Shrivastav, Rahul; Sapienza, Christine M.; Nandur, Vuday – Journal of Speech, Language, and Hearing Research, 2005

Rating scales are commonly used to study voice quality. However, recent research has demonstrated that perceptual measures of voice quality obtained using rating scales suffer from poor interjudge agreement and reliability, especially in the midrange of the scale. These findings, along with those obtained using multidimensional scaling (MDS), have…

Descriptors: Psychometrics, Probability, Rating Scales, Interrater Reliability

Interrater Reliability Reconsidered: Performance Assessment Using One Examiner per Candidate.

Peer reviewed

Gross, Leon J. – Evaluation and the Health Professions, 1994

Whether adequate levels of interrater reliability could be obtained on a national, standardized examination using one examiner per observation was studied with 101 paired candidate observations on an examination for optometry. Results indicate that psychometrically sound judgments can be obtained with one examiner. (SLD)

Descriptors: Educational Assessment, Error of Measurement, Evaluation Methods, Evaluators

New Mexico Standards-Based Assessment Technical Report: Spring 2007 Administration

New Mexico Public Education Department, 2007

The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of and technical characteristics of the 2007 NMSBA. The 2007 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Summary of student performance; (4) Statistical analyses of item and…

Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring

Generalizability Analyses of Work Keys Listening and Writing Tests.

Peer reviewed

Brennan, Robert L.; And Others – Educational and Psychological Measurement, 1995

Generalizability theory is used to examine the psychometric characteristics of the Listening and Writing Tests developed by American College Testing for its Work Keys program. Results with samples of 50 suggest the desirability of a minimum number of the tests' tape-recorded messages and the use of at least 2 raters. (SLD)

Descriptors: Audiotape Recordings, Error of Measurement, Generalizability Theory, Interaction

New Mexico Standards Based Assessment (NMSBA) Technical Report: 2006 Spring Administration

Griph, Gerald W. – New Mexico Public Education Department, 2006

The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview of and technical characteristics of the 2006 NMSBA. The 2006 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Calibration, scaling, and equating procedures; (4) Standard setting;…

Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring

Author

Anna-Maria Fall	2
Beula M. Magimairaj	2
Greg Roberts	2
Philip Capin	2
Ronald B. Gillam	2
Sandra L. Gillam	2
Sharon Vaughn	2
Anderson, Michele A.	1
Applegate, E. Brooks	1
Brennan, Robert L.	1
De Cat, Jos	1
Desloovere, Kaat	1
Eva, Kevin W.	1
Feys, Hilde	1
Gingerich, Andrea	1
Griph, Gerald W.	1
Gross, Leon J.	1
Han, Chao	1
Heyrman, Lieve	1
Mailend, Marja-Liisa	1
McKenzie, Robert G.	1
Molenaers, Guy	1
Monbaliu, Elegast	1
Nandur, Vuday	1