ERIC - Search Results

Publication Date

In 2025	3
Since 2024	3
Since 2021 (last 5 years)	8
Since 2016 (last 10 years)	15
Since 2006 (last 20 years)	20

Descriptor

Error of Measurement	29
Interrater Reliability	29
Test Reliability	29
Test Validity	12
Scoring	8
Correlation	7
Generalizability Theory	7
Scores	7
Evaluation Methods	6
Student Evaluation	6
Higher Education	5
Psychometrics	5
Children	4
Foreign Countries	4
Goodness of Fit	4
Language Tests	4
Measurement Techniques	4
Standardized Tests	4
Test Construction	4
Academic Standards	3
Accuracy	3
Computation	3
Cutting Scores	3
High Schools	3
Language Impairments	3

Publication Type

Journal Articles	17
Reports - Research	17
Reports - Evaluative	6
Reports - Descriptive	5
Speeches/Meeting Papers	5
Numerical/Quantitative Data	4
Dissertations/Theses -…	1
Tests/Questionnaires	1

Education Level

Postsecondary Education	4
Elementary Secondary Education	3
Higher Education	3
Elementary Education	2
Early Childhood Education	1
Grade 3	1
Grade 4	1
Grade 5	1
Intermediate Grades	1
Middle Schools	1
Primary Education	1

Audience

Administrators	2
Researchers	2
Counselors	1

Location

New Mexico	2
Canada	1
Netherlands (Amsterdam)	1
Oklahoma	1
Turkey	1
United Kingdom (England)	1

Assessments and Surveys

Advanced Placement…	1
Alabama High School…	1
Cognitive Abilities Test	1
Iowa Tests of Basic Skills	1
National Assessment of…	1
Stanford Binet Intelligence…	1
Wechsler Intelligence Scale…	1

Showing 1 to 15 of 29 results

Technical Adequacy-Reliability

Peer reviewed

Susan K. Johnsen – Gifted Child Today, 2025

The author provides information about reliability and areas that educators should examine in determining if an assessment is consistent and trustworthy for use, and how it should be interpreted in making decisions about students. Reliability areas that are discussed in the column include internal consistency, test-retest or stability, inter-scorer…

Descriptors: Test Reliability, Academically Gifted, Student Evaluation, Error of Measurement
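Johnsen's column lists test-retest (stability) reliability among the areas to examine. As a minimal illustration, assuming the simplest case of one group tested twice with the same form, a stability coefficient can be computed as the Pearson correlation between the two sets of scores. The function name and the scores below are hypothetical.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from each mean.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary on both occasions")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five students tested twice, a few weeks apart.
first_testing = [12, 15, 9, 20, 17]
second_testing = [13, 14, 10, 19, 18]
print(f"test-retest r = {pearson_r(first_testing, second_testing):.2f}")
```

A high coefficient shows that students kept roughly the same rank order across occasions; it does not show that scores stayed at the same level, because a uniform gain on the retest leaves r unchanged.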

Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients

Peer reviewed
PDF on ERIC

Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022

The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…

Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
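Eser and Aksu compare generalizability theory with inter-rater agreement coefficients. A minimal sketch of the G-theory side, assuming a fully crossed persons × raters (p × r) design with one score per cell and no missing data, estimates variance components from ANOVA mean squares and then forms a relative coefficient (Eρ²) and an absolute coefficient (Φ). The function name and ratings are hypothetical, and this is not the authors' analysis.

```python
def p_by_r_g_study(scores: list[list[float]]) -> dict[str, float]:
    """One-facet crossed p x r G study; scores[p][r] is rater r's score for person p."""
    n_p, n_r = len(scores), len(scores[0])
    if n_p < 2 or n_r < 2 or any(len(row) != n_r for row in scores):
        raise ValueError("need a complete table with at least 2 persons and 2 raters")

    grand = sum(map(sum, scores)) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[r] for row in scores) / n_p for r in range(n_r)]

    # Sums of squares; the p x r interaction is confounded with residual error.
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pr = ss_total - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    # Solve the expected mean squares; negative estimates are set to zero.
    var_pr = max(ms_pr, 0.0)
    var_p = max((ms_p - ms_pr) / n_r, 0.0)
    var_r = max((ms_r - ms_pr) / n_p, 0.0)

    # Error variances for decisions based on the mean over n_r raters.
    rel_err = var_pr / n_r
    abs_err = (var_r + var_pr) / n_r
    if var_p + rel_err == 0:
        raise ValueError("no score variance to partition")
    return {
        "var_p": var_p,
        "var_r": var_r,
        "var_pr_e": var_pr,
        "g_relative": var_p / (var_p + rel_err),
        "phi_absolute": var_p / (var_p + abs_err),
    }


# Hypothetical ratings: raters 1 and 2 agree exactly; rater 3 is one point more lenient.
ratings = [[2, 2, 3], [4, 4, 5], [3, 3, 4], [5, 5, 6]]
result = p_by_r_g_study(ratings)
print(round(result["g_relative"], 4), round(result["phi_absolute"], 4))
```

In this example all three raters rank the four people identically, so the relative coefficient is 1; rater 3's constant leniency counts as error only for absolute decisions, so Φ drops to 15/16 = 0.9375. That gap is one concrete form of the difference between consistency and agreement among raters.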

The Vague Language Use Scale: Clinical Utility and Psychometrics from Adults with Traumatic Brain Injury

Peer reviewed

Kathryn J. Greenslade; Julia K. Bushell; Emily F. Dillon; Amy E. Ramage – International Journal of Language & Communication Disorders, 2025

Background: Pragmatic communication difficulties encompass many distinct behaviours, including the use of vague and/or insufficient language, a common characteristic following traumatic brain injury (TBI) that negatively impacts psychosocial outcomes. Existing assessments evaluate pragmatic communication broadly, often with only one or two items…

Descriptors: Neurological Impairments, Head Injuries, Language Impairments, Language Tests

Evidence-Based Evaluation of Student and Marker Performances in Assessment and Examination

Peer reviewed

Ole J. Kemi – Advances in Physiology Education, 2025

Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating students but also the wider learning and teaching efficiency of an academic institution.…

Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards

What Makes Children's Responses to Creativity Assessments Difficult to Judge Reliably?

Peer reviewed
PDF on ERIC

Denis Dumas; Selcuk Acar; Kelly Berthiaume; Peter Organisciak; David Eby; Katalin Grajzel; Theadora Vlaamster; Michele Newman; Melanie Carrera – Grantee Submission, 2023

Open-ended verbal creativity assessments are commonly administered in psychological research and in educational practice to elementary-aged children. Children's responses are then typically rated by teams of judges who are trained to identify original ideas, hopefully with a degree of inter-rater agreement. Even in cases where the judges are…

Descriptors: Elementary School Students, Grade 3, Grade 4, Grade 5

What You Don't Know about Measurement Error--And Why You Should Care

Lichtenstein, Robert – Communique, 2020

Appropriate interpretation of assessment data requires an appreciation that tools are subject to measurement error. School psychologists recognize, at least on an intellectual level, that measures are imperfect--that test scores and other quantitative measures (e.g., rating scales, systematic behavioral observations) are best estimates of…

Descriptors: Error of Measurement, Test Reliability, Pretests Posttests, Standardized Tests
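Lichtenstein's point that a test score is a best estimate can be made concrete with the classical standard error of measurement, SEM = SD × √(1 − r_xx), and a band of ±z·SEM around the observed score. This is a minimal sketch; the SD of 15 and reliability of .91 are hypothetical values, and some practitioners center the band on an estimated true score rather than on the observed score.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical SEM: the score SD times the square root of (1 - reliability)."""
    if sd < 0 or not 0.0 <= reliability <= 1.0:
        raise ValueError("sd must be non-negative and reliability in [0, 1]")
    return sd * sqrt(1.0 - reliability)


def score_band(observed: float, sd: float, reliability: float, z: float = 1.96) -> tuple[float, float]:
    """Observed score plus or minus z standard errors (z = 1.96 gives about 95%)."""
    margin = z * standard_error_of_measurement(sd, reliability)
    return observed - margin, observed + margin


# Hypothetical standard-score scale (SD = 15) with reliability .91.
low, high = score_band(100, sd=15, reliability=0.91)
print(f"95% band: {low:.1f} to {high:.1f}")
```

Even with reliability of .91 the SEM is 4.5 points and the 95% band spans nearly 18 points, which is why a single score close to a cut point calls for caution.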

Exploring Rating Quality in the Context of High-Stakes Rater-Mediated Educational Assessments

Wenjing Guo – ProQuest LLC, 2021

Constructed response (CR) items are widely used in large-scale testing programs, including the National Assessment of Educational Progress (NAEP) and many district and state-level assessments in the United States. One unique feature of CR items is that they depend on human raters to assess the quality of examinees' work. The judgment of human…

Descriptors: National Competency Tests, Responses, Interrater Reliability, Error of Measurement

Online Administration of the Test of Narrative Language--Second Edition: Psychometrics and Considerations for Remote Assessment

Peer reviewed
PDF on ERIC

Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Grantee Submission, 2022

Purpose: Our aim was to evaluate the psychometric properties of the online administered format of the Test of Narrative Language--Second Edition (TNL-2; Gillam & Pearson, 2017), given the importance of assessing children's narrative ability and considerable absence of psychometric studies of spoken language assessments administered online.…

Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments

Online Administration of the Test of Narrative Language--Second Edition: Psychometrics and Considerations for Remote Assessment

Peer reviewed
PDF on ERIC

Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Language, Speech, and Hearing Services in Schools, 2022

Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments

The Exchangeability of Brief Intelligence Tests for Children with Intellectual Giftedness: Illuminating Error Variance Components' Influence on IQs

Peer reviewed

Irby, Sarah M.; Floyd, Randy G. – Psychology in the Schools, 2017

This study examined the exchangeability of total scores (i.e., intelligence quotients [IQs]) from three brief intelligence tests. Tests were administered to 36 children with intellectual giftedness, scored live by one set of primary examiners and later scored by a secondary examiner. For each student, six IQs were calculated, and all 216 values…

Descriptors: Intelligence Tests, Gifted, Error of Measurement, Scores

The Miscalculation of Interrater Reliability: A Case Study Involving the AAC&U VALUE Rubrics

Peer reviewed
PDF on ERIC

Szafran, Robert F. – Practical Assessment, Research & Evaluation, 2017

Institutional assessment of student learning objectives has become a fact of life in American higher education and the Association of American Colleges and Universities' (AAC&U) VALUE Rubrics have become a widely adopted evaluation and scoring tool for student work. As faculty from a variety of disciplines, some less familiar with the…

Descriptors: Interrater Reliability, Case Studies, Scoring Rubrics, Behavioral Objectives
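Szafran's case study concerns how interrater reliability is calculated for rubric scores. As a generic illustration (not Szafran's own computation), the sketch below contrasts raw percent agreement with Cohen's kappa, which discounts the agreement two raters would reach by chance given their marginal score distributions. The function names and ratings are hypothetical, and unweighted kappa treats rubric levels as unordered categories.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of cases on which two raters gave identical scores."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from the product of each rater's marginal proportions.
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()) / n**2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical rubric levels from two raters scoring ten student papers.
rater_1 = [3, 3, 3, 3, 3, 3, 3, 3, 2, 2]
rater_2 = [3, 3, 3, 3, 3, 3, 3, 2, 3, 2]
print(percent_agreement(rater_1, rater_2), round(cohens_kappa(rater_1, rater_2), 3))
```

Here the raters agree on 80% of papers, but because most papers received a 3, chance agreement is already 68%, and kappa is only 0.375.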

Updated Technical Manual for the IDEA Feedback System for Administrators. IDEA Technical Report No. 20

Benton, Stephen L.; Li, Dan – IDEA Center, Inc., 2018

This technical report describes the results of analyses performed on data collected from 2013 to 2017, using the IDEA Feedback System for Administrators (FSA). The FSA is used to gather impressions from core constituents about an administrator's performance of relevant administrative roles, as well as her/his leadership style, interpersonal…

Descriptors: Feedback (Response), Administrators, Administrator Attitudes, Administrator Role

Inter-Rater and Test-Retest (Between-Sessions) Reliability of the 4-Skills Scan for Dutch Elementary School Children

Peer reviewed

van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M. – Measurement in Physical Education and Exercise Science, 2018

In the Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, in this study the test-retest and inter-rater reliability were determined. Respectively, 624 and 557 Dutch 6- to 12-year-old children were analyzed for…

Descriptors: Foreign Countries, Interrater Reliability, Pretests Posttests, Psychomotor Skills

Processes and Procedures for Estimating Score Reliability and Precision

Peer reviewed

Bardhoshi, Gerta; Erford, Bradley T. – Measurement and Evaluation in Counseling and Development, 2017

Precision is a key facet of test development, with score reliability determined primarily according to the types of error one wants to approximate and demonstrate. This article identifies and discusses several primary forms of reliability estimation: internal consistency (i.e., split-half, KR-20, α), test-retest, alternate forms, interscorer, and…

Descriptors: Scores, Test Reliability, Accuracy, Pretests Posttests
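Bardhoshi and Erford list KR-20 and coefficient α among the internal-consistency estimates. A minimal sketch, assuming a complete persons × items score table, computes both; population variances are used throughout so that, for items scored 0/1, KR-20 and α give the same value. The function names and the response pattern are hypothetical.

```python
from statistics import pvariance


def _check(table: list[list[float]]) -> int:
    """Return the number of items after checking the table is complete."""
    k = len(table[0]) if table else 0
    if len(table) < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need at least 2 people and 2 items, with no missing scores")
    return k


def cronbach_alpha(table: list[list[float]]) -> float:
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = _check(table)
    item_vars = [pvariance([row[i] for row in table]) for i in range(k)]
    total_var = pvariance([sum(row) for row in table])
    if total_var == 0:
        raise ValueError("total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(table: list[list[int]]) -> float:
    """KR-20 for 0/1 items: alpha with each item variance written as p * q."""
    k = _check(table)
    n = len(table)
    pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in table) / n  # proportion answering item i correctly
        pq += p * (1 - p)
    total_var = pvariance([sum(row) for row in table])
    if total_var == 0:
        raise ValueError("total scores do not vary")
    return k / (k - 1) * (1 - pq / total_var)


# Hypothetical 0/1 responses: five examinees (rows) by four items (columns).
responses = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(round(kr20(responses), 3), round(cronbach_alpha(responses), 3))
```

The perfectly nested response pattern here yields 0.8 for both coefficients. The p·q term is simply the population variance of a 0/1 item, which is why KR-20 and α coincide for dichotomous items.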

Investigation of Coefficient of Individual Agreement in Terms of Sample Size, Random and Monotone Missing Ratio, and Number of Repeated Measures

Peer reviewed
PDF on ERIC

Temel, Gülhan Orekici; Erdogan, Semra; Selvi, Hüseyin; Kaya, Irem Ersöz – Educational Sciences: Theory and Practice, 2016

Studies based on longitudinal data focus on the change and development of the situation being investigated and allow for examining cases regarding education, individual development, cultural change, and socioeconomic improvement in time. However, as these studies require taking repeated measures in different time periods, they may include various…

Descriptors: Investigations, Sample Size, Longitudinal Studies, Interrater Reliability

Source

Grantee Submission	2
IDEA Center, Inc.	2
New Mexico Public Education…	2
Advances in Physiology…	1
Alberta Journal of…	1
Communique	1
Educational Sciences: Theory…	1
Gifted Child Today	1
International Journal of…	1
International Journal of…	1
Language Learning	1
Language, Speech, and Hearing…	1
Measurement and Evaluation in…	1
Measurement in Physical…	1
Practical Assessment,…	1
ProQuest LLC	1
Psychology in the Schools	1
Research & Practice in…	1
Research Papers in Education	1

Author

Anna-Maria Fall	2
Benton, Stephen L.	2
Beula M. Magimairaj	2
Greg Roberts	2
Philip Capin	2
Ronald B. Gillam	2
Sandra L. Gillam	2
Sharon Vaughn	2
Aksu, Gökhan	1
Amy E. Ramage	1
Bardhoshi, Gerta	1
Bridgeman, Brent	1
Cantor, Nancy K.	1
Chamberlain, Suzanne	1
David Eby	1
Denis Dumas	1
Emily F. Dillon	1
Erdogan, Semra	1
Erford, Bradley T.	1
Eser, Mehmet Taha	1
Floyd, Randy G.	1
Gierl, Mark J.	1
Griph, Gerald W.	1
Gross, Amy B.	1