Showing 1 to 15 of 31 results
Peer reviewed
Direct link
Susan K. Johnsen – Gifted Child Today, 2025
The author provides information about reliability, the areas educators should examine in determining whether an assessment is consistent and trustworthy for use, and how it should be interpreted when making decisions about students. Reliability areas discussed in the column include internal consistency, test-retest or stability, inter-scorer…
Descriptors: Test Reliability, Academically Gifted, Student Evaluation, Error of Measurement
Peer reviewed
Direct link
Rice, C. E.; Carpenter, L. A.; Morrier, M. J.; Lord, C.; DiRienzo, M.; Boan, A.; Skowyra, C.; Fusco, A.; Baio, J.; Esler, A.; Zahorodny, W.; Hobson, N.; Mars, A.; Thurm, A.; Bishop, S.; Wiggins, L. D. – Journal of Autism and Developmental Disorders, 2022
This paper describes a process to define a comprehensive list of exemplars for seven core Diagnostic and Statistical Manual (DSM) diagnostic criteria for autism spectrum disorder (ASD), and reports on interrater reliability in applying these exemplars to determine ASD case classification. Clinicians completed an iterative process to map specific…
Descriptors: Autism Spectrum Disorders, Clinical Diagnosis, Test Reliability, Interrater Reliability
Peer reviewed
Direct link
Janice Kinghorn; Katherine McGuire; Bethany L. Miller; Aaron Zimmerman – Assessment Update, 2024
In this article, the authors share their reflections on how different experiences and paradigms have broadened their understanding of the work of assessment in higher education. As they collaborated to create a panel for the 2024 International Conference on Assessing Quality in Higher Education, they recognized that they, as assessment…
Descriptors: Higher Education, Assessment Literacy, Evaluation Criteria, Evaluation Methods
Lichtenstein, Robert – Communique, 2020
Appropriate interpretation of assessment data requires an appreciation that tools are subject to measurement error. School psychologists recognize, at least on an intellectual level, that measures are imperfect--that test scores and other quantitative measures (e.g., rating scales, systematic behavioral observations) are best estimates of…
Descriptors: Error of Measurement, Test Reliability, Pretests Posttests, Standardized Tests
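The point Lichtenstein makes, that observed scores are best estimates subject to measurement error, is often quantified with the standard error of measurement. The following sketch is illustrative only (not drawn from the article); the IQ-style scale values are assumptions:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed: float, sd: float, reliability: float, z: float = 1.96):
    """Approximate 95% confidence band around an observed test score."""
    half_width = z * sem(sd, reliability)
    return observed - half_width, observed + half_width

# Illustrative scale: SD 15, reliability .90 (hypothetical values)
lo, hi = score_band(110, sd=15.0, reliability=0.90)  # about (100.7, 119.3)
```

Even with a reliability of .90, the band spans roughly 19 points, which is the practical sense in which a score is a "best estimate" rather than a true value.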
Peer reviewed
Download full text (PDF on ERIC)
Regional Educational Laboratory Southeast, 2020
Teachers need to assess their students' current level of mathematical understanding to provide appropriate interventions for students who are struggling. Several school districts in Georgia currently use two assessments for this purpose--the Global Strategy Stage (GloSS) and the Individual Knowledge Assessment of Number (IKAN). The IKAN is…
Descriptors: Mathematics Tests, Diagnostic Tests, Test Reliability, Test Validity
Peer reviewed
Direct link
Martin, David; Jamieson-Proctor, Romina – International Journal of Research & Method in Education, 2020
In Australia, one of the key findings of the Teacher Education Ministerial Advisory Group was that not all graduating pre-service teachers possess adequate pedagogical content knowledge (PCK) to teach effectively. The concern is that higher education providers working with pre-service teachers are using pedagogical practices and assessments which…
Descriptors: Test Construction, Preservice Teachers, Pedagogical Content Knowledge, Foreign Countries
Peer reviewed
Direct link
Bardhoshi, Gerta; Erford, Bradley T. – Measurement and Evaluation in Counseling and Development, 2017
Precision is a key facet of test development, with score reliability determined primarily according to the types of error one wants to approximate and demonstrate. This article identifies and discusses several primary forms of reliability estimation: internal consistency (i.e., split-half, KR-20, α), test-retest, alternate forms, interscorer, and…
Descriptors: Scores, Test Reliability, Accuracy, Pretests Posttests
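Of the estimation forms Bardhoshi and Erford list, internal consistency (coefficient α) is the most commonly reported. A minimal dependency-free sketch of the computation, with made-up item data for illustration:

```python
def cronbach_alpha(items):
    """Cronbach's alpha from item-score columns.

    items: list of equal-length lists, one list per test item.
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)
    """
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(col) for col in items) / var(totals))

# Three items scored for four examinees (hypothetical data)
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 3], [2, 2, 3, 4]])  # ≈ 0.957
```

KR-20 is the special case of the same formula for dichotomously scored (0/1) items, which is why the article groups them together under internal consistency.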
Peer reviewed
Download full text (PDF on ERIC)
Richer, Amanda; Charmaraman, Linda; Ceder, Ineke – Afterschool Matters, 2018
Like instruments used in afterschool programs to assess children's social and emotional growth or to evaluate staff members' performance, instruments used to evaluate program quality should be free from bias. Practitioners and researchers alike want to know that assessment instruments, whatever their type or intent, treat all people fairly and do…
Descriptors: Cultural Differences, Social Bias, Interrater Reliability, Program Evaluation
Peer reviewed
Direct link
Uto, Masaki; Ueno, Maomi – IEEE Transactions on Learning Technologies, 2016
As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, a problem remains in peer assessment: reliability depends on rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…
Descriptors: Item Response Theory, Peer Evaluation, Bayesian Statistics, Simulation
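The "rater parameters" the Uto and Ueno abstract refers to can be pictured with a many-facet Rasch-style model, where a rater severity term enters alongside examinee ability and task difficulty. This is a generic sketch of that model family, not the specific model proposed in the article; all parameter values are illustrative:

```python
import math

def p_positive(theta: float, difficulty: float, severity: float) -> float:
    """Many-facet Rasch-style probability that a rater judges a response
    successful: logistic in (ability - task difficulty - rater severity)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty - severity)))

# Same examinee and task, two raters: a lenient one (severity -0.5)
# and a harsh one (+0.5) produce different expected ratings.
lenient = p_positive(theta=0.0, difficulty=0.0, severity=-0.5)  # ≈ 0.622
harsh = p_positive(theta=0.0, difficulty=0.0, severity=0.5)     # ≈ 0.378
```

Estimating the severity terms jointly with ability is what lets such models adjust peer-assessment scores for who happened to do the rating.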
Peer reviewed
Direct link
Gargani, John; Strong, Michael – Journal of Teacher Education, 2015
In Gargani and Strong (2014), we describe The Rapid Assessment of Teacher Effectiveness (RATE), a new teacher evaluation instrument. Our account of the validation research associated with RATE inspired a review by Good and Lavigne (2015). Here, we reply to the main points of their review. We elaborate on the validity, reliability, theoretical…
Descriptors: Evidence, Teacher Effectiveness, Teacher Evaluation, Evaluation Methods
Peer reviewed
Direct link
Baird, Jo-Anne; Black, Paul – Research Papers in Education, 2013
Much has already been written on the controversies surrounding the use of different test theories in educational assessment. Other authors have noted the prevalence of classical test theory over item response theory in practice. This Special Issue draws together articles based upon work conducted on the Reliability Programme for England's…
Descriptors: Test Theory, Foreign Countries, Test Reliability, Item Response Theory
Peer reviewed
Direct link
Grodberg, David; Weinger, Paige M.; Kolevzon, Alexander; Soorya, Latha; Buxbaum, Joseph D. – Journal of Autism and Developmental Disorders, 2012
The Autism Mental Status Examination (AMSE) described here is an eight-item observational assessment that prompts the observation and recording of signs and symptoms of autism spectrum disorders (ASD). The AMSE is intended to take place seamlessly in the context of a clinical exam and produces a total score. Subjects were independently…
Descriptors: Observation, Autism, Interrater Reliability, At Risk Persons
Peer reviewed
Direct link
Brooks, Val – Research Papers in Education, 2012
An aspect of assessment which has received little attention compared with perennial concerns, such as standards or reliability, is the role of judgment in marking. This paper explores marking as an act of judgment, paying particular attention to the nature of judgment and the processes involved. It brings together studies which have explored…
Descriptors: Educational Assessment, Test Reliability, Test Validity, Value Judgment
Peer reviewed
Direct link
Nunan, Anna – Language Learning in Higher Education, 2014
The Applied Language Centre at University College Dublin offers foreign language modules to students in ten languages at CEFR [Common European Framework of Reference for Languages] levels ranging from A1 to B2. Efforts have been underway in the Centre to standardise the assessment components across languages to ensure parity between module credits…
Descriptors: Second Language Learning, Second Language Instruction, College Students, Standards
Haberman, Shelby J. – Educational Testing Service, 2011
Alternative approaches are discussed for use of e-rater® to score the TOEFL iBT® Writing test. These approaches involve alternate criteria. In the first approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the second approach, the predicted variable is the expected rater score of 2 essay responses by the…
Descriptors: Writing Tests, Scoring, Essays, Language Tests
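The general idea behind Haberman's first approach, predicting an expected human rater score from automated essay features, can be sketched with an ordinary least-squares fit. The single feature, toy data, and weights below are hypothetical stand-ins, not ETS's actual e-rater model:

```python
def fit_simple(xs, ys):
    """Least-squares slope and intercept predicting an expected rater
    score from one automated essay feature (illustrative only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

# Toy data: one feature value per essay and the average of two human
# rater scores (the "expected rater score" criterion).
feature = [2.0, 3.0, 4.0, 5.0]
avg_rater = [2.5, 3.0, 3.5, 4.0]
m, b = fit_simple(feature, avg_rater)
predicted = m * 3.5 + b  # expected rater score for a new essay
```

The two approaches in the abstract differ only in which expected-score criterion the regression is fit against, not in the fitting machinery itself.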