ERIC - Search Results

Showing all 15 results

The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues

Peer reviewed

Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022

How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…

Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making

The Problem of Assessing Problem Solving: Can Comparative Judgement Help?

Peer reviewed

Jones, Ian; Inglis, Matthew – Educational Studies in Mathematics, 2015

School mathematics examination papers are typically dominated by short, structured items that fail to assess sustained reasoning or problem solving. A contributory factor to this situation is the need for student work to be marked reliably by a large number of markers of varied experience and competence. We report a study that tested an…

Descriptors: Problem Solving, Mathematics Instruction, Mathematics Tests, Test Items

Emotional and Behavioral Screener: Test-Retest Reliability, Inter-Rater Reliability, and Convergent Validity

Peer reviewed

Nordness, Philip D.; Epstein, Michael H.; Cullinan, Douglas; Pierce, Corey D. – Remedial and Special Education, 2014

The Emotional and Behavioral Screener (EBS) is a universal screening instrument designed to identify students whose excessive problem behaviors put them at risk of the education disability category of emotional disturbance (ED). This article reports findings from three studies that address the reliability and validity of the EBS. Studies 1 and 2…

Descriptors: Screening Tests, Disability Identification, Behavior Problems, At Risk Students

Initial Description of a Quantitative, Cross-Species (Chimpanzee-Human) Social Responsiveness Measure

Peer reviewed

Marrus, Natasha; Faughn, Carley; Shuman, Jeremy; Petersen, Steve E.; Constantino, John N.; Povinelli, Daniel J.; Pruett, John R., Jr. – Journal of the American Academy of Child & Adolescent Psychiatry, 2011

Objective: Comparative studies of social responsiveness, an ability that is impaired in autism spectrum disorders, can inform our understanding of both autism and the cognitive architecture of social behavior. Because there is no existing quantitative measure of social responsiveness in chimpanzees, we generated a quantitative, cross-species…

Descriptors: Animals, Social Behavior, Interrater Reliability, Measures (Individuals)

Developing Fair Tests for Mathematics Curriculum Comparison Studies: The Role of Content Analyses

Peer reviewed

Chavez, Oscar; Papick, Ira; Ross, Daniel J.; Grouws, Douglas A. – Mathematics Education Research Journal, 2011

This article describes the process of development of assessment instruments for a three-year longitudinal comparative study that focused on evaluating American high school students' mathematics learning from two distinct approaches to content organization: curriculum built around a sequence of three full-year courses (Algebra 1, Geometry, and…

Descriptors: Mathematics Curriculum, Mathematics Education, Interrater Reliability, Scoring Rubrics

Development of the Communication Complexity Scale

Peer reviewed

Brady, Nancy C.; Fleming, Kandace; Thiemann-Bourque, Kathy; Olswang, Lesley; Dowden, Patricia; Saunders, Muriel D.; Marquis, Janet – American Journal of Speech-Language Pathology, 2012

Purpose: Accurate description of an individual's communication status is critical in both research and practice. Describing the communication status of individuals with severe intellectual and developmental disabilities is difficult because these individuals often communicate with presymbolic means that may not be readily recognized. Our goal was…

Descriptors: Mental Retardation, Standardized Tests, Developmental Disabilities, Interrater Reliability

Quality of Questions on Common Tests at Issue

Sawchuk, Stephen – Education Digest: Essential Readings Condensed for Quick Review, 2010

Most experts in the testing community have presumed that the $350 million promised by the U.S. Department of Education to support common assessments would promote those that made greater use of open-ended items capable of measuring higher-order critical-thinking skills. But as measurement experts consider the multitude of possibilities for an…

Descriptors: Educational Quality, Test Items, Comparative Analysis, Multiple Choice Tests

The Essential Role of Curricular Analyses in Comparative Studies of Mathematics Achievement: Developing "Fair" Tests

Chavez, Oscar; Papick, Ira; Ross, Dan J.; Grouws, Douglas A. – Online Submission, 2010

The purpose of this paper was to describe the process of development of assessment instruments for the Comparing Options in Secondary Mathematics: Investigating Curriculum (COSMIC) project. The COSMIC project was a three-year longitudinal comparative study focusing on evaluating high school students' mathematics learning from two distinct…

Descriptors: Mathematics Education, Mathematics Achievement, Interrater Reliability, Scoring Rubrics

Measurement of Social Communication Skills of Children with Autism Spectrum Disorders during Interactions with Typical Peers

Peer reviewed

Murdock, Linda C.; Cost, Hollie C.; Tieso, Carol – Focus on Autism and Other Developmental Disabilities, 2007

The "Social-Communication Assessment Tool" (S-CAT) was created as a direct observation instrument to quantify specific social and communication deficits of children with autism spectrum disorders (ASD) within educational settings. In this pilot study, the instrument's content validity and interrater reliability were investigated to determine the…

Descriptors: Nonverbal Communication, Autism, Content Validity, Test Validity

A Modification of Feldt's Test of the Equality of Two Dependent Alpha Coefficients.

Peer reviewed

Alsawalmeh, Yousef M.; Feldt, Leonard S. – Psychometrika, 1994

A modification of a test of the equality of nonindependent alpha reliability coefficients is proposed. It avoids the limitation that the product of the number of test parts times the number of subjects be quite large. Monte Carlo studies indicate that this test can be used in comparing interrater reliabilities. (SLD)

Descriptors: Comparative Analysis, Computer Simulation, Equations (Mathematics), Interrater Reliability

Toward the Instructional Utility of Large-Scale Writing Assessment: Validation of a New Narrative Rubric. Project 3.1. Studies in Improving Classroom and Local Assessments. Portfolio Assessment: Reliability of Teachers' Judgments.

Gearhart, Maryl – 1994

The "Writing What You Read" (WWYR) rubric was designed for large-scale assessments, and differs from most narrative rubrics in its narrative-specific content and its developmental framework. The rubric contains five analytic subscales for theme, character, setting, plot, and communication, and a sixth holistic scale for overall…

Descriptors: Comparative Analysis, Educational Assessment, Elementary Education, Holistic Approach

Analysis of Proposed Revisions of the Test of Spoken English. TOEFL Research Reports 48.

Henning, Grant; And Others – 1995

A prototype revised form of the Test of Spoken English (TSE) was compared with the current version of the same test, comparing interrater reliability, frequency of rater discrepancy at all score levels, component task adequacy, scoring efficacy, and other concurrent and construct validity evidence, including the oral proficiency interview…

Descriptors: Adults, College Students, Comparative Analysis, English (Second Language)

An Investigation of Planning Time and Proficiency Level on Oral Test Discourse.

Peer reviewed

Wigglesworth, Gillian – Language Testing, 1997

In this study, planning time was manipulated as a variable in a trial administration of a semi-direct oral interaction test. Discourse analytic techniques were used to determine the nature and/or significance of difference in the elicited discourse across two conditions in terms of complexity and accuracy. Findings suggest that planning time may…

Descriptors: Cognitive Development, Communicative Competence (Languages), Comparative Analysis, Discourse Analysis

Public School Educator and Teacher Educator Job Analysis Ratings of Certification Test Objectives.

Silvestro, John R.; And Others – 1989

The job analysis procedures used in the development of the Illinois Certification Testing System are described. The degree of congruence between job analysis ratings provided by public school educators (PSEs) and teacher educators (TEs) who completed the job analysis surveys is examined. National Evaluation Systems, Inc., and the Illinois State…

Descriptors: Comparative Analysis, Content Analysis, Elementary Secondary Education, Interrater Reliability

Validating a Spanish Developmental Spelling Test.

Ferroli, Lou; Krajenta, Marilyn – 1993

The creation and validation of a Spanish version of an English developmental spelling test (DST) is described. An introductory section reviews related literature on the rationale for and construction of DSTs, spelling development in the early grades, and Spanish-English bilingual education. Differences between the English and Spanish test versions…

Descriptors: Comparative Analysis, Elementary School Students, English, Grade 1
