Showing all 15 results
Peer reviewed
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Peer reviewed
Jones, Ian; Inglis, Matthew – Educational Studies in Mathematics, 2015
School mathematics examination papers are typically dominated by short, structured items that fail to assess sustained reasoning or problem solving. A contributory factor to this situation is the need for student work to be marked reliably by a large number of markers of varied experience and competence. We report a study that tested an…
Descriptors: Problem Solving, Mathematics Instruction, Mathematics Tests, Test Items
Peer reviewed
Nordness, Philip D.; Epstein, Michael H.; Cullinan, Douglas; Pierce, Corey D. – Remedial and Special Education, 2014
The Emotional and Behavioral Screener (EBS) is a universal screening instrument designed to identify students whose excessive problem behaviors put them at risk of the education disability category of emotional disturbance (ED). This article reports findings from three studies that address the reliability and validity of the EBS. Studies 1 and 2…
Descriptors: Screening Tests, Disability Identification, Behavior Problems, At Risk Students
Peer reviewed
Marrus, Natasha; Faughn, Carley; Shuman, Jeremy; Petersen, Steve E.; Constantino, John N.; Povinelli, Daniel J.; Pruett, John R., Jr. – Journal of the American Academy of Child & Adolescent Psychiatry, 2011
Objective: Comparative studies of social responsiveness, an ability that is impaired in autism spectrum disorders, can inform our understanding of both autism and the cognitive architecture of social behavior. Because there is no existing quantitative measure of social responsiveness in chimpanzees, we generated a quantitative, cross-species…
Descriptors: Animals, Social Behavior, Interrater Reliability, Measures (Individuals)
Peer reviewed
Chavez, Oscar; Papick, Ira; Ross, Daniel J.; Grouws, Douglas A. – Mathematics Education Research Journal, 2011
This article describes the process of development of assessment instruments for a three-year longitudinal comparative study that focused on evaluating American high school students' mathematics learning from two distinct approaches to content organization: curriculum built around a sequence of three full-year courses (Algebra 1, Geometry, and…
Descriptors: Mathematics Curriculum, Mathematics Education, Interrater Reliability, Scoring Rubrics
Peer reviewed
Brady, Nancy C.; Fleming, Kandace; Thiemann-Bourque, Kathy; Olswang, Lesley; Dowden, Patricia; Saunders, Muriel D.; Marquis, Janet – American Journal of Speech-Language Pathology, 2012
Purpose: Accurate description of an individual's communication status is critical in both research and practice. Describing the communication status of individuals with severe intellectual and developmental disabilities is difficult because these individuals often communicate with presymbolic means that may not be readily recognized. Our goal was…
Descriptors: Mental Retardation, Standardized Tests, Developmental Disabilities, Interrater Reliability
Sawchuk, Stephen – Education Digest: Essential Readings Condensed for Quick Review, 2010
Most experts in the testing community have presumed that the $350 million promised by the U.S. Department of Education to support common assessments would promote those that made greater use of open-ended items capable of measuring higher-order critical-thinking skills. But as measurement experts consider the multitude of possibilities for an…
Descriptors: Educational Quality, Test Items, Comparative Analysis, Multiple Choice Tests
Chavez, Oscar; Papick, Ira; Ross, Dan J.; Grouws, Douglas A. – Online Submission, 2010
The purpose of this paper was to describe the process of development of assessment instruments for the Comparing Options in Secondary Mathematics: Investigating Curriculum (COSMIC) project. The COSMIC project was a three-year longitudinal comparative study focusing on evaluating high school students' mathematics learning from two distinct…
Descriptors: Mathematics Education, Mathematics Achievement, Interrater Reliability, Scoring Rubrics
Peer reviewed
Murdock, Linda C.; Cost, Hollie C.; Tieso, Carol – Focus on Autism and Other Developmental Disabilities, 2007
The "Social-Communication Assessment Tool" (S-CAT) was created as a direct observation instrument to quantify specific social and communication deficits of children with autism spectrum disorders (ASD) within educational settings. In this pilot study, the instrument's content validity and interrater reliability were investigated to determine the…
Descriptors: Nonverbal Communication, Autism, Content Validity, Test Validity
Peer reviewed
Alsawalmeh, Yousef M.; Feldt, Leonard S. – Psychometrika, 1994
A modification of a test of the equality of nonindependent alpha reliability coefficients is proposed. It avoids the limitation that the product of the number of test parts times the number of subjects be quite large. Monte Carlo studies indicate that this test can be used in comparing interrater reliabilities. (SLD)
Descriptors: Comparative Analysis, Computer Simulation, Equations (Mathematics), Interrater Reliability
Gearhart, Maryl – 1994
The "Writing What You Read" (WWYR) rubric was designed for large-scale assessments, and differs from most narrative rubrics in its narrative-specific content and its developmental framework. The rubric contains five analytic subscales for theme, character, setting, plot, and communication, and a sixth holistic scale for overall…
Descriptors: Comparative Analysis, Educational Assessment, Elementary Education, Holistic Approach
Henning, Grant; And Others – 1995
A prototype revised form of the Test of Spoken English (TSE) was compared with the current version of the same test, examining interrater reliability, frequency of rater discrepancy at all score levels, component task adequacy, scoring efficacy, and other concurrent and construct validity evidence, including the oral proficiency interview…
Descriptors: Adults, College Students, Comparative Analysis, English (Second Language)
Peer reviewed
Wigglesworth, Gillian – Language Testing, 1997
In this study, planning time was manipulated as a variable in a trial administration of a semi-direct oral interaction test. Discourse analytic techniques were used to determine the nature and/or significance of differences in the elicited discourse across two conditions in terms of complexity and accuracy. Findings suggest that planning time may…
Descriptors: Cognitive Development, Communicative Competence (Languages), Comparative Analysis, Discourse Analysis
Silvestro, John R.; And Others – 1989
The job analysis procedures used in the development of the Illinois Certification Testing System are described. The degree of congruence between job analysis ratings provided by public school educators (PSEs) and teacher educators (TEs) who completed the job analysis surveys is examined. National Evaluation Systems, Inc., and the Illinois State…
Descriptors: Comparative Analysis, Content Analysis, Elementary Secondary Education, Interrater Reliability
Ferroli, Lou; Krajenta, Marilyn – 1993
The creation and validation of a Spanish version of an English developmental spelling test (DST) is described. An introductory section reviews related literature on the rationale for and construction of DSTs, spelling development in the early grades, and Spanish-English bilingual education. Differences between the English and Spanish test versions…
Descriptors: Comparative Analysis, Elementary School Students, English, Grade 1