Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 1
Since 2016 (last 10 years): 1
Since 2006 (last 20 years): 2
Descriptor
Comparative Analysis: 35
Interrater Reliability: 35
Evaluators: 10
Higher Education: 8
Evaluation Methods: 7
Language Tests: 7
Scoring: 7
Test Items: 7
Testing: 7
English (Second Language): 6
Evaluation Criteria: 6
Author
Lunz, Mary E.: 2
Myford, Carol M.: 2
O'Neill, Thomas R.: 2
Adams, R. J.: 1
Alvermann, Donna E.: 1
Beasley, T. Mark: 1
Bridgeman, Brent: 1
Chang, Lei: 1
Chavez, Oscar: 1
Chen, H. Julie: 1
Christine, Charles T.: 1
Publication Type
Speeches/Meeting Papers: 35
Reports - Research: 21
Reports - Evaluative: 11
Information Analyses: 2
Tests/Questionnaires: 2
Collected Works - Serials: 1
Opinion Papers: 1
Education Level
Elementary Secondary Education: 1
High Schools: 1
Secondary Education: 1
Audience
Researchers: 3
Practitioners: 2
Teachers: 1
Location
California: 1
Assessments and Surveys
Graduate Management Admission…: 1
National Assessment of…: 1
Test of English as a Foreign…: 1
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Chavez, Oscar; Papick, Ira; Ross, Dan J.; Grouws, Douglas A. – Online Submission, 2010
The purpose of this paper was to describe the process of development of assessment instruments for the Comparing Options in Secondary Mathematics: Investigating Curriculum (COSMIC) project. The COSMIC project was a three-year longitudinal comparative study focusing on evaluating high school students' mathematics learning from two distinct…
Descriptors: Mathematics Education, Mathematics Achievement, Interrater Reliability, Scoring Rubrics
O'Neill, Thomas R.; Lunz, Mary E. – 1997
This paper illustrates a method to study rater severity across exam administrations. A multi-facet Rasch model defined the ratings as being dominated by four facets: examinee ability, rater severity, project difficulty, and task difficulty. Ten years of data from administrations of a histotechnology performance assessment were pooled and analyzed…
Descriptors: Ability, Comparative Analysis, Equated Scores, Interrater Reliability
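The abstract above names a many-facet Rasch model in which examinee ability, rater severity, project difficulty, and task difficulty jointly determine a rating. A minimal sketch of that decomposition, assuming the standard additive logit form (the parameter values below are invented for illustration, not taken from the study):

```python
import math

# Many-facet Rasch model: the log-odds of success decompose
# additively across the facets named in the abstract.
def rating_probability(ability, severity, project_diff, task_diff):
    logit = ability - severity - project_diff - task_diff
    return 1.0 / (1.0 + math.exp(-logit))

# A more severe rater lowers the probability of success for the
# same examinee, project, and task:
lenient = rating_probability(1.0, -0.5, 0.2, 0.3)
severe = rating_probability(1.0, 0.5, 0.2, 0.3)
print(lenient > severe)  # True
```

Because severity enters the logit with its own term, pooled data across administrations lets rater severity be estimated separately from examinee ability, which is what makes cross-administration comparisons of severity possible.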
Spolsky, Bernard – 1990
A discussion of the differences between the Test of English as a Foreign Language (TOEFL), an American test battery, and the Cambridge English Examinations (Cambridge), a British battery, focuses on the different approaches to language test development embodied in the tests as the source of difficulty in translating between them for individual…
Descriptors: Comparative Analysis, Cultural Differences, English (Second Language), Foreign Countries
Beasley, T. Mark; Leitner, Dennis W. – 1993
The L statistic of E. B. Page (1963) tests the agreement of a single group of judges with an a priori ordering of alternative treatments. This paper extends the two-group test of D. W. Leitner and C. M. Dayton (1976), itself an extension of the L test, to analyze the difference in consensus between two unequally sized groups of judges. Exact critical values…
Descriptors: Comparative Analysis, Equations (Mathematics), Estimation (Mathematics), Evaluators
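The basic L statistic this entry builds on can be sketched in a few lines: each judge ranks the k treatments, and L is the sum of the hypothesized rank times the column sum of observed ranks, so perfect agreement with the predicted ordering maximizes L. This is an illustrative sketch of Page's (1963) statistic only, not of the two-group extension:

```python
# Page's L statistic: agreement of a group of judges with an
# a priori ordering of treatments.
def page_l(rankings, predicted_order):
    # rankings[i][j] is the rank judge i gave treatment j (1 = lowest);
    # predicted_order[j] is the hypothesized rank of treatment j.
    k = len(predicted_order)
    column_sums = [sum(r[j] for r in rankings) for j in range(k)]
    return sum(c * s for c, s in zip(predicted_order, column_sums))

# Three judges, three treatments, perfect agreement with the
# predicted ordering 1 < 2 < 3:
judges = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
print(page_l(judges, [1, 2, 3]))  # 1*3 + 2*6 + 3*9 = 42
```

The two-group test compares L-based consensus between two groups of judges; the 1991 paper's contribution is handling groups of unequal size.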
Schael, Jocelyne; Dionne, Jean-Paul – 1991
The basis of agreement or disagreement among judges/evaluators when applying a coding scheme to concurrent verbal protocols was studied. The sample included 20 university graduates, from varied backgrounds; 10 subjects had and 10 subjects did not have experience in protocol analysis. The total sample was divided into four balanced groups according…
Descriptors: Adults, College Graduates, Comparative Analysis, Encoding (Psychology)
Christine, Charles T.; And Others – 1982
Thirty-two children aged 7 to 12 participated in a study to determine the reliability of the Ekwall Reading Inventory (ERI) and the Classroom Reading Inventory (CRI). The children were randomly assigned to take one of the two inventories, which were administered by four different specially trained teachers. The study used a test-retest design, in…
Descriptors: Comparative Analysis, Elementary Secondary Education, Informal Reading Inventories, Interrater Reliability
Debate Philosophy Statements as Predictors of Critic Attitudes: A Summary and Direction of Research.
Dudczak, Craig; Day, Donald – 1991
Philosophy statements have been used in the National Debate Tournament (NDT) since the mid-1970s and the Cross Examination Debate Association (CEDA) National Tournament since its 1986 inception. The statements should help debaters adapt to critics' expressed preferences. Moreover, philosophy statements can guide the study of argumentation theory…
Descriptors: Comparative Analysis, Content Analysis, Debate, Higher Education
Chen, H. Julie – 1995
A study investigated 42 native English-speakers' (NSs) perceptions of the pragmatic appropriateness of refusal statements. The NSs rated the appropriateness of 24 written statements in 4 different refusal scenarios, which were collected from both native speakers and non-native speakers. Four weeks later, as a reliability check, the subjects rated…
Descriptors: Attitudes, Comparative Analysis, English (Second Language), Interrater Reliability
Nicolai, Michael T. – 1987
To determine if there is a distinction between the forensics community's idea of quality and that of the general population, tournament rankings of forensics judges and those of a lay audience were compared. Undergraduate students enrolled in a variety of speech related courses were asked to attend rounds of competition at a midwest collegiate…
Descriptors: Communication Research, Comparative Analysis, Debate, Evaluation Criteria
Kenyon, Dorry; Stansfield, Charles W. – 1993
This paper examines whether individuals who train themselves to score a performance assessment will rate acceptably when compared to known standards. Research on the efficacy of rater self-training materials developed by the Center for Applied Linguistics for the Texas Oral Proficiency Test (TOPT) is examined. The rater self-training materials are described…
Descriptors: Bilingual Education, Comparative Analysis, Evaluators, Individual Characteristics

Yates, Beverly J. – 1991
The predictive validity of the National Association of Secondary School Principals (NASSP) assessment center evaluation process for principals is compared with the perceived effectiveness of a selected population of principals. The NASSP assessment center approach includes a case study, a personal interview, two exercises, and a scholastic…
Descriptors: Administrator Evaluation, Assessment Centers (Personnel), Case Studies, Comparative Analysis
O'Neill, Thomas R.; Lunz, Mary E. – 1996
To generalize test results beyond the particular test administration, an examinee's ability estimate must be independent of the particular items attempted, and the item difficulty calibrations must be independent of the particular sample of people attempting the items. This stability is a key concept of the Rasch model, a latent trait model of…
Descriptors: Ability, Benchmarking, Comparative Analysis, Difficulty Level

Jaeger, Richard M.; Usher, Claire H. – 1991
This paper reports on a study of the foundation and application of two procedures used to specify appropriate weights to be applied to components in determining the overall quality of a school. These procedures are multiattribute utility technology (MAUT) and policy capturing, and the paper presents the results of applying them, using key…
Descriptors: Achievement Tests, Comparative Analysis, Curriculum Evaluation, Educational Assessment
Crews, William E., Jr. – 1991
As part of a study of teacher evaluation of student replies to open-ended questions, a second question--the best method of determining interrater reliability--was examined. The standard method, the Pearson Product-Moment correlation, overestimated the degree of match between researchers' and teachers' scoring of tests. The simpler percent…
Descriptors: Comparative Analysis, Elementary School Teachers, Evaluation Methods, Evaluators
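The Pearson-versus-percent-agreement contrast in the entry above is easy to demonstrate: a constant scoring offset yields a perfect correlation yet zero exact agreement. A minimal sketch with invented scores (not the study's data):

```python
from statistics import mean, stdev

# Pearson product-moment correlation: measures linear association,
# so it ignores any constant offset between two raters' scores.
def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Percent exact agreement: the fraction of items scored identically.
def percent_agreement(x, y):
    return sum(a == b for a, b in zip(x, y)) / len(x)

# The teacher scores every item one point higher than the researcher:
researcher = [1, 2, 3, 4, 5]
teacher = [2, 3, 4, 5, 6]
print(pearson(researcher, teacher))            # close to 1.0
print(percent_agreement(researcher, teacher))  # 0.0 — no exact matches
```

This is the sense in which the correlation "overestimates" the match between researchers' and teachers' scoring: it credits raters who disagree on every item, provided they disagree consistently.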