ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	14
Since 2017 (last 10 years)	34
Since 2007 (last 20 years)	88

Descriptor

Correlation	123
Scoring	123
Test Reliability	54
Interrater Reliability	45
Reliability	36
Test Validity	36
Comparative Analysis	32
Foreign Countries	31
Scores	31
Computer Assisted Testing	27
Evaluators	22
Statistical Analysis	22
English (Second Language)	21
Language Tests	21
Second Language Learning	20
Validity	19
Essays	18
Factor Analysis	17
Test Construction	17
Test Items	17
Writing Evaluation	16
Multiple Choice Tests	14
College Students	13
Essay Tests	13
Evaluation Methods	13

Publication Type

Reports - Research	90
Journal Articles	86
Reports - Evaluative	13
Tests/Questionnaires	12
Speeches/Meeting Papers	8
Dissertations/Theses -…	3
Reports - Descriptive	3
Information Analyses	2
Numerical/Quantitative Data	2
Guides - Classroom - Teacher	1
Guides - Non-Classroom	1

Education Level

Higher Education	27
Postsecondary Education	23
Secondary Education	10
Elementary Secondary Education	6
Early Childhood Education	4
Elementary Education	4
Grade 8	4
High Schools	4
Primary Education	3
Grade 3	2
Kindergarten	2
Grade 10	1
Grade 11	1
Grade 5	1
Grade 7	1
Grade 9	1
Junior High Schools	1
Middle Schools	1

Audience

Researchers	2
Practitioners	1
Teachers	1

Location

China	6
Turkey	4
California	3
Netherlands	3
Hong Kong	2
India	2
Japan	2
United Kingdom	2
United Kingdom (England)	2
Canada	1
Chile	1
Colombia	1
Estonia	1
Georgia	1
Germany	1
Israel	1
Jordan	1
Mexico	1
Nebraska (Lincoln)	1
New Jersey	1
Nigeria	1
North Carolina (Greensboro)	1
Panama	1
Russia	1
Singapore	1

Laws, Policies, & Programs

What Works Clearinghouse Rating

Meets WWC Standards without Reservations	1
Meets WWC Standards with or without Reservations	1

Showing 1 to 15 of 123 results

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

Online Administration of the Test of Narrative Language--Second Edition: Psychometrics and Considerations for Remote Assessment

Peer reviewed
PDF on ERIC

Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Grantee Submission, 2022

Purpose: Our aim was to evaluate the psychometric properties of the online administered format of the Test of Narrative Language--Second Edition (TNL-2; Gillam & Pearson, 2017), given the importance of assessing children's narrative ability and considerable absence of psychometric studies of spoken language assessments administered online.…

Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments

Development and Validation of a Short-Form Inventory to Identify Personality Types: The Personality Identity Estimator (PIE)

Peer reviewed
PDF on ERIC

Conti, Gary J. – Journal of Education and Learning, 2023

The use of personality inventories has been limited because of their cost and length. To overcome these limitations, this study created the Personality Identity Estimator (PIE), an easy-to-use inventory to estimate personality types that can be used at no cost. PIE is a categorical inventory containing 12 items with 3 items for each of the 4…

Descriptors: Personality Measures, Personality Traits, Validity, Reliability

Resolving and Re-Scoring Constructed Response Items in Mixed-Format Assessments: An Exploration of Three Approaches

Peer reviewed

Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024

We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…

Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners

Online Administration of the Test of Narrative Language--Second Edition: Psychometrics and Considerations for Remote Assessment

Peer reviewed
PDF on ERIC

Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Language, Speech, and Hearing Services in Schools, 2022

Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments

A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement

Peer reviewed

Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024

Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…

Descriptors: Semantics, Educational Assessment, Evaluators, Reliability

Rater Connections and the Detection of Bias in Performance Assessment

Peer reviewed

Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022

In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…

Descriptors: Evaluators, Bias, Identification, Performance Based Assessment

The Influence of Rater Effects in Training Sets on the Psychometric Quality of Automated Scoring for Writing Assessments

Peer reviewed

Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018

Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…

Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring

Adaptation and Validation of a Test of Ethical Sensitivity in Teaching

Peer reviewed

Maxwell, Bruce; Boon, Helen; Tanchuk, Nicolas; Rauwerda, Bryan – Journal of Moral Education, 2021

This article documents the adaptation, piloting and validation of a measure of teachers' ethical sensitivity. To create the test, we modified a measure from dentistry drawing on literature in teacher professional ethics and drew on the expertise of professional ethics scholars and practitioners. Based on the results of Rasch analysis combined with…

Descriptors: Ethics, Moral Values, Scores, Teacher Education Programs

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed
PDF on ERIC

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

Semantic Distance and the Alternate Uses Task: Recommendations for Reliable Automated Assessment of Originality

Peer reviewed

Beaty, Roger E.; Johnson, Dan R.; Zeitlen, Daniel C.; Forthmann, Boris – Creativity Research Journal, 2022

Semantic distance is increasingly used for automated scoring of originality on divergent thinking tasks, such as the Alternate Uses Task (AUT). Despite some psychometric support for semantic distance -- including positive correlations with human creativity ratings -- additional work is needed to optimize its reliability and validity, including…

Descriptors: Semantics, Scoring, Creative Thinking, Creativity

Reliability and Stability of the Metrical Stress Effect on Segmental Production Accuracy in Persons with Apraxia of Speech

Peer reviewed

Bailey, Dallin J.; Bunker, Lisa; Mauszycki, Shannon; Wambaugh, Julie L. – International Journal of Language & Communication Disorders, 2019

Background: Acquired apraxia of speech (AOS) involves speech-production deficits on both the segmental and suprasegmental levels. Recent research has identified a non-linear interaction between the metrical structure of bisyllabic words and word-production accuracy in German speakers with AOS, with trochaic words (strong-weak stress) being…

Descriptors: Accuracy, Suprasegmentals, Phonology, German

Students' Use of Formalisations for Improved Logical Reasoning

Peer reviewed

Bronkhorst, Hugo; Roorda, Gerrit; Suhre, Cor; Goedhart, Martin – Research in Mathematics Education, 2022

Logical reasoning as part of critical thinking is becoming more and more important to prepare students for their future life in society, work, and study. This article presents the results of a quasi-experimental study with a pre-test-post-test control group design focusing on the effective use of formalisations to support logical reasoning. The…

Descriptors: Mathematics Instruction, Teaching Methods, Logical Thinking, Critical Thinking

Validation of an Automated Procedure for Calculating Core Lexicon from Transcripts

Peer reviewed

Dalton, Sarah Grace; Stark, Brielle C.; Fromm, Davida; Apple, Kristen; MacWhinney, Brian; Rensch, Amanda; Rowedder, Madyson – Journal of Speech, Language, and Hearing Research, 2022

Purpose: The aim of this study was to advance the use of structured, monologic discourse analysis by validating an automated scoring procedure for core lexicon (CoreLex) using transcripts. Method: Forty-nine transcripts from persons with aphasia and 48 transcripts from persons with no brain injury were retrieved from the AphasiaBank database. Five…

Descriptors: Validity, Discourse Analysis, Databases, Scoring

Development and Validation of the Written Communication Assessment of the "HEIghten"® Outcomes Assessment Suite. Research Report. ETS RR-17-53

Peer reviewed
PDF on ERIC

Rios, Joseph A.; Sparks, Jesse R.; Zhang, Mo; Liu, Ou Lydia – ETS Research Report Series, 2017

Proficiency with written communication (WC) is critical for success in college and careers. As a result, institutions face a growing challenge to accurately evaluate their students' writing skills to obtain data that can support demands of accreditation, accountability, or curricular improvement. Many current standardized measures, however, lack…

Descriptors: Test Construction, Test Validity, Writing Tests, College Outcomes Assessment

Source

ETS Research Report Series	12
Applied Measurement in…	5
Educational and Psychological…	5
Language Testing	4
Advances in Health Sciences…	3
ProQuest LLC	3
Applied Linguistics	2
Educational Testing Service	2
International Journal of…	2
International Journal of…	2
Journal of Educational…	2
Journal of Speech, Language,…	2
Perceptual and Motor Skills	2
ACT, Inc.	1
Advances in Physiology…	1
American Journal on Mental…	1
Anatomical Sciences Education	1
Applied Psychological…	1
Asia-Pacific Forum on Science…	1
Assessing Writing	1
Assessment for Effective…	1
Autism: The International…	1
CALICO Journal	1
CBE - Life Sciences Education	1
Canadian Journal of School…	1

Author

Attali, Yigal	4
Zhang, Mo	3
Anna-Maria Fall	2
Beula M. Magimairaj	2
Gentile, Claudia	2
Greene, John F.	2
Greg Roberts	2
Kantor, Robert	2
Lee, Yong-Won	2
Philip Capin	2
Ramineni, Chaitanya	2
Ronald B. Gillam	2
Sandra L. Gillam	2
Schwanenflugel, Paula J.	2
Sharon Vaughn	2
Steedle, Jeffrey T.	2
Trapani, Catherine S.	2
Williamson, David M.	2
Wind, Stefanie A.	2
Abdul Gafoor, K.	1
Allan S. Cohen	1
Allison, Carrie	1
Amanda Huee-Ping Wong	1
Anderson, Paul S.	1

Assessments and Surveys

Test of English as a Foreign…	8
Graduate Record Examinations	6
SAT (College Admission Test)	4
ACT Assessment	3
Torrance Tests of Creative…	3
Myers Briggs Type Indicator	2
Peabody Picture Vocabulary…	2
ACT Interest Inventory	1
Clinical Evaluation of…	1
Goodenough Harris Drawing Test	1
Marlowe Crowne Social…	1
McCarthy Scales of Childrens…	1
NEO Personality Inventory	1
National Assessment of…	1
Praxis Series	1
Raven Progressive Matrices	1
Strengths and Difficulties…	1
Teaching and Learning…	1
Test of Language Development	1
Test of Standard Written…	1
United States Medical…	1
Wechsler Intelligence Scale…	1
Woodcock Johnson Tests of…	1
Woodcock Reading Mastery Test	1