ERIC - Search Results

Publication Date

In 2025	415
Since 2024	415

Descriptor

Test Reliability	289
Test Validity	235
Foreign Countries	224
Test Construction	132
Reliability	107
Psychometrics	81
Measures (Individuals)	80
Factor Analysis	71
Artificial Intelligence	49
Teacher Attitudes	48
College Students	47
Evaluation Methods	45
Student Attitudes	45
Factor Structure	42
Undergraduate Students	42
Test Items	39
Scores	37
Validity	34
Questionnaires	33
Gender Differences	32
Technology Uses in Education	32
Interrater Reliability	31
Elementary School Students	30
Rating Scales	28
Second Language Learning	28

Publication Type

Journal Articles	407
Reports - Research	386
Tests/Questionnaires	49
Information Analyses	16
Reports - Descriptive	10
Reports - Evaluative	8
Numerical/Quantitative Data	1

Education Level

Higher Education	146
Postsecondary Education	146
Secondary Education	75
Elementary Education	61
High Schools	34
Middle Schools	26
Early Childhood Education	21
Junior High Schools	19
Elementary Secondary Education	14
Intermediate Grades	12
Primary Education	9
Grade 4	6
Grade 5	6
Preschool Education	6
Grade 11	4
Grade 2	4
Grade 6	4
Adult Education	3
Grade 12	3
Grade 3	3
Grade 7	3
Kindergarten	3
Grade 1	2
Grade 10	2
Grade 8	2

Audience

Researchers	5
Policymakers	4
Practitioners	3
Teachers	3
Administrators	1
Counselors	1

Location

Turkey	52
China	24
Indonesia	16
Spain	11
Taiwan	10
Canada	8
Iran	8
Thailand	8
United Kingdom	8
United States	7
Australia	6
Saudi Arabia	6
South Korea	5
India	4
California	3
Finland	3
Germany	3
Japan	3
Jordan	3
Malaysia	3
Philippines	3
Switzerland	3
Austria	2
Belgium	2
Brazil	2

Showing 1 to 15 of 415 results

Technical Adequacy-Reliability

Peer reviewed

Susan K. Johnsen – Gifted Child Today, 2025

The author provides information about reliability and areas that educators should examine in determining if an assessment is consistent and trustworthy for use, and how it should be interpreted in making decisions about students. Reliability areas that are discussed in the column include internal consistency, test-retest or stability, inter-scorer…

Descriptors: Test Reliability, Academically Gifted, Student Evaluation, Error of Measurement

Test-Retest and Inter-Rater Reliability for Selected Outcomes from a Wearable 3D Inertial Sensor over Different Stable and Unstable Postural Conditions: A Validation Study

Peer reviewed

Samuel D'Emanuele; Francesca Nardello; Fabrizio Garau; Diego Campaci; Federico Schena; Cantor Tarperi – Measurement in Physical Education and Exercise Science, 2025

The agreement between a wearable inertial sensor (GYKO, G) and the force platform (P) was assessed by evaluating "test-retest" and "inter-rater reliability." Thirty-eight subjects were enrolled; the selected indices of balance were investigated over foot positions and (un)stable conditions. Intraclass correlation coefficient…

Descriptors: Human Posture, Measurement Equipment, Interrater Reliability, Measurement Techniques

Grading Exams Using Large Language Models: A Comparison between Human and AI Grading of Exams in Higher Education Using ChatGPT

Peer reviewed

Jonas Flodén – British Educational Research Journal, 2025

This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…

Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring

Using Automated Procedures to Score Educational Essays Written in Three Languages

Peer reviewed

Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025

The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…

Descriptors: College Students, Slavic Languages, German, Italian

Engaging Classroom Observation: A Brief Measure of Active Learning in the College Classroom

Peer reviewed

Chase Young; Benjamin Mitchell-Yellin; George Kevin Randall – Active Learning in Higher Education, 2025

The purpose of this study was to develop a valid, reliable, and brief measure of active learning in college classrooms that is cheap and easy to complete and yields results that faculty can easily use to inform their development as instructors. Initial construct and face validity was achieved by modifying existing instruments and creating a draft…

Descriptors: College Faculty, College Students, Active Learning, Classroom Observation Techniques

Validity and Intrarater Reliability of the Fysiometer--Measuring Eccentric Knee Flexor Force during the Nordic Hamstring Exercise

Peer reviewed

Morten Pallisgaard Støve; Mathias Kringelholt Kristensen; Jonas Nielsen; Lea Dyhrberg Madsen – Measurement in Physical Education and Exercise Science, 2025

Between-limb strength asymmetry is a leading risk factor for hamstring strain re-injury. However, few accurate testing methodologies are available in clinical settings. This study examined the validity and reliability of eccentric knee flexor torque measured with a novel Nordic Hamstring Device. Twenty-seven healthy participants were assessed in…

Descriptors: Validity, Reliability, Human Body, Foreign Countries

The Vague Language Use Scale: Clinical Utility and Psychometrics from Adults with Traumatic Brain Injury

Peer reviewed

Kathryn J. Greenslade; Julia K. Bushell; Emily F. Dillon; Amy E. Ramage – International Journal of Language & Communication Disorders, 2025

Background: Pragmatic communication difficulties encompass many distinct behaviours, including the use of vague and/or insufficient language, a common characteristic following traumatic brain injury (TBI) that negatively impacts psychosocial outcomes. Existing assessments evaluate pragmatic communication broadly, often with only one or two items…

Descriptors: Neurological Impairments, Head Injuries, Language Impairments, Language Tests

Treatment Fidelity in a Feasibility Trial of the Aphasia Intervention, Virtual Elaborated Semantic Feature Analysis

Peer reviewed

Niamh Devane; Sofia Mazzoleni; Nicholas Behn; Jane Marshall; Stephanie Wilson; Katerina Hilari – International Journal of Language & Communication Disorders, 2025

Background and Aims: The reliability and validity of an intervention can be improved by checking treatment fidelity (TF). TF methods identify core components of an intervention, check their presence (or absence) and identify threats to fidelity. The Virtual Elaborated Semantic Feature Analysis (VESFA) intervention comprised individual sessions of…

Descriptors: Aphasia, Intervention, Fidelity, Feasibility Studies

Evidence-Based Evaluation of Student and Marker Performances in Assessment and Examination

Peer reviewed

Ole J. Kemi – Advances in Physiology Education, 2025

Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating students but also the wider learning and teaching efficiency of an academic institution.…

Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards

The Scale of Sincerity Based on Kyai Haji Ahmad Dahlan's Version for Islamic Students: The Rasch Analysis

Peer reviewed

Wahyu Nanda Eka Saputra; Trikinasih Handayani; Prima Suci Rohmadheny; Rohmatus Naini; Dody Hartanto; Hardi Santosa; Dewi Afra Khairunnisa; Risma Risansyah; Hanan Riati; Faturrahman – Journal of Education and Learning (EduLearn), 2025

The students are urged to do something without expecting anything in return and only in the name of God. Every Islamic student becomes something ideal if they can internalize and implement sincerity. Many people are willing to do something because of an ulterior motive. The importance of sincerity in humans is the background for developing a…

Descriptors: Islam, Interrater Reliability, Prosocial Behavior, Muslims

Another Look at Yen's Q3: Is 0.2 an Appropriate Cut-Off?

Peer reviewed

Kelsey Nason; Christine DeMars – Journal of Educational Measurement, 2025

This study examined the widely used threshold of 0.2 for Yen's Q3, an index for violations of local independence. Specifically, a simulation was conducted to investigate whether Q3 values were related to the magnitude of bias in estimates of reliability, item parameters, and examinee ability. Results showed that Q3 values below the typical cut-off…

Descriptors: Item Response Theory, Statistical Bias, Test Reliability, Test Items

Measuring Intentional Communication in Infants at Elevated Likelihood of Autism: Validity, Reliability, and Responsiveness of a Novel Coding Scale

Peer reviewed

Elizabeth Choi-Tucci; John Sideris; Cristin Holland; Grace T. Baranek; Linda R. Watson – Journal of Speech, Language, and Hearing Research, 2025

Purpose: Intentional communication acts, or purposefully directed vocalizations and gestures, are particularly difficult for infants at elevated likelihood for eventual diagnosis of autism. The ability to measure and track intentional communication in infancy thus has the potential to aid early identification and intervention efforts. This study…

Descriptors: Infants, Autism Spectrum Disorders, Caregiver Child Relationship, Nonverbal Communication

A Review of Automatic Item Generation Techniques Leveraging Large Language Models

Peer reviewed

Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2025

This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly…

Descriptors: Artificial Intelligence, Test Items, Automation, Test Format

Mental Toughness of Physical Education Teachers: Validation of a New Questionnaire

Peer reviewed

Sima Zach; Noa Fishler-Barum; Itamar Shidlov – Physical Educator, 2025

The purpose of the study was to develop the Teachers' Mental Toughness Questionnaire (TMTQ). The questionnaire was developed in six stages: item generation, content validity, exploratory factor analysis, reliability tests, convergent validity tests, and discriminant validity. The factor analysis indicates that it measures six factors: team,…

Descriptors: Test Construction, Test Validity, Test Reliability, Psychometrics

Interdisciplinary Thinking among Seventh-Grade Students in Lower-Secondary Science Education

Peer reviewed

Shasha Chen; Shaohui Chi; Zuhao Wang – Journal of Baltic Science Education, 2025

Interdisciplinary thinking is critical for equipping students to apply scientific knowledge and tackle societal challenges across various disciplines, which has been recognized as a key objective of twenty-first century science education. However, research on effective interdisciplinary assessment in secondary school science education is still…

Descriptors: Thinking Skills, Interdisciplinary Approach, Science Instruction, Grade 7

Source

Education and Information…	21
Journal of Psychoeducational…	13
Measurement in Physical…	13
International Journal of…	11
European Journal of Education	10
International Journal of…	10
Psychology in the Schools	10
Journal of Autism and…	9
Educational Process:…	8
Journal of Education and…	7
Journal of Educational…	7
SAGE Open	7
Discover Education	6
Journal of Baltic Science…	6
Journal of Applied Research…	5
Journal of Computer Assisted…	5
Annenberg Institute for…	4
Autism: The International…	4
Educational and Psychological…	4
International Journal of…	4
International Journal of…	4
International Journal of…	4
Journal of Attention Disorders	4
Language Testing	4
Measurement and Evaluation in…	4

Author

Benjamin W. Domingue	2
Hamdollah Ravand	2
Hongwei Yang	2
Joshua B. Gilbert	2
Juan Cruz	2
Li Wang	2
Mark J. Gierl	2
Mei-ki Chan	2
Mustafa Taktak	2
Olena Bolgova	2
Patsawut Sukserm	2
Sachin Nedungadi	2
Songül Karabatak	2
Volodymyr Mavrych	2
Zubair Ahmad	2
Özgen Korkmaz	2
A. Corinne Huggins-Manley	1
Aaron Montoya	1
Aaron P. Wood	1
Abdul Muktadir	1
Abdulkarim Alhossein	1
Abdullah Alamer	1
Abdullah Alshakhi	1
Abdullah D. Alenezi	1
Abdullah Faruk Kiliç	1

Assessments and Surveys

Strengths and Difficulties…	4
Vineland Adaptive Behavior…	3
Autism Diagnostic Observation…	2
Classroom Assessment Scoring…	2
Depression Anxiety and Stress…	2
Mullen Scales of Early…	2
Social Skills Improvement…	2
Teachers Sense of Efficacy…	2
ACT Assessment	1
ACTFL Oral Proficiency…	1
Aberrant Behavior Checklist	1
Ages and Stages Questionnaires	1
Bayley Mental Development…	1
Beck Depression Inventory	1
Center for Epidemiologic…	1
Child Behavior Checklist	1
Dynamic Indicators of Basic…	1
Early Childhood Environment…	1
Eyberg Child Behavior…	1
Foreign Language Classroom…	1
Home Observation for…	1
Maslach Burnout Inventory	1
Multidimensional…	1
Peabody Picture Vocabulary…	1
Pediatric Evaluation of…	1