Showing 46 to 60 of 1,943 results
Peer reviewed
Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025
This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics
Peer reviewed
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
Peer reviewed
Damian, Elena; Meuleman, Bart; van Oorschot, Wim – Sociological Methods & Research, 2022
In this article, we examine whether cross-national studies disclose enough information for independent researchers to evaluate the validity and reliability of the findings (evaluation transparency) or to perform a direct replication (replicability transparency). The first contribution is theoretical. We develop a heuristic theoretical model…
Descriptors: National Surveys, Cross Cultural Studies, Social Science Research, Periodicals
Peer reviewed
Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022
In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…
Descriptors: Evaluators, Bias, Identification, Performance Based Assessment
Jayden J. Lee – ProQuest LLC, 2022
The functional neuroanatomy of language localization in dyslexia has primarily been studied in the context of reading. However, dyslexia is sometimes referred to as a "language-based learning disability," yet the functional signature of the core language comprehension network in dyslexia is far less understood. This thesis presents a…
Descriptors: Dyslexia, Brain Hemisphere Functions, Comparative Analysis, Speech Communication
Peer reviewed
Alain Bengochea; Sabrina F. Sembiante – Review of Education, 2024
This best-evidence synthesis appraises the design and outcome characteristics of vocabulary intervention studies conducted with preschool through 6th grade emergent bilingual (EB) children and spotlights rigorously designed studies for which effects could be better attributed to instructional features. Twenty-nine selected studies were analysed…
Descriptors: Bilingualism, Vocabulary Development, Intervention, Comparative Analysis
Peer reviewed
Hunter, Seth B. – Journal of Education Human Resources, 2023
Teacher performance scores inform education leaders' management of teacher human resources. However, prior research has implied that different interpretations of performance criteria between teachers and their evaluators suppress teacher development. Although research has examined teacher perceptions of performance scores and compared teacher…
Descriptors: Teacher Evaluation, Teacher Effectiveness, Self Evaluation (Individuals), Interrater Reliability
Peer reviewed
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
Peer reviewed
Yun Long; Haifeng Luo; Yu Zhang – npj Science of Learning, 2024
This study explores the use of Large Language Models (LLMs), specifically GPT-4, in analysing classroom dialogue--a key task for teaching diagnosis and quality improvement. Traditional qualitative methods are both knowledge- and labour-intensive. This research investigates the potential of LLMs to streamline and enhance this process. Using…
Descriptors: Classroom Communication, Computational Linguistics, Chinese, Mathematics Instruction
Peer reviewed
Tim Stoeckel; Liang Ye Tan; Hung Tan Ha; Nam Thi Phuong Ho; Tomoko Ishii; Young Ae Kim; Chunmei Huang; Stuart McLean – Vocabulary Learning and Instruction, 2024
Local item dependency (LID) occurs when test-takers' responses to one test item are affected by their responses to another. It can be problematic if it causes inflated reliability estimates or distorted person and item measures. The cued-recall reading comprehension test in Hu and Nation's (2000) well-known and influential coverage--comprehension…
Descriptors: Reading Comprehension, English (Second Language), Second Language Instruction, Second Language Learning
Gill, Tim – Research Matters, 2022
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…
Descriptors: Comparative Analysis, Decision Making, Scripts, Standards
Heather Raithel – ProQuest LLC, 2023
A mixed methods action research study was designed to answer three research questions based on inter-rater reliability (IRR) in compliance calls for transition at a state education agency, perceived confidence levels in making and discussing compliance calls, and perceived confidence in sharing transition resources. An innovation based on…
Descriptors: Public Agencies, Interrater Reliability, Compliance (Legal), Comparative Analysis
Peer reviewed
Saluja, Ronak; Cheng, Sierra; delos Santos, Keemo Althea; Chan, Kelvin K. W. – Research Synthesis Methods, 2019
Objective: Various statistical methods have been developed to estimate hazard ratios (HRs) from published Kaplan-Meier (KM) curves for the purpose of performing meta-analyses. The objective of this study was to determine the reliability, accuracy, and precision of four commonly used methods by Guyot, Williamson, Parmar, and Hoyle and Henley.…
Descriptors: Meta Analysis, Reliability, Accuracy, Randomized Controlled Trials
Peer reviewed
Meyerhoff, Hauke S.; Grinschgl, Sandra; Papenmeier, Frank; Gilbert, Sam J. – Cognitive Research: Principles and Implications, 2021
The cognitive load of many everyday life tasks exceeds known limitations of short-term memory. One strategy to compensate for information overload is cognitive offloading which refers to the externalization of cognitive processes such as reminder setting instead of memorizing. There appears to be remarkable variance in offloading behavior between…
Descriptors: Individual Differences, Task Analysis, Reliability, Short Term Memory
Peer reviewed
Whalen, Kate; Paez, Antonio – Journal of Geography, 2022
Experiential education partnered with guided reflection is thought to support students with higher-order thinking skills. In this study, 44 reflections from two university-level sustainability courses were compared. In both courses students were asked to write a reflection, but only one course used the Reflective Learning Framework (RLF). Tests of…
Descriptors: Geography Instruction, Thinking Skills, Experiential Learning, Sustainability