Publication Date
| In 2026 | 0 |
| Since 2025 | 12 |
| Since 2022 (last 5 years) | 114 |
| Since 2017 (last 10 years) | 375 |
| Since 2007 (last 20 years) | 1130 |
Descriptor
| Comparative Analysis | 1943 |
| Reliability | 880 |
| Test Reliability | 792 |
| Foreign Countries | 554 |
| Test Validity | 443 |
| Correlation | 350 |
| Validity | 332 |
| Interrater Reliability | 327 |
| Statistical Analysis | 321 |
| Scores | 280 |
| Measures (Individuals) | 236 |
Author
| Reckase, Mark D. | 6 |
| Attali, Yigal | 5 |
| Coniam, David | 5 |
| Brennan, Robert L. | 4 |
| Crehan, Kevin D. | 4 |
| Feldt, Leonard S. | 4 |
| Hakstian, A. Ralph | 4 |
| Jones, Ian | 4 |
| Kolen, Michael J. | 4 |
| Lunz, Mary E. | 4 |
| August, Diane | 3 |
Audience
| Researchers | 35 |
| Practitioners | 29 |
| Teachers | 15 |
| Administrators | 9 |
| Policymakers | 6 |
| Counselors | 2 |
| Media Staff | 2 |
| Parents | 1 |
| Support Staff | 1 |
Location
| Turkey | 59 |
| United States | 47 |
| Australia | 36 |
| China | 33 |
| Canada | 32 |
| United Kingdom (England) | 32 |
| United Kingdom | 28 |
| Germany | 25 |
| Netherlands | 24 |
| Taiwan | 22 |
| Hong Kong | 20 |
What Works Clearinghouse Rating
| Meets WWC Standards with or without Reservations | 1 |
| Does not meet standards | 1 |
Olena Bolgova; Paul Ganguly; Volodymyr Mavrych – Anatomical Sciences Education, 2025
Integrating artificial intelligence, particularly large language models (LLMs), into medical education represents a significant new step in how medical knowledge is accessed, processed, and evaluated. The objective of this study was to conduct a comprehensive analysis comparing the performance of advanced LLM chatbots in different topics of…
Descriptors: Comparative Analysis, Artificial Intelligence, Technology Uses in Education, Natural Language Processing
DeLuca, Stefanie – Sociological Methods & Research, 2023
Increasingly, the broader public, media and policymakers are looking to qualitative research to provide answers to our most pressing social questions. While an exciting and perhaps overdue moment for qualitative researchers, it is also a time when the method is coming under increasing scrutiny for a lack of reliability and transparency. The…
Descriptors: Qualitative Research, Reliability, Standards, Participant Observation
Kinnear, George; Bennett, Max; Binnie, Rachel; Bolt, Róisín; Zheng, Yinglan – Teaching Mathematics and Its Applications, 2020
The MATH taxonomy classifies questions according to the mathematical skills required to answer them. It was created to aid the development of more balanced assessments in undergraduate mathematics and has since been used to compare different assessment regimes across school and university. To date, there has been no systematic investigation of the…
Descriptors: Taxonomy, Mathematics Instruction, Teaching Methods, Reliability
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Ali Al-Barakat; Rommel AlAli; Omayya Al-Hassan; Khaled Al-Saud – Educational Process: International Journal, 2025
Background/purpose: The study tries to discover how predictive thinking can be incorporated into writing activities to assist students in developing their creative skills in writing learning environments. Through this study, teachers will be able to adopt a new teaching method that helps transform the way creative writing is taught in language…
Descriptors: Thinking Skills, Creative Writing, Writing Instruction, Validity
Rebernik, Teja; Jacobi, Jidde; Tiede, Mark; Wieling, Martijn – Journal of Speech, Language, and Hearing Research, 2021
Purpose: This study compares two electromagnetic articulographs manufactured by Northern Digital, Inc.: the NDI Wave System (from 2008) and the NDI Vox-EMA System (from 2020). Method: Four experiments were completed: (1) comparison of statically positioned sensors; (2) tracking dynamic movements of sensors manipulated using a motor-driven LEGO…
Descriptors: Measurement Equipment, Articulation (Speech), Accuracy, Reliability
Dae Woong Ham; Luke Miratrix – Grantee Submission, 2024
The consequence of a change in school leadership (e.g., principal turnover) on student achievement has important implications for education policy. The impact of such an event can be estimated via the popular Difference in Difference (DiD) estimator, where those schools with a turnover event are compared to a selected set of schools that did not…
Descriptors: Trend Analysis, Faculty Mobility, Academic Achievement, Principals
Lanah Stafford; Erin Cousins; Linda Bol; Megan Mize – Research & Practice in Assessment, 2023
Integrative learning is an important outcome for graduates of higher education. Therefore, it should be well-defined and assessed reliably. The American Association of Colleges & Universities has developed a rubric to define and assess integrative learning, but it has low reliability. This pilot study examines whether this rubric's reliability…
Descriptors: Scoring Rubrics, Reliability, Evaluation Methods, Faculty Development
Caroline F. Rowland; Amy Bidgood; Gary Jones; Andrew Jessop; Paula Stinson; Julian M. Pine; Samantha Durrant; Michelle S. Peter – Language Learning, 2025
A strong predictor of children's language is performance on non-word repetition (NWR) tasks. However, the basis of this relationship remains unknown. Some suggest that NWR tasks measure phonological working memory, which then affects language growth. Others argue that children's knowledge of language/language experience affects NWR performance. A…
Descriptors: Vocabulary Development, Comparative Analysis, Computational Linguistics, Language Skills
Kapsner-Smith, Mara R.; Opuszynski, Amanda; Stepp, Cara E.; Eadie, Tanya L. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: The reliability of auditory-perceptual judgments between listeners is a long-standing problem in the assessment of voice disorders. The purpose of this study was to determine whether a relatively novel experimental scaling method, called visual sort and rate (VSR), yielded stronger reliability than the more frequently used method of…
Descriptors: Voice Disorders, Interrater Reliability, Rating Scales, Severity (of Disability)
Marine Simon; Alexandra Budke – Journal of Geography in Higher Education, 2024
Comparison is an important geographic method and a common task in geography education. Mastering comparison is a complex competency and written comparisons are challenging tasks both for students and assessors. As yet, however, there is no set test for evaluating comparison competency nor tool for enhancing it. Moreover, little is known about…
Descriptors: Geography Instruction, Student Evaluation, Comparative Analysis, Reliability
Seyda Aydin-Karaca; Mustafa Serdar Köksal; Bilkay Bi – Journal of Psychoeducational Assessment, 2024
This study aimed to develop a parent rating scale (PRSG) for screening children for further identification process in terms of giftedness. The participants of the study were 255 parents of gifted and non-gifted students. The PRSG, consisting of 30 items, was created by consulting parents and reviewing instruments existent in the literature. As…
Descriptors: Rating Scales, Parent Attitudes, Scores, Comparative Analysis
Lisa Frances; Frances Quinn; Sue Elliott; Jo Bird – Australian Educational Researcher, 2024
In this article, we explore inconsistencies in the implementation of outdoor learning across Australian early years' education. The benefits of outdoor learning justify regular employment of this pedagogical approach in both early childhood education and primary school settings. Early childhood education services provide daily outdoor learning…
Descriptors: Foreign Countries, Outdoor Education, Program Implementation, Elementary Education
Taylor, Tessa; Lanovaz, Marc J. – Journal of Applied Behavior Analysis, 2022
Behavior analysts typically rely on visual inspection of single-case experimental designs to make treatment decisions. However, visual inspection is subjective, which has led to the development of supplemental objective methods such as the conservative dual-criteria method. To replicate and extend a study conducted by Wolfe et al. (2018) on the…
Descriptors: Visual Perception, Artificial Intelligence, Decision Making, Evaluators
Pinot de Moira, Anne; Wheadon, Christopher; Christodoulou, Daisy – Research in Education, 2022
Writing is generally assessed internationally using rubric-based approaches, but there is a growing body of evidence to suggest that the reliability of such approaches is poor. In contrast, comparative judgement studies suggest that it is possible to assess open ended tasks such as writing with greater reliability. Many previous studies, however,…
Descriptors: Writing Evaluation, Classification, Accuracy, Scoring Rubrics

