Publication Date
| In 2026 | 0 |
| Since 2025 | 55 |
| Since 2022 (last 5 years) | 261 |
| Since 2017 (last 10 years) | 508 |
| Since 2007 (last 20 years) | 1258 |
Descriptor
| Evaluation Methods | 2743 |
| Test Reliability | 1408 |
| Test Validity | 991 |
| Reliability | 964 |
| Student Evaluation | 567 |
| Validity | 515 |
| Interrater Reliability | 502 |
| Foreign Countries | 444 |
| Test Construction | 364 |
| Higher Education | 359 |
| Measurement Techniques | 305 |
Author
| Raykov, Tenko | 9 |
| Epstein, Michael H. | 7 |
| Jaeger, Richard M. | 7 |
| Matson, Johnny L. | 7 |
| Amrein-Beardsley, Audrey | 6 |
| Follman, John | 6 |
| Gill, Brian | 6 |
| Gresham, Frank M. | 6 |
| Thompson, Bruce | 6 |
| Fink, Arlene | 5 |
| Marcoulides, George A. | 5 |
Audience
| Researchers | 137 |
| Practitioners | 99 |
| Teachers | 41 |
| Administrators | 32 |
| Policymakers | 17 |
| Students | 13 |
| Counselors | 5 |
| Support Staff | 3 |
| Community | 1 |
| Media Staff | 1 |
| Parents | 1 |
Location
| Australia | 45 |
| United Kingdom | 41 |
| Canada | 31 |
| United Kingdom (England) | 29 |
| China | 28 |
| United States | 28 |
| Turkey | 27 |
| California | 22 |
| Florida | 21 |
| Netherlands | 19 |
| Israel | 16 |
What Works Clearinghouse Rating
| Does not meet standards | 1 |
Mohammad Hmoud; Hadeel Swaity; Eman Anjass; Eva María Aguaded-Ramírez – Electronic Journal of e-Learning, 2024
This research aimed to develop and validate a rubric to assess Artificial Intelligence (AI) chatbots' effectiveness in accomplishing tasks, particularly within educational contexts. Given the rapidly growing integration of AI in various sectors, including education, a systematic and robust tool for evaluating AI chatbot performance is essential.…
Descriptors: Artificial Intelligence, Man Machine Systems, Natural Language Processing, Test Construction
Juan M. Sanchez – Journal of Biological Education, 2024
Bias assessment (systematic errors) is fundamental in industry and service laboratories, where reliable results must be obtained to give correct answers to specific problems. Therefore, knowledge and practice of quality methodologies are of fundamental importance for students. Unfortunately, laboratory lessons often focus on connecting theory and…
Descriptors: Achievement Tests, Science Laboratories, Biology, Science Education
Simon Massey – International Journal of Social Research Methodology, 2024
The UK-based article develops a quantitative method for measuring 8-9-year-old children's Gender Ability Beliefs through drawings, assessing the reliability and validity of the measure and its association with respondents' self-reported gender. The measure, originally used in the US by Beilock et al. (2010), required respondents to draw two…
Descriptors: Children, Sex, Childrens Attitudes, Gender Differences
Weiwei Tong; Prasong Saihong; Kanyarat Sonsupap – International Journal of Language Education, 2024
The main objective of this study is to revise and validate the assessment of self-presentation skills of middle school students. The assessment is based on existing self-assessment scales and adaptively modified for a more accurate assessment of middle school students' self-presentation skills. Considering the characteristics of middle school…
Descriptors: Middle School Students, Self Evaluation (Individuals), Rating Scales, Reliability
Yuting Han; Zhehan Jiang; Lingling Xu; Fen Cai – AERA Online Paper Repository, 2024
To address the computational constraints of parameter estimation in the polytomous Cognitive Diagnosis Model (pCDM) in large-scale, high-volume data situations, this study proposes two two-stage polytomous attribute estimation methods: P_max and P_linear. The effects of the two-stage methods were studied via a Monte Carlo simulation study, and the…
Descriptors: Medical Education, Licensing Examinations (Professions), Measurement Techniques, Statistical Data
Lanah Stafford; Erin Cousins; Linda Bol; Megan Mize – Research & Practice in Assessment, 2023
Integrative learning is an important outcome for graduates of higher education. Therefore, it should be well-defined and assessed reliably. The American Association of Colleges & Universities has developed a rubric to define and assess integrative learning, but it has low reliability. This pilot study examines whether this rubric's reliability…
Descriptors: Scoring Rubrics, Reliability, Evaluation Methods, Faculty Development
Kimbell, Richard – International Journal of Technology and Design Education, 2022
Conventional approaches to assessment involve teachers and examiners judging the quality of learners' work by reference to lists of criteria or other 'outcome' statements. This paper explores a quite different method of assessment using 'Adaptive Comparative Judgement' (ACJ) that was developed within a research project at Goldsmiths University of…
Descriptors: Student Evaluation, Evaluation Methods, Alternative Assessment, Value Judgment
Konstantinou, Ioannis Ch. – Open Journal for Educational Research, 2022
The purpose of this article is to review the literature on grading as a method and technique for expressing students' performance in the context of everyday school practice. Initially, it highlights a growing concern about the role that assessing students' performance plays in learning and, more generally, in the educational process. Subsequently, the…
Descriptors: Grading, Student Evaluation, Evaluation Methods, Performance Based Assessment
Peer Overmarking and Insufficient Diagnosticity: The Impact of the Rating Method for Peer Assessment
Van Meenen, Florence; Coertjens, Liesje; Van Nes, Marie-Claire; Verschuren, Franck – Advances in Health Sciences Education, 2022
The present study explores two rating methods for peer assessment (analytical rating using criteria and comparative judgement) in light of concurrent validity, reliability and insufficient diagnosticity (i.e. the degree to which substandard work is recognised by the peer raters). During a second-year undergraduate course, students wrote a one-page…
Descriptors: Evaluation Methods, Peer Evaluation, Accuracy, Evaluation Criteria
Delphine Franco; Ruben Vanderlinde; Martin Valcke – European Journal of Education, 2025
Complex competences, such as managing students' aggressive behaviour, are challenging to develop during teacher training. Recently, video-based simulations have been considered promising, yet few suitable assessment instruments are available. This paper reports on the design and evaluation of a video-based assessment tool tailored to measure…
Descriptors: Preservice Teachers, Preservice Teacher Education, Student Behavior, Aggression
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although extensive research exists on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Amanda Timmerman; Vasiliki Totsika; Valerie Lye; Laura Crane; Audrey Linden; Elizabeth Pellicano – Autism: The International Journal of Research and Practice, 2025
Autistic people are more likely to have co-occurring mental health conditions compared to the general population, and mental health interventions have been identified as a top research priority by autistic people and the wider autism community. Autistic adults have also communicated that quality of life is the outcome that matters most to them in…
Descriptors: Adults, Autism Spectrum Disorders, Quality of Life, Randomized Controlled Trials
Breanne J. Byiers; Alyssa M. Merbler; Chantel C. Burkitt; Frank J. Symons – American Journal on Intellectual and Developmental Disabilities, 2025
Sleep problems are common in Rett syndrome and other neurogenetic syndromes. Actigraphy is a cost-effective, objective method for measuring sleep. Current guidelines require caregiver-reported bed and wake times to facilitate actigraphy data scoring. The current study examined missingness and consistency of caregiver-reported bed and wake times…
Descriptors: Sleep, Neurodevelopmental Disorders, Psychomotor Skills, Genetic Disorders
Caroline F. Rowland; Amy Bidgood; Gary Jones; Andrew Jessop; Paula Stinson; Julian M. Pine; Samantha Durrant; Michelle S. Peter – Language Learning, 2025
A strong predictor of children's language is performance on non-word repetition (NWR) tasks. However, the basis of this relationship remains unknown. Some suggest that NWR tasks measure phonological working memory, which then affects language growth. Others argue that children's knowledge of language/language experience affects NWR performance. A…
Descriptors: Vocabulary Development, Comparative Analysis, Computational Linguistics, Language Skills
Anna Kay Steadman – ProQuest LLC, 2023
The Performance Assessment and Evaluation System (PAES) is used by all major universities in the state of Utah to measure the effective teaching skills of preservice candidates as they progress through their teaching preparation program. The resulting ratings are used to make high-stakes decisions relating to course completion as well as…
Descriptors: Preservice Teachers, Student Evaluation, Teaching Skills, Elementary School Teachers