Publication Date
| In 2026 | 0 |
| Since 2025 | 55 |
| Since 2022 (last 5 years) | 261 |
| Since 2017 (last 10 years) | 508 |
| Since 2007 (last 20 years) | 1258 |
Descriptor
| Evaluation Methods | 2743 |
| Test Reliability | 1408 |
| Test Validity | 991 |
| Reliability | 964 |
| Student Evaluation | 567 |
| Validity | 515 |
| Interrater Reliability | 502 |
| Foreign Countries | 444 |
| Test Construction | 364 |
| Higher Education | 359 |
| Measurement Techniques | 305 |
Author
| Raykov, Tenko | 9 |
| Epstein, Michael H. | 7 |
| Jaeger, Richard M. | 7 |
| Matson, Johnny L. | 7 |
| Amrein-Beardsley, Audrey | 6 |
| Follman, John | 6 |
| Gill, Brian | 6 |
| Gresham, Frank M. | 6 |
| Thompson, Bruce | 6 |
| Fink, Arlene | 5 |
| Marcoulides, George A. | 5 |
Audience
| Researchers | 137 |
| Practitioners | 99 |
| Teachers | 41 |
| Administrators | 32 |
| Policymakers | 17 |
| Students | 13 |
| Counselors | 5 |
| Support Staff | 3 |
| Community | 1 |
| Media Staff | 1 |
| Parents | 1 |
Location
| Australia | 45 |
| United Kingdom | 41 |
| Canada | 31 |
| United Kingdom (England) | 29 |
| China | 28 |
| United States | 28 |
| Turkey | 27 |
| California | 22 |
| Florida | 21 |
| Netherlands | 19 |
| Israel | 16 |
What Works Clearinghouse Rating
| Does not meet standards | 1 |
Lucy Chambers; Sylvia Vitello; Carmen Vidal Rodeiro – Assessment in Education: Principles, Policy & Practice, 2024
In England, some secondary-level qualifications comprise non-exam assessments which need to undergo moderation before grading. Currently, moderation is conducted at centre (school) level. This raises challenges for maintaining the standard across centres. Recent technological advances enable novel moderation methods that are no longer bound by…
Descriptors: Foreign Countries, Evaluation Methods, Comparative Analysis, Grading
Scott H. Yamamoto – Journal of Psychoeducational Assessment, 2024
This was the first study in which a psychometrically validated STEM measure, the "Student STEM" (S-STEM), was studied for HSSWD. Data were collected from 229 HSSWD in a western state and…
Descriptors: Psychometrics, STEM Education, Student Attitudes, High School Students
Hongyi Lin; Fengyan Wang – Journal of Psychoeducational Assessment, 2024
Accurate measurement of wisdom is the cornerstone of wisdom research. To provide a representative reference for the reliability level and moderating factors of various wisdom self-rating scales, we carried out a reliability generalization meta-analysis of Chinese and English references retrieved from 2004 to 2023. A total of 149 articles were…
Descriptors: Thinking Skills, Intelligence, Cognitive Psychology, Cognitive Measurement
Sümeyye Arkan; Sema Tan – International Journal of Assessment Tools in Education, 2025
Teachers' perceptions, attitudes, and opinions about students, curricula, or evaluation methods contribute to the development of students' talents. Thus, researchers often collect data from teachers to identify gifted students, determine educational practices to meet the students' needs and assess gifted education programs. Researchers often…
Descriptors: Talent Identification, Academically Gifted, Evaluation Methods, Measurement Techniques
Victoria Reynolds; Kristin Scavo-Smith; Kate Oteng-Bediako; Sophie Scanlon – International Journal of Language & Communication Disorders, 2025
Introduction: Running speech sampling is an essential component of a paediatric voice evaluation, in that it should provide the examiner with a representative vocal sample of the child's everyday voice use outside of the clinic setting. Current speech sampling practices, consisting of reading tasks, informal conversation sampling and the voice…
Descriptors: Allied Health Personnel, Speech Language Pathology, Children, Voice Disorders
Holcomb, T. Scott; Lambert, Richard; Bottoms, Bryndle L. – Journal of Educational Supervision, 2022
In this study, various statistical indexes of agreement were calculated using empirical data from a group of evaluators (n = 45) of early childhood teachers. The group of evaluators rated ten fictitious teacher profiles using the North Carolina Teacher Evaluation Process (NCTEP) rubric. The exact and adjacent agreement percentages were calculated…
Descriptors: Interrater Reliability, Teacher Evaluation, Statistical Analysis, Early Childhood Teachers
Wenjing Guo – ProQuest LLC, 2021
Constructed response (CR) items are widely used in large-scale testing programs, including the National Assessment of Educational Progress (NAEP) and many district and state-level assessments in the United States. One unique feature of CR items is that they depend on human raters to assess the quality of examinees' work. The judgment of human…
Descriptors: National Competency Tests, Responses, Interrater Reliability, Error of Measurement
Katie L. McDermott – ProQuest LLC, 2024
Nursing education programs are faced with urgent demands to transition to competency-based education (CBE) to address the limitations of the nursing workforce. The AACN (2021) has developed the Essentials, or the core competencies for graduating entry- and advanced-level nurses to inform CBE. A concept analysis of Foundational Competence was…
Descriptors: Job Skills, Employment Qualifications, Nurses, Nursing Education
Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024
RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…
Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics
Zhipeng Hou; Elizabeth Tipton – Research Synthesis Methods, 2024
Literature screening is the process of identifying all relevant records from a pool of candidate paper records in systematic review, meta-analysis, and other research synthesis tasks. This process is time consuming, expensive, and prone to human error. Screening prioritization methods attempt to help reviewers identify most relevant records while…
Descriptors: Meta Analysis, Research Reports, Identification, Evaluation Methods
Marjahan Begum; Pontus Haglund; Ari Korhonen; Violetta Lonati; Mattia Monga; Filip Strömbäck; Artturi Tilanterä – Informatics in Education, 2024
There can be many reasons why students fail to answer summative tests correctly in advanced computer science courses: often the cause is a lack of prerequisites or misconceptions about topics presented in previous courses. One of the ITiCSE 2020 working groups investigated the possibility of designing assessments suitable for differentiating…
Descriptors: Foreign Countries, College Students, Prerequisites, Computer Science Education
Lisa DaVia Rubenstein; Kathrin Maki; Brianna Quigley; Shanyn Thompson; Lisa M. Ridgley Smith – AERA Online Paper Repository, 2024
The purpose of this systematic review was to survey available measures of creativity for PK-12 students for assessment characteristics and reporting of psychometric properties. Using the PRISMA framework, we identified 42 unique articles with 48 assessments meeting our inclusion criteria. Then, two coders independently coded all articles using a…
Descriptors: Literature Reviews, Meta Analysis, Elementary Secondary Education, Creativity
Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025
In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…
Descriptors: Automation, Grading, Computer Assisted Testing, Scoring
Baraldi Cunha, Andrea; Babik, Iryna; Koziol, Natalie A.; Hsu, Lin-Ya; Nord, Jayden; Harbourne, Regina T.; Westcott-McCoy, Sarah; Dusing, Stacey C.; Bovaird, James A.; Lobo, Michele A. – Grantee Submission, 2021
Purpose: To evaluate the validity, reliability, and sensitivity of the novel Means-End Problem-Solving Assessment Tool (MEPSAT). Methods: Children with typical development and those with motor delay were assessed throughout the first 2 years of life using the MEPSAT. MEPSAT scores were validated against the cognitive and motor subscales of the…
Descriptors: Problem Solving, Early Intervention, Evaluation Methods, Motor Development
Darvishi, Ali; Khosravi, Hassan; Rahimi, Afshin; Sadiq, Shazia; Gasevic, Dragan – IEEE Transactions on Learning Technologies, 2023
Engaging students in creating learning resources has demonstrated pedagogical benefits. However, to effectively utilize a repository of student-generated content (SGC), a selection process is needed to separate high- from low-quality resources as some of the resources created by students can be ineffective, inappropriate, or incorrect. A common…
Descriptors: Student Developed Materials, Educational Assessment, Peer Evaluation, Evaluation Methods

