Publication Date
| In 2026 | 0 |
| Since 2025 | 55 |
| Since 2022 (last 5 years) | 261 |
| Since 2017 (last 10 years) | 508 |
| Since 2007 (last 20 years) | 1258 |
Descriptor
| Evaluation Methods | 2743 |
| Test Reliability | 1408 |
| Test Validity | 991 |
| Reliability | 964 |
| Student Evaluation | 567 |
| Validity | 515 |
| Interrater Reliability | 502 |
| Foreign Countries | 444 |
| Test Construction | 364 |
| Higher Education | 359 |
| Measurement Techniques | 305 |
Author
| Raykov, Tenko | 9 |
| Epstein, Michael H. | 7 |
| Jaeger, Richard M. | 7 |
| Matson, Johnny L. | 7 |
| Amrein-Beardsley, Audrey | 6 |
| Follman, John | 6 |
| Gill, Brian | 6 |
| Gresham, Frank M. | 6 |
| Thompson, Bruce | 6 |
| Fink, Arlene | 5 |
| Marcoulides, George A. | 5 |
Audience
| Researchers | 137 |
| Practitioners | 99 |
| Teachers | 41 |
| Administrators | 32 |
| Policymakers | 17 |
| Students | 13 |
| Counselors | 5 |
| Support Staff | 3 |
| Community | 1 |
| Media Staff | 1 |
| Parents | 1 |
Location
| Australia | 45 |
| United Kingdom | 41 |
| Canada | 31 |
| United Kingdom (England) | 29 |
| China | 28 |
| United States | 28 |
| Turkey | 27 |
| California | 22 |
| Florida | 21 |
| Netherlands | 19 |
| Israel | 16 |
What Works Clearinghouse Rating
| Does not meet standards | 1 |
Guy B. deBrun – Journal of Outdoor Recreation, Education, and Leadership, 2025
Discussions of what it means to be an effective outdoor leader are common in outdoor education literature (Martin et al., 2025; Smith, 2021). Research has identified core competencies (Martin et al., 2025), conceptual frameworks (Pomfret et al., 2023), and course curricula/qualifications for effective leadership (Baker & O'Brien, 2019; Seaman…
Descriptors: Outdoor Leadership, Leadership Effectiveness, Evaluation Methods, Scoring Rubrics
Russell P. Houpt; Kevin J. Grimm; Aaron T. McLaughlin; Daryl R. Van Tongeren – Structural Equation Modeling: A Multidisciplinary Journal, 2024
Numerous methods exist to determine the optimal number of classes when using latent profile analysis (LPA), but none are consistently correct. Recently, the likelihood incremental percentage per parameter (LI3P) was proposed as a model effect-size measure. To evaluate the LI3P more thoroughly, we simulated 50,000 datasets, manipulating factors…
Descriptors: Structural Equation Models, Profiles, Sample Size, Evaluation Methods
Luu, Kimberly; Sidhu, Ravi; Chadha, Neil K.; Eva, Kevin W. – Advances in Health Sciences Education, 2023
Clinical supervisors are known to assess trainee performance idiosyncratically, causing concern about the validity of their ratings. The literature on this issue relies heavily on retrospective collection of decisions, resulting in the risk of inaccurate information regarding what actually drives raters' perceptions. Capturing in-the-moment…
Descriptors: Clinical Experience, Practicum Supervision, Student Evaluation, Evaluation Methods
Tavares, Walter; Kinnear, Benjamin; Schumacher, Daniel J.; Forte, Milena – Advances in Health Sciences Education, 2023
In this perspective, the authors critically examine "rater training" as it has been conceptualized and used in medical education. By "rater training," they mean the educational events intended to "improve" rater performance and contributions during assessment events. Historically, rater training programs have focused…
Descriptors: Medical Education, Interrater Reliability, Evaluation Methods, Training
Practices in Instrument Use and Development in "Chemistry Education Research and Practice" 2010-2021
Lazenby, Katherine; Tenney, Kristin; Marcroft, Tina A.; Komperda, Regis – Chemistry Education Research and Practice, 2023
Assessment instruments that generate quantitative data on attributes (cognitive, affective, behavioral, "etc.") of participants are commonly used in the chemistry education community to draw conclusions in research studies or inform practice. Recently, articles and editorials have stressed the importance of providing evidence for the…
Descriptors: Chemistry, Periodicals, Journal Articles, Science Education
Kelvin Terrell Pompey – ProQuest LLC, 2021
Many methods are used to measure interrater reliability for studies where each target receives ratings by a different set of judges. The purpose of this study is to explore the use of hierarchical modeling for estimating interrater reliability using the intraclass correlation coefficient. This study provides a description of how the ICC can be…
Descriptors: Interrater Reliability, Evaluation Methods, Test Reliability, Correlation
Joyce M. W. Moonen-van Loon; Jeroen Donkers – Practical Assessment, Research & Evaluation, 2025
The reliability of assessment tools is critical for accurately monitoring student performance in various educational contexts. When multiple assessments are combined to form an overall evaluation, each assessment serves as a data point contributing to the student's performance within a broader educational framework. Determining composite…
Descriptors: Programming Languages, Reliability, Evaluation Methods, Student Evaluation
Tenko Raykov; Bingsheng Zhang – Structural Equation Modeling: A Multidisciplinary Journal, 2024
Multidimensional measuring instruments are often used in behavioral, social, educational, marketing, and biomedical research. For these scales, the paper discusses how to find the optimal score based on their components that is associated with the highest possible reliability. Within the framework of structural equation modeling, an approach to…
Descriptors: Multidimensional Scaling, Measurement Equipment, Measurement Techniques, Test Reliability
Melissa Raspa; Angela Gwaltney; Carla Bann; Jana von Hehn; Timothy A. Benke; Eric D. Marsh; Sarika U. Peters; Amitha Ananth; Alan K. Percy; Jeffrey L. Neul – Journal of Autism and Developmental Disorders, 2025
Rett syndrome is a severe neurodevelopmental disorder that affects about 1 in 10,000 females. Clinical trials of disease modifying therapies are on the rise, but there are few psychometrically sound caregiver-reported outcome measures available to assess treatment benefit. We report on a new caregiver-reported outcome measure, the Rett Caregiver…
Descriptors: Neurodevelopmental Disorders, Genetic Disorders, Females, Test Validity
Dawn Holford; Janet McLean; Alex O. Holcombe; Iratxe Puebla; Vera Kempe – Active Learning in Higher Education, 2025
Authentic assessment allows students to demonstrate knowledge and skills in real-world tasks. In research, peer review is one such task that researchers learn by doing, as they evaluate other researchers' work. This means peer review could serve as an authentic assessment that engages students' critical thinking skills in a process of active…
Descriptors: Undergraduate Students, Evaluation Methods, Peer Evaluation, Interrater Reliability
Qiong Wu; Liping Gu – Sociological Methods & Research, 2024
Family income questions in general purpose surveys are usually collected with either a single-question summary design or a multiple-question disaggregation design. It is unclear how estimates from the two approaches agree with each other. The current paper takes advantage of a large-scale survey that has collected family income with both methods.…
Descriptors: Foreign Countries, Family Income, Questionnaires, Research Design
Elayne P. Colón; Lori M. Dassa; Thomas M. Dana; Nathan P. Hanson – Action in Teacher Education, 2024
To meet accreditation expectations, teacher preparation programs must demonstrate their candidates are evaluated using summative assessment tools that yield sound, reliable, and valid data. These tools are primarily used by the clinical experience team -- university supervisors and mentor teachers. Institutional beliefs regarding best practices…
Descriptors: Student Teachers, Teacher Interns, Evaluation Methods, Interrater Reliability
Fajetta M. Banks – ProQuest LLC, 2024
This study, grounded in a phenomenological exploration, investigates whether current teacher evaluation methods account for subjectivism in teaching, learning, and evaluation within the context of Georgia's Teacher Keys Effectiveness System (TKES). Through focus groups with instructional evaluators (IEs), the study reveals the significant impact of…
Descriptors: Teacher Evaluation, Evaluation Methods, Bias, Phenomenology
Thompson, W. Jake; Nash, Brooke; Clark, Amy K.; Hoover, Jeffrey C. – Journal of Educational Measurement, 2023
As diagnostic classification models become more widely used in large-scale operational assessments, we must give consideration to the methods for estimating and reporting reliability. Researchers must explore alternatives to traditional reliability methods that are consistent with the design, scoring, and reporting levels of diagnostic assessment…
Descriptors: Diagnostic Tests, Simulation, Test Reliability, Accuracy
Siti Suprihatiningsih; Masriyah; Rooselyna Ekawati – Journal of Education and Learning (EduLearn), 2025
The knowledge of the materials to be taught to the students is the basic knowledge that preservice mathematics teachers should possess, as they need to prepare themselves for teaching. In order to research preservice teachers' understanding of the subject matter and teaching skills, valid and reliable test instruments are required. Knowledge of…
Descriptors: Preservice Teachers, Pedagogical Content Knowledge, Preservice Teacher Education, Mathematics Teachers

