Publication Date
| In 2026 | 0 |
| Since 2025 | 55 |
| Since 2022 (last 5 years) | 261 |
| Since 2017 (last 10 years) | 508 |
| Since 2007 (last 20 years) | 1258 |
Descriptor
| Evaluation Methods | 2743 |
| Test Reliability | 1408 |
| Test Validity | 991 |
| Reliability | 964 |
| Student Evaluation | 567 |
| Validity | 515 |
| Interrater Reliability | 502 |
| Foreign Countries | 444 |
| Test Construction | 364 |
| Higher Education | 359 |
| Measurement Techniques | 305 |
Author
| Raykov, Tenko | 9 |
| Epstein, Michael H. | 7 |
| Jaeger, Richard M. | 7 |
| Matson, Johnny L. | 7 |
| Amrein-Beardsley, Audrey | 6 |
| Follman, John | 6 |
| Gill, Brian | 6 |
| Gresham, Frank M. | 6 |
| Thompson, Bruce | 6 |
| Fink, Arlene | 5 |
| Marcoulides, George A. | 5 |
Audience
| Researchers | 137 |
| Practitioners | 99 |
| Teachers | 41 |
| Administrators | 32 |
| Policymakers | 17 |
| Students | 13 |
| Counselors | 5 |
| Support Staff | 3 |
| Community | 1 |
| Media Staff | 1 |
| Parents | 1 |
Location
| Australia | 45 |
| United Kingdom | 41 |
| Canada | 31 |
| United Kingdom (England) | 29 |
| China | 28 |
| United States | 28 |
| Turkey | 27 |
| California | 22 |
| Florida | 21 |
| Netherlands | 19 |
| Israel | 16 |
What Works Clearinghouse Rating
| Does not meet standards | 1 |
Arielle Boguslav; Julie Cohen – Annenberg Institute for School Reform at Brown University, 2023
Teacher preparation programs are increasingly expected to use data on pre-service teacher (PST) skills to drive program improvement and provide targeted supports. Observational ratings are especially vital, but also prone to measurement issues. Scores may be influenced by factors unrelated to PSTs' instructional skills, including rater standards…
Descriptors: Preservice Teachers, Student Evaluation, Evaluation Methods, Preservice Teacher Education
Pinar Mihci Türker; Ömer Kirmaci; Emrah Kayabasi; Erinç Karatas; Ebru Kiliç Çakmak; Serçin Karatas – Journal of Educational Technology and Online Learning, 2024
The COVID-19 epidemic has precipitated a rapid and widespread adoption of online education, leading to its normalization in contemporary society. Online education is evident across several educational levels. However, assessing the efficacy and effectiveness of these training programs can only be achieved by implementing a suitable evaluation…
Descriptors: Online Courses, Distance Education, Evaluation Methods, Test Construction
Marine Simon; Alexandra Budke – Journal of Geography in Higher Education, 2024
Comparison is an important geographic method and a common task in geography education. Mastering comparison is a complex competency and written comparisons are challenging tasks both for students and assessors. As yet, however, there is no set test for evaluating comparison competency nor tool for enhancing it. Moreover, little is known about…
Descriptors: Geography Instruction, Student Evaluation, Comparative Analysis, Reliability
Huiying Cai; Xun Yan – Language Testing, 2024
Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…
Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation
Delia Leuenberger; Elisabeth Moser Opitz; Noemi Gloor – Journal of Numerical Cognition, 2024
Computation competence (CC) in simple addition and subtraction using non-counting (NC) strategies is an important learning objective in Grade 1 mathematics but many children, especially low achievers in mathematics, struggle to acquire these skills. To provide these students with the support they need, it is important to have valid and reliable…
Descriptors: Computation, Mathematics Skills, Addition, Subtraction
Chao Han; Binghan Zheng; Mingqing Xie; Shirong Chen – Interpreter and Translator Trainer, 2024
Human raters' assessment of interpreting is a complex process. Previous researchers have mainly relied on verbal reports to examine this process. To advance our understanding, we conducted an empirical study, collecting raters' eye-movement and retrospection data in a computerised interpreting assessment in which three groups of raters (n = 35)…
Descriptors: Foreign Countries, College Students, College Graduates, Interrater Reliability
Alison Cook-Sather; Ruth L. Healey – Teaching & Learning Inquiry, 2024
Peer review is widely accepted as critical to legitimating scholarly publication, and yet, it runs the risk of reproducing inequities in publishing processes and products. Acknowledging at once the historical need to legitimize SoTL publications, the current danger of reproducing exclusive practices, and the aspirational goal to "practice…
Descriptors: Peer Evaluation, Academic Language, Writing (Composition), Interrater Reliability
Rossin, Emily G.; Bergee, Martin J. – Journal of Research in Music Education, 2021
This is the sixth and culminating study in a series whose purpose has been to acquire a conceptual understanding of school band performance and to develop an assessment based on this understanding. With the present study, we cross-validated and applied a rating scale for school band performance. In the cross-validation phase, college students…
Descriptors: Music Education, Music Activities, Music, Performance
Pereira, Diana; Cadime, Irene; Brown, Gavin; Flores, Maria Assunção – European Journal of Higher Education, 2022
Drawing upon a wider piece of research, this paper focuses on the validation of a 'use of assessment' scale in five Portuguese public universities with 5549 students. The study aims to investigate the psychometric properties of the scale, to describe how students look at assessment uses, to analyse their utility perceptions of assessment, and to…
Descriptors: Undergraduate Students, Student Attitudes, Evaluation Methods, Foreign Countries
Sturgis, Paul W.; Marchand, Leslie; Miller, M. David; Xu, Wei; Castiglioni, Analia – Association for Institutional Research, 2022
This article introduces generalizability theory (G-theory) to institutional research and assessment practitioners, and explains how it can be utilized to evaluate the reliability of assessment procedures in order to improve student learning outcomes. The fundamental concepts associated with G-theory are briefly discussed, followed by a discussion…
Descriptors: Generalizability Theory, Institutional Research, Reliability, Computer Software
Ji, Xuejun Ryan; Wu, Amery D. – Educational Measurement: Issues and Practice, 2023
The Cross-Classified Mixed Effects Model (CCMEM) has been demonstrated to be a flexible framework for evaluating reliability by measurement specialists. Reliability can be estimated based on the variance components of the test scores. Built upon their accomplishment, this study extends the CCMEM to be used for evaluating validity evidence.…
Descriptors: Measurement, Validity, Reliability, Models
Mojgan Rashtchi; SeyyedeFateme Ghazi Mir Saeed – Sage Research Methods Cases, 2023
The reason for conducting the present case study was the problems the researchers encountered during data collection for another research project (Primary Study) entitled "The effects of virtual versus traditional flipped classes on EFL learners' grammar knowledge, self-regulation, and autonomy." Two online questionnaires were…
Descriptors: Data Collection, Questionnaires, Barriers, Research Methodology
Wolkowitz, Amanda A. – Journal of Educational Measurement, 2021
Decision consistency (DC) is the reliability of a classification decision based on a test score. In professional credentialing, the decision is often a high-stakes pass/fail decision. The current methods for estimating DC are computationally complex. The purpose of this research is to provide a computationally and conceptually simple method for…
Descriptors: Decision Making, Reliability, Classification, Scores
Tanya L. Eckert; Samantha C. Maguire; Kaytlin A. Nelson; Siani Y. M. Amidon; Alec R. Goldstein; Monique S. Antoine; Joshua J. Circe; Sophia V. Alderman; Tyler J. Young – Psychology in the Schools, 2024
The acceptability of school-based interventions has been regularly examined among teachers and parents; however, students are less frequently assessed despite being the recipients of most interventions. Although recent studies have begun exploring how different student characteristics (e.g., gender) and types of acceptability assessments (e.g.,…
Descriptors: Elementary School Students, Grade 3, Writing (Composition), Intervention
Regional Educational Laboratory Mid-Atlantic, 2024
These are the appendixes for the report, "Stabilizing School Performance Indicators in New Jersey to Reduce the Effect of Random Error." This study applied a stabilization model called Bayesian hierarchical modeling to group-level data (with groups assigned according to demographic designations) within schools in New Jersey with the aim…
Descriptors: Institutional Evaluation, Elementary Secondary Education, Bayesian Statistics, Test Reliability

