Guido Schwarzer; Gerta Rücker; Cristina Semaca – Research Synthesis Methods, 2024
The "LFK" index has been promoted as an improved method to detect bias in meta-analysis. Putatively, its performance does not depend on the number of studies in the meta-analysis. We conducted a simulation study, comparing the "LFK" index test to three standard tests for funnel plot asymmetry in settings with smaller or larger…
Descriptors: Bias, Meta Analysis, Simulation, Evaluation Methods
Melissa Raspa; Angela Gwaltney; Carla Bann; Jana von Hehn; Timothy A. Benke; Eric D. Marsh; Sarika U. Peters; Amitha Ananth; Alan K. Percy; Jeffrey L. Neul – Journal of Autism and Developmental Disorders, 2025
Rett syndrome is a severe neurodevelopmental disorder that affects about 1 in 10,000 females. Clinical trials of disease modifying therapies are on the rise, but there are few psychometrically sound caregiver-reported outcome measures available to assess treatment benefit. We report on a new caregiver-reported outcome measure, the Rett Caregiver…
Descriptors: Neurodevelopmental Disorders, Genetic Disorders, Females, Test Validity
Thompson, W. Jake; Nash, Brooke; Clark, Amy K.; Hoover, Jeffrey C. – Journal of Educational Measurement, 2023
As diagnostic classification models become more widely used in large-scale operational assessments, we must give consideration to the methods for estimating and reporting reliability. Researchers must explore alternatives to traditional reliability methods that are consistent with the design, scoring, and reporting levels of diagnostic assessment…
Descriptors: Diagnostic Tests, Simulation, Test Reliability, Accuracy
Siti Suprihatiningsih; Masriyah; Rooselyna Ekawati – Journal of Education and Learning (EduLearn), 2025
The knowledge of the materials to be taught to the students is the basic knowledge that preservice mathematics teachers should possess, as they need to prepare themselves for teaching. In order to research preservice teachers' understanding of the subject matter and teaching skills, valid and reliable test instruments are required. Knowledge of…
Descriptors: Preservice Teachers, Pedagogical Content Knowledge, Preservice Teacher Education, Mathematics Teachers
Mojtaba Elhami Athar; Randall T. Salekin; Mahdi Hassanabadi; Parnian Rezaei; Golnoush Fakhr; Elham Zamani – Child & Youth Care Forum, 2025
The Proposed Specifiers for Conduct Disorder (PSCD) assesses psychopathy components of grandiose-manipulative (GM), callous-unemotional (CU), daring-impulsive (DI), and conduct disorder (CD). Research on PSCD is still in its infancy, and further research is necessary to examine its psychometric properties. We investigated the correlations between…
Descriptors: Preadolescents, Adolescents, Psychopathology, Behavior Disorders
Dirk Gellermann; Hanno Michel; Ute Harms – Mind, Brain, and Education, 2025
In order for climate literacy assessments to be applicable in large-scale studies, it is essential that they comply with the standards of test administration while maintaining consistency with a comprehensive definition of the concept. In alignment with the different educational frameworks and the Climate Literacy Principles of the U.S. Global…
Descriptors: Climate, Environmental Education, Literacy, Measures (Individuals)
Madeline A. Schellman; Matthew J. Madison – Grantee Submission, 2024
Diagnostic classification models (DCMs) have grown in popularity as stakeholders increasingly desire actionable information related to students' skill competencies. Longitudinal DCMs offer a psychometric framework for providing estimates of students' proficiency status transitions over time. For both cross-sectional and longitudinal DCMs, it is…
Descriptors: Diagnostic Tests, Classification, Models, Psychometrics
Marjahan Begum; Pontus Haglund; Ari Korhonen; Violetta Lonati; Mattia Monga; Filip Strömbäck; Artturi Tilanterä – Informatics in Education, 2024
There can be many reasons why students fail to answer summative tests correctly in advanced computer science courses: often the cause is a lack of prerequisites or misconceptions about topics presented in previous courses. One of the ITiCSE 2020 working groups investigated the possibility of designing assessments suitable for differentiating…
Descriptors: Foreign Countries, College Students, Prerequisites, Computer Science Education
Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025
In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…
Descriptors: Automation, Grading, Computer Assisted Testing, Scoring
Kogar, Hakan – International Journal of Assessment Tools in Education, 2022
The purpose of this study is to identify which scale short-form development method produces better findings in different factor structures. A simulation study was designed based on this purpose. Three different factor structures and three simulation conditions were selected. As the findings of this simulation study, the model-data fit and…
Descriptors: Test Construction, Measures (Individuals), Factor Structure, Test Reliability
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
Kim, Mi Song – International Journal of Technology and Design Education, 2022
Teacher design work has gained increasing attention by re-conceptualizing teachers as designers rather than curriculum deliverers. However, assessing teacher design work can be challenging given that there are very few research tools to assess teacher design knowledge (TDK) competencies. To fill that gap, this study proposes a survey that assesses…
Descriptors: Design, Teacher Characteristics, Teacher Competencies, Teacher Evaluation
Nicole D. Martin; Stephanie N. Baker; Madeline Haynes; Jayce R. Warner – Computer Science Education, 2024
Background and Context: As computer science (CS) education expands and the need for well-prepared CS teachers grows, understanding what motivates teachers to teach CS can help address challenges to recruiting, preparing, and retaining teachers. Objective: The goal of this work was to develop and validate a scale that measures teachers' motivation…
Descriptors: Computer Science Education, Teacher Motivation, Measurement Techniques, Construct Validity
Kazuya Saito; Adam Tierney – Studies in Second Language Acquisition, 2024
This article proposes a conceptual and measurement framework for postpubertal, L2 speech learning aptitude that is centered around domain-general auditory processing (i.e., representing spectral and temporal characteristics of sounds). To this end, we examine the construct and reliability of a battery of auditory processing tests by presenting the…
Descriptors: Second Language Learning, Auditory Tests, Auditory Perception, Listening Comprehension Tests
Bang Quan Zheng; Peter M. Bentler – Structural Equation Modeling: A Multidisciplinary Journal, 2025
This paper aims to advocate for a balanced approach to model fit evaluation in structural equation modeling (SEM). The ongoing debate surrounding chi-square test statistics and fit indices has been characterized by ambiguity and controversy. Despite the acknowledged limitations of relying solely on the chi-square test, its careful application can…
Descriptors: Monte Carlo Methods, Structural Equation Models, Goodness of Fit, Robustness (Statistics)