Publication Date
| In 2026 | 3 |
| Since 2025 | 656 |
| Since 2022 (last 5 years) | 3157 |
| Since 2017 (last 10 years) | 7398 |
| Since 2007 (last 20 years) | 15036 |
Descriptor
| Test Reliability | 15028 |
| Test Validity | 10265 |
| Reliability | 9757 |
| Foreign Countries | 7137 |
| Test Construction | 4821 |
| Validity | 4191 |
| Measures (Individuals) | 3876 |
| Factor Analysis | 3822 |
| Psychometrics | 3520 |
| Interrater Reliability | 3124 |
| Correlation | 3039 |
| More ▼ | |
Source
Author
Publication Type
Education Level
Audience
| Researchers | 709 |
| Practitioners | 451 |
| Teachers | 208 |
| Administrators | 122 |
| Policymakers | 66 |
| Counselors | 42 |
| Students | 38 |
| Parents | 11 |
| Community | 7 |
| Support Staff | 6 |
| Media Staff | 5 |
| More ▼ | |
Location
| Turkey | 1326 |
| Australia | 436 |
| Canada | 379 |
| China | 368 |
| United States | 271 |
| United Kingdom | 256 |
| Indonesia | 251 |
| Taiwan | 234 |
| Netherlands | 223 |
| Spain | 216 |
| California | 214 |
| More ▼ | |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 8 |
| Meets WWC Standards with or without Reservations | 9 |
| Does not meet standards | 6 |
Julia Brochey-Taylor; Joseph A. Taylor – Educational Research and Reviews, 2024
The purpose of this synthesis study was to assess the reliability and validity of the Draw-A-Scientist Test (DAST) and its variations across multiple studies, aiming to understand limitations and propose modifications for future application within and beyond the science domain. Given the existence of multiple DAST versions, this study quantified…
Descriptors: Cognitive Tests, Freehand Drawing, Personality Measures, Projective Measures
Elizabeth Choi-Tucci; John Sideris; Cristin Holland; Grace T. Baranek; Linda R. Watson – Journal of Speech, Language, and Hearing Research, 2025
Purpose: Intentional communication acts, or purposefully directed vocalizations and gestures, are particularly difficult for infants at elevated likelihood for eventual diagnosis of autism. The ability to measure and track intentional communication in infancy thus has the potential to aid early identification and intervention efforts. This study…
Descriptors: Infants, Autism Spectrum Disorders, Caregiver Child Relationship, Nonverbal Communication
Barth, Philipp; Stadtmann, Georg – Journal of Creative Behavior, 2021
The "consensual assessment technique" (CAT) is a reliable and valid method to measure (product) creativity and often considered "the" gold standard of creativity assessment. The reliability measure traditionally applied in CAT studies--inter-rater reliability--cannot capture time-sampling error, which is a particular relevant…
Descriptors: Creativity, Creativity Tests, Test Reliability, Interrater Reliability
Arielle Boguslav; Julie Cohen – Journal of Teacher Education, 2024
Teacher preparation programs are increasingly expected to use data on preservice teacher (PST) skills to drive program improvement and provide targeted supports. Observational ratings are especially vital, but also prone to measurement issues. Scores may be influenced by factors unrelated to PSTs' instructional skills, including rater standards.…
Descriptors: Preservice Teachers, Measures (Individuals), Evaluation Problems, Teaching Skills
Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2025
This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly…
Descriptors: Artificial Intelligence, Test Items, Automation, Test Format
Francesco Pace; Giulia Sciotto – International Journal for Educational and Vocational Guidance, 2025
In recent years, to better face university paths, the first approaches to the labor market, and then the actual university-to-work transition, university students are asked to have broader skills, such as the ability to network, to be involved in career-related issues, and to explore the characteristics of occupations as much as personal ones.…
Descriptors: Undergraduate Students, Questionnaires, Foreign Countries, Test Reliability
Sima Zach; Noa Fishler-Barum; Itamar Shidlov – Physical Educator, 2025
The purpose of the study was to develop the Teachers' Mental Toughness Questionnaire (TMTQ). The questionnaire was developed in six stages: item generation, content validity, exploratory factor analysis, reliability tests, convergent validity tests, and discriminant validity. The factor analysis indicates that it measures six factors: team,…
Descriptors: Test Construction, Test Validity, Test Reliability, Psychometrics
Hui Jin; Cynthia Lima; Limin Wang – Educational Measurement: Issues and Practice, 2025
Although AI transformer models have demonstrated notable capability in automated scoring, it is difficult to examine how and why these models fall short in scoring some responses. This study investigated how transformer models' language processing and quantification processes can be leveraged to enhance the accuracy of automated scoring. Automated…
Descriptors: Automation, Scoring, Artificial Intelligence, Accuracy
Conrad Borchers – International Educational Data Mining Society, 2025
Algorithmic bias is a pressing concern in educational data mining (EDM), as it risks amplifying inequities in learning outcomes. The Area Between ROC Curves (ABROCA) metric is frequently used to measure discrepancies in model performance across demographic groups to quantify overall model fairness. However, its skewed distribution--especially when…
Descriptors: Algorithms, Bias, Statistics, Simulation
Clarence Joldersma – Philosophical Studies in Education, 2025
In this paper, the author will develop a more comprehensive notion of truth, one that goes beyond the epistemological correspondence theory, and the author will argue for the importance of authentication as a crucial extension of truth, especially in a posttruth climate. Hannah Arendt observes, "facts need testimony to be remembered and…
Descriptors: Educational Philosophy, Educational Theories, Epistemology, Educational Practices
Beyza Aksu Dunya; Mehmet Can Demir; Stefanie Wind – Research & Practice in Assessment, 2025
This paper aims to synthesize measures of assessment literacy in higher education by forging a connection between two research domains: educational assessment and psychometrics. It begins with a systematic review of assessment literacy measures within the context of higher education published within the last ten years. AL measures, including tests…
Descriptors: Assessment Literacy, Higher Education, Measures (Individuals), Reliability
Huan Liu; Won-Chan Lee – Journal of Educational Measurement, 2025
This study investigates the estimation of classification consistency and accuracy indices for composite summed and theta scores within the SS-MIRT framework, using five popular approaches, including the Lee, Rudner, Guo, Bayesian EAP, and Bayesian MCMC approaches. The procedures are illustrated through analysis of two real datasets and further…
Descriptors: Classification, Reliability, Accuracy, Item Response Theory
Tim Moses; YoungKoung Kim – Journal of Educational Measurement, 2025
This study considers the estimation of marginal reliability and conditional accuracy measures using a generalized recursion procedure with several IRT-based ability and score estimators. The estimators include MLE, TCC, and EAP abilities, and corresponding test scores obtained with different weightings of the item scores. We consider reliability…
Descriptors: Item Response Theory, Scoring, Reliability, Accuracy
Constructing a Roadmap to Measure the Quality of Business Assessments Aimed at Curriculum Management
Silva, Thanuci; Santos, Regiane dos; Mallet, Débora – Journal of Education for Business, 2023
Assuring the quality of education is a concern of learning institutions. To do so, it is necessary to have assertive learning management, with consistent data on students' outcomes. This research provides associate deans and researchers, a roadmap with which to gather evidence to improve the quality of open-ended assessments. Based on statistical…
Descriptors: Student Evaluation, Evaluation Methods, Business Education, Higher Education
Riana Nurhayati; Suranto Aw; Siti Irene Astuti Dwiningrum; Mami Hajaroh; Herwin Herwin – International Journal of Educational Methodology, 2024
Evaluation of child-friendly school (CFS) policies is essential to determine the achievements of school efforts in reducing violence cases. This research aims to proving the reliability and validity of CFS policy evaluation instruments in elementary schools with different locations. This investigation uses the Context Input Process Product (CIPP)…
Descriptors: Validity, Reliability, School Policy, Program Evaluation

Peer reviewed
Direct link
