Publication Date
| In 2026 | 0 |
| Since 2025 | 621 |
| Since 2022 (last 5 years) | 3121 |
| Since 2017 (last 10 years) | 7362 |
| Since 2007 (last 20 years) | 15000 |
Descriptor
| Test Reliability | 15006 |
| Test Validity | 10245 |
| Reliability | 9748 |
| Foreign Countries | 7119 |
| Test Construction | 4807 |
| Validity | 4189 |
| Measures (Individuals) | 3872 |
| Factor Analysis | 3820 |
| Psychometrics | 3513 |
| Interrater Reliability | 3117 |
| Correlation | 3037 |
Audience
| Researchers | 709 |
| Practitioners | 451 |
| Teachers | 208 |
| Administrators | 122 |
| Policymakers | 66 |
| Counselors | 42 |
| Students | 38 |
| Parents | 11 |
| Community | 7 |
| Support Staff | 6 |
| Media Staff | 5 |
Location
| Turkey | 1319 |
| Australia | 436 |
| Canada | 379 |
| China | 368 |
| United States | 271 |
| United Kingdom | 256 |
| Indonesia | 249 |
| Taiwan | 234 |
| Netherlands | 223 |
| Spain | 216 |
| California | 214 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 8 |
| Meets WWC Standards with or without Reservations | 9 |
| Does not meet standards | 6 |
Barth, Philipp; Stadtmann, Georg – Journal of Creative Behavior, 2021
The "consensual assessment technique" (CAT) is a reliable and valid method to measure (product) creativity and is often considered "the" gold standard of creativity assessment. The reliability measure traditionally applied in CAT studies--inter-rater reliability--cannot capture time-sampling error, which is a particularly relevant…
Descriptors: Creativity, Creativity Tests, Test Reliability, Interrater Reliability
Arielle Boguslav; Julie Cohen – Journal of Teacher Education, 2024
Teacher preparation programs are increasingly expected to use data on preservice teacher (PST) skills to drive program improvement and provide targeted supports. Observational ratings are especially vital, but also prone to measurement issues. Scores may be influenced by factors unrelated to PSTs' instructional skills, including rater standards.…
Descriptors: Preservice Teachers, Measures (Individuals), Evaluation Problems, Teaching Skills
Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2025
This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly…
Descriptors: Artificial Intelligence, Test Items, Automation, Test Format
Francesco Pace; Giulia Sciotto – International Journal for Educational and Vocational Guidance, 2025
In recent years, university students have been asked to develop broader skills to better navigate their university paths, their first approaches to the labor market, and the eventual university-to-work transition. These skills include the ability to network, to be involved in career-related issues, and to explore the characteristics of occupations as much as personal ones.…
Descriptors: Undergraduate Students, Questionnaires, Foreign Countries, Test Reliability
Sima Zach; Noa Fishler-Barum; Itamar Shidlov – Physical Educator, 2025
The purpose of the study was to develop the Teachers' Mental Toughness Questionnaire (TMTQ). The questionnaire was developed in six stages: item generation, content validity, exploratory factor analysis, reliability tests, convergent validity tests, and discriminant validity. The factor analysis indicates that it measures six factors: team,…
Descriptors: Test Construction, Test Validity, Test Reliability, Psychometrics
Hui Jin; Cynthia Lima; Limin Wang – Educational Measurement: Issues and Practice, 2025
Although AI transformer models have demonstrated notable capability in automated scoring, it is difficult to examine how and why these models fall short in scoring some responses. This study investigated how transformer models' language processing and quantification processes can be leveraged to enhance the accuracy of automated scoring. Automated…
Descriptors: Automation, Scoring, Artificial Intelligence, Accuracy
Conrad Borchers – International Educational Data Mining Society, 2025
Algorithmic bias is a pressing concern in educational data mining (EDM), as it risks amplifying inequities in learning outcomes. The Area Between ROC Curves (ABROCA) metric is frequently used to measure discrepancies in model performance across demographic groups to quantify overall model fairness. However, its skewed distribution--especially when…
Descriptors: Algorithms, Bias, Statistics, Simulation
Clarence Joldersma – Philosophical Studies in Education, 2025
In this paper, the author will develop a more comprehensive notion of truth, one that goes beyond the epistemological correspondence theory, and the author will argue for the importance of authentication as a crucial extension of truth, especially in a posttruth climate. Hannah Arendt observes, "facts need testimony to be remembered and…
Descriptors: Educational Philosophy, Educational Theories, Epistemology, Educational Practices
Beyza Aksu Dunya; Mehmet Can Demir; Stefanie Wind – Research & Practice in Assessment, 2025
This paper aims to synthesize measures of assessment literacy in higher education by forging a connection between two research domains: educational assessment and psychometrics. It begins with a systematic review of assessment literacy measures within the context of higher education published within the last ten years. AL measures, including tests…
Descriptors: Assessment Literacy, Higher Education, Measures (Individuals), Reliability
Constructing a Roadmap to Measure the Quality of Business Assessments Aimed at Curriculum Management
Silva, Thanuci; Santos, Regiane dos; Mallet, Débora – Journal of Education for Business, 2023
Assuring the quality of education is a concern of learning institutions. To do so, it is necessary to have assertive learning management, with consistent data on students' outcomes. This research provides associate deans and researchers with a roadmap for gathering evidence to improve the quality of open-ended assessments. Based on statistical…
Descriptors: Student Evaluation, Evaluation Methods, Business Education, Higher Education
Riana Nurhayati; Suranto Aw; Siti Irene Astuti Dwiningrum; Mami Hajaroh; Herwin Herwin – International Journal of Educational Methodology, 2024
Evaluation of child-friendly school (CFS) policies is essential to determine the achievements of school efforts in reducing violence cases. This research aims to prove the reliability and validity of CFS policy evaluation instruments in elementary schools in different locations. This investigation uses the Context Input Process Product (CIPP)…
Descriptors: Validity, Reliability, School Policy, Program Evaluation
Swapneel Thite; Jayashri Ravishankar; Inmaculada Tomeo-Reyes; Araceli Martinez Ortiz – European Journal of Engineering Education, 2024
Effectively working in an engineering workplace requires strong teamwork skills, yet the existing literature within various disciplines reveals discrepancies in evaluating these skills. This complicates the design of a generic teamwork peer evaluation tool for engineering students. This study aims to address this gap by introducing the DRIVE…
Descriptors: Scoring Rubrics, Evaluation Methods, Peer Evaluation, Teamwork
Janice Kinghorn; Katherine McGuire; Bethany L. Miller; Aaron Zimmerman – Assessment Update, 2024
In this article, the authors share their reflections on how different experiences and paradigms have broadened their understanding of the work of assessment in higher education. As they collaborated to create a panel for the 2024 International Conference on Assessing Quality in Higher Education, they recognized that they, as assessment…
Descriptors: Higher Education, Assessment Literacy, Evaluation Criteria, Evaluation Methods
Shasha Chen; Shaohui Chi; Zuhao Wang – Journal of Baltic Science Education, 2025
Interdisciplinary thinking is critical for equipping students to apply scientific knowledge and tackle societal challenges across various disciplines, which has been recognized as a key objective of twenty-first century science education. However, research on effective interdisciplinary assessment in secondary school science education is still…
Descriptors: Thinking Skills, Interdisciplinary Approach, Science Instruction, Grade 7
Brittany Grey; Marren C. Brooks; Emily A. Lund; Krystal L. Werfel – Language, Speech, and Hearing Services in Schools, 2025
Purpose: This study examined the internal consistency reliability, interrater reliability, and concurrent validity of the norm-referenced Test of Early Written Language--Third Edition (TEWL-3) to determine if it is an appropriate measure to use when determining if elementary children who are deaf and hard of hearing (DHH) meet grade-level writing…
Descriptors: Hard of Hearing, Sensory Aids, Writing Improvement, Writing Instruction

