Susan K. Johnsen – Gifted Child Today, 2025
The author provides information about reliability and areas that educators should examine in determining if an assessment is consistent and trustworthy for use, and how it should be interpreted in making decisions about students. Reliability areas that are discussed in the column include internal consistency, test-retest or stability, inter-scorer…
Descriptors: Test Reliability, Academically Gifted, Student Evaluation, Error of Measurement
Samuel D'Emanuele; Francesca Nardello; Fabrizio Garau; Diego Campaci; Federico Schena; Cantor Tarperi – Measurement in Physical Education and Exercise Science, 2025
The agreement between a wearable inertial sensor (GYKO, G) and the force platform (P) was assessed by evaluating "test-retest" and "inter-rater reliability." Thirty-eight subjects were enrolled; the selected indices of balance were investigated over foot positions and (un)stable conditions. Intraclass correlation coefficient…
Descriptors: Human Posture, Measurement Equipment, Interrater Reliability, Measurement Techniques
Hsin-Yun Lee; You-Lin Chen; Li-Jen Weng – Journal of Experimental Education, 2024
The second version of Kaiser's Measure of Sampling Adequacy (MSA₂) has been widely applied to assess the factorability of data in psychological research. The MSA₂ is developed in the population and little is known about its behavior in finite samples. If estimated MSA₂s are biased due to sampling errors,…
Descriptors: Error of Measurement, Reliability, Sampling, Statistical Bias
Mazin T. Alqhazo; Tha’er Al-Kadi; Firas S. Alfwaress – Language, Speech, and Hearing Services in Schools, 2025
Purpose: The Stuttering Severity Instrument--Fourth Edition (SSI-4) is unavailable in Arabic. The purpose of the current research is to translate the SSI-4 (Riley, 2009) into Arabic and to discuss its validity, as well as its intrajudge and interjudge reliability. Method: Archived videos of 28 school-aged children who stutter ranged in…
Descriptors: Arabic, Translation, Test Validity, Test Reliability
Angus Kittelman; Sara Izzard; Kent McIntosh; Kelsey R. Morris; Timothy J. Lewis – Assessment for Effective Intervention, 2024
The purpose of this study was to evaluate the psychometric properties of the Self-Assessment Survey (SAS) 4.0, an updated measure assessing implementation fidelity of positive behavioral interventions and supports (PBIS). A total of 627 school personnel from 33 schools in six U.S. states completed the SAS 4.0 during the 2021-2022 school year. We…
Descriptors: Positive Behavior Supports, Teachers, Self Evaluation (Individuals), Test Reliability
Marcus Messer; Neil C. C. Brown; Michael Kölling; Miaojing Shi – ACM Transactions on Computing Education, 2025
Providing consistent summative assessment to students is important, as the grades they are awarded affect their progression through university and future career prospects. While small cohorts are typically assessed by a single assessor, such as the module/class leader, larger cohorts are often assessed by multiple assessors, typically teaching…
Descriptors: Foreign Countries, Grading, Interrater Reliability, Teaching Assistants
Jonas Flodén – British Educational Research Journal, 2025
This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
Matthew K. Burns; Heba Z. Abdelnaby; Jonie B. Welland; Katherine A. Graves; Kari Kurto – Assessment for Effective Intervention, 2024
The current study examined the reliability of The Reading League Curriculum-Evaluation Guidelines (CEGs), which were developed to help school-based teams rate the presence of red flags when considering adopting specific literacy curricula. Coders (n = 30) independently used the CEGs to evaluate a free online English language arts curriculum. The…
Descriptors: English Curriculum, English Instruction, Language Arts, Curriculum Evaluation
Enninga, Annemieke; Waninge, Aly; Post, Wendy J.; van der Putten, Annette A. J. – Journal of Applied Research in Intellectual Disabilities, 2023
Background: Persons with profound intellectual and multiple disabilities (PIMD) are vulnerable when it comes to experiencing pain. Reliable assessment of pain-related behaviour in these persons is difficult. Aim: To determine how pain items can be reliably scored in adults with PIMD. Methods: We developed an instruction protocol for the…
Descriptors: Test Reliability, Pain, Behavior, Adults
Ichikowitz, Kerri; Bruce, Carolyn; Meitanis, Vanessa; Cheung, Kelly; Kim, Yekyung; Talbourdet, Esther; Newton, Caroline – International Journal of Language & Communication Disorders, 2023
Background: People with aphasia (PWA) can experience functional numeracy difficulties, that is, problems understanding or using numbers in everyday life, which can have numerous negative impacts on their daily lives. There is growing interest in designing functional numeracy interventions for PWA; however, there are limited suitable assessments…
Descriptors: Test Construction, Test Validity, Numeracy, Adults
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
Chase Young; Benjamin Mitchell-Yellin; George Kevin Randall – Active Learning in Higher Education, 2025
The purpose of this study was to develop a valid, reliable, and brief measure of active learning in college classrooms that is cheap and easy to complete and yields results that faculty can easily use to inform their development as instructors. Initial construct and face validity was achieved by modifying existing instruments and creating a draft…
Descriptors: College Faculty, College Students, Active Learning, Classroom Observation Techniques
Morten Pallisgaard Støve; Mathias Kringelholt Kristensen; Jonas Nielsen; Lea Dyhrberg Madsen – Measurement in Physical Education and Exercise Science, 2025
Between-limb strength asymmetry is a leading risk factor for hamstring strain re-injury. However, few accurate testing methodologies are available in clinical settings. This study examined the validity and reliability of eccentric knee flexor torque measured with a novel Nordic Hamstring Device. Twenty-seven healthy participants were assessed in…
Descriptors: Validity, Reliability, Human Body, Foreign Countries
Aislinn Ganci; Miran Qazizada; Brianna Fehr; Ana Vucenovic; Edmond Lou; Eric Parent – Measurement in Physical Education and Exercise Science, 2024
Spinal alignment can be assessed without radiation using three-dimensional ultrasound imaging (3DUS). Reliable measurements could inform the ideal arm position for scoliosis radiographs. This study determined the inter-evaluator reliability of axial vertebral rotation (AVR) measurements and sagittal curve angles in healthy females from 3DUS spinal…
Descriptors: Foreign Countries, Young Adults, Adults, Adolescents
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e. rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics

