Publication Date
| In 2026 | 0 |
| Since 2025 | 12 |
| Since 2022 (last 5 years) | 114 |
| Since 2017 (last 10 years) | 375 |
| Since 2007 (last 20 years) | 1130 |
Descriptor
| Comparative Analysis | 1943 |
| Reliability | 880 |
| Test Reliability | 792 |
| Foreign Countries | 554 |
| Test Validity | 443 |
| Correlation | 350 |
| Validity | 332 |
| Interrater Reliability | 327 |
| Statistical Analysis | 321 |
| Scores | 280 |
| Measures (Individuals) | 236 |
| More ▼ | |
Source
Author
| Reckase, Mark D. | 6 |
| Attali, Yigal | 5 |
| Coniam, David | 5 |
| Brennan, Robert L. | 4 |
| Crehan, Kevin D. | 4 |
| Feldt, Leonard S. | 4 |
| Hakstian, A. Ralph | 4 |
| Jones, Ian | 4 |
| Kolen, Michael J. | 4 |
| Lunz, Mary E. | 4 |
| August, Diane | 3 |
| More ▼ | |
Publication Type
Education Level
Audience
| Researchers | 35 |
| Practitioners | 29 |
| Teachers | 15 |
| Administrators | 9 |
| Policymakers | 6 |
| Counselors | 2 |
| Media Staff | 2 |
| Parents | 1 |
| Support Staff | 1 |
Location
| Turkey | 59 |
| United States | 47 |
| Australia | 36 |
| China | 33 |
| Canada | 32 |
| United Kingdom (England) | 32 |
| United Kingdom | 28 |
| Germany | 25 |
| Netherlands | 24 |
| Taiwan | 22 |
| Hong Kong | 20 |
| More ▼ | |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
| Meets WWC Standards with or without Reservations | 1 |
| Does not meet standards | 1 |
Daniel, Michael; Koshevoy, Alexey; Schurov, Ilya; Dobrushina, Nina – Field Methods, 2022
In this article, we address the issue of reliability of quantitative data on multilingualism of the past obtained as recall data. More specifically, we investigate whether the interviewees' assessments of the language repertoires of their late relatives (indirect data) provide results that are quantitatively similar to those obtained from the…
Descriptors: Recall (Psychology), Multilingualism, Artificial Intelligence, Second Languages
Arslan Mancar, Sinem; Gulleroglu, H. Deniz – International Journal of Assessment Tools in Education, 2022
The aim of this study is to analyse the importance of the number of raters and compare the results obtained by techniques based on Classical Test Theory (CTT) and Generalizability (G) Theory. The Kappa and Krippendorff alpha techniques based on CTT were used to determine the inter-rater reliability. In this descriptive research data consists of…
Descriptors: Comparative Analysis, Interrater Reliability, Advanced Placement, Scoring Rubrics
Christopher E. Gomez; Marcelo O. Sztainberg; Rachel E. Trana – International Journal of Bullying Prevention, 2022
Cyberbullying is the use of digital communication tools and spaces to inflict physical, mental, or emotional distress. This serious form of aggression is frequently targeted at, but not limited to, vulnerable populations. A common problem when creating machine learning models to identify cyberbullying is the availability of accurately annotated,…
Descriptors: Video Technology, Computer Software, Computer Mediated Communication, Bullying
A Comparison of Procedures for Estimating Person Reliability Parameters in the Graded Response Model
LaHuis, David M.; Bryant-Lees, Kinsey B.; Hakoyama, Shotaro; Barnes, Tyler; Wiemann, Andrea – Journal of Educational Measurement, 2018
Person reliability parameters (PRPs) model temporary changes in individuals' attribute level perceptions when responding to self-report items (higher levels of PRPs represent less fluctuation). PRPs could be useful in measuring careless responding and traitedness. However, it is unclear how well current procedures for estimating PRPs can recover…
Descriptors: Comparative Analysis, Reliability, Error of Measurement, Measurement Techniques
Bartelink, Cora; de Kwaadsteniet, Leontien; ten Berge, Ingrid J.; Witteman, Cilia L. M. – Child & Youth Care Forum, 2017
Background: The LIRIK, an instrument for the assessment of child safety and risk, is designed to improve assessments by guiding professionals through a structured evaluation of relevant signs, risk factors, and protective factors. Objective: We aimed to assess the interrater agreement and the predictive validity of professionals' judgments made…
Descriptors: Child Safety, Test Validity, Test Reliability, Risk
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Verhavert, San; Bouwer, Renske; Donche, Vincent; De Maeyer, Sven – Assessment in Education: Principles, Policy & Practice, 2019
Comparative Judgement (CJ) aims to improve the quality of performance-based assessments by letting multiple assessors judge pairs of performances. CJ is generally associated with high levels of reliability, but there is also a large variation in reliability between assessments. This study investigates which assessment characteristics influence the…
Descriptors: Meta Analysis, Reliability, Comparative Analysis, Value Judgment
Sims, Maureen E.; Cox, Troy L.; Eckstein, Grant T.; Hartshorn, K. James; Wilcox, Matthew P.; Hart, Judson M. – Educational Measurement: Issues and Practice, 2020
The purpose of this study is to explore the reliability of a potentially more practical approach to direct writing assessment in the context of ESL writing. Traditional rubric rating (RR) is a common yet resource-intensive evaluation practice when performed reliably. This study compared the traditional rubric model of ESL writing assessment and…
Descriptors: Scoring Rubrics, Item Response Theory, Second Language Learning, English (Second Language)
Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020
In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al (2020) describes a method…
Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level
Deribo, Tobias; Goldhammer, Frank; Kroehne, Ulf – Educational and Psychological Measurement, 2023
As researchers in the social sciences, we are often interested in studying not directly observable constructs through assessments and questionnaires. But even in a well-designed and well-implemented study, rapid-guessing behavior may occur. Under rapid-guessing behavior, a task is skimmed shortly but not read and engaged with in-depth. Hence, a…
Descriptors: Reaction Time, Guessing (Tests), Behavior Patterns, Bias
Butti, Niccolò; Finisguerra, Alessandra; Urgesi, Cosimo – Developmental Psychology, 2022
There is inconsistent evidence that human bodies are processed through holistic processing as it has been widely reported for faces. To assess how configural and holistic processes may develop with age, we administered a visual body recognition task assessing the presence of body inversion and composite illusion effects to white adults (114…
Descriptors: Human Body, Whites, Adults, Holistic Approach
Azman Ong, Mohd Hanafi; Mohd Yasin, Norazlina; Ibrahim, Nur Syafikah – Asian Association of Open Universities Journal, 2022
Purpose: Measuring internal response of online learning is seen as fundamental to absorptive capacity which stimulates knowledge assimilation. However, the evaluation of practice and research of validated instruments that could effectively measure online learning response behavior is limited. Thus, in this study, a new instrument was designed…
Descriptors: Online Courses, Student Surveys, Student Attitudes, Factor Analysis
Yildirim, Osman Gazi; Ozdener, Nesrin – International Journal of Computer Science Education in Schools, 2022
The main goal of the current study is to develop a reliable instrument to measure programming anxiety in university students. A pool of 33 items based on extensive literature review and experts' opinions were created by researchers. The draft scale comprised three factors applied to 392 university students from two different universities in Turkey…
Descriptors: Anxiety, Undergraduate Students, Student Attitudes, Factor Analysis
Re-Imagining Narrative Writing and Assessment: A Post-NAPLAN Craft-Based Rubric for Creative Writing
Michael D. Carey; Shelley Davidow; Paul Williams – Australian Journal of Language and Literacy, 2022
According to creative writing pedagogies academic Susanne Gannon ("English in Australia, 54"(2), 43-56, 2019), and the Federal government-commissioned NAPLAN review (McGaw et al., 2020), NAPLAN has restricted how writing is taught in secondary schools. A NAPLAN-influenced structural approach to teaching writing has subsumed the…
Descriptors: Scoring Rubrics, Creative Writing, Writing Evaluation, National Competency Tests
Aleyna Altan; Zehra Taspinar Sener – Online Submission, 2023
This research aimed to develop a valid and reliable test to be used to detect sixth grade students' misconceptions and errors regarding the subject of fractions. A misconception diagnostic test has been developed that includes the concept of fractions, different representations of fractions, ordering and comparing fractions, equivalence of…
Descriptors: Diagnostic Tests, Mathematics Tests, Fractions, Misconceptions

Peer reviewed
Direct link
