Publication Date
In 2025 | 2 |
Since 2024 | 4 |
Since 2021 (last 5 years) | 11 |
Since 2016 (last 10 years) | 21 |
Since 2006 (last 20 years) | 44 |
Descriptor
Accuracy | 44 |
Evaluation Methods | 44 |
Validity | 25 |
Test Validity | 17 |
Foreign Countries | 10 |
Reliability | 10 |
Test Reliability | 10 |
Models | 7 |
Rating Scales | 7 |
Comparative Analysis | 6 |
Measurement Techniques | 6 |
Author
Al Hajri, Fatma | 1 |
Amery D. Wu | 1 |
Angus, Megan Hague | 1 |
Apple, Kristen | 1 |
Atilla Ergin | 1 |
Bejar, Isaac I. | 1 |
Bell, Courtney A. | 1 |
Benjamin, Rebekah George | 1 |
Berstein Ratner, Nan | 1 |
Bush, Paula | 1 |
Chafouleas, Sandra M. | 1 |
Publication Type
Journal Articles | 34 |
Reports - Research | 33 |
Tests/Questionnaires | 6 |
Reports - Descriptive | 5 |
Dissertations/Theses -… | 3 |
Reports - Evaluative | 3 |
Information Analyses | 2 |
Speeches/Meeting Papers | 2 |
Opinion Papers | 1 |
Audience
Policymakers | 2 |
Practitioners | 1 |
Location
Thailand | 2 |
Australia | 1 |
California (Los Angeles) | 1 |
China | 1 |
Colorado (Denver) | 1 |
Connecticut | 1 |
Delaware | 1 |
District of Columbia | 1 |
Florida | 1 |
Germany | 1 |
Indiana | 1 |
Assessments and Surveys
Dynamic Indicators of Basic… | 1 |
Test of English as a Foreign… | 1 |
Peer Overmarking and Insufficient Diagnosticity: The Impact of the Rating Method for Peer Assessment
Van Meenen, Florence; Coertjens, Liesje; Van Nes, Marie-Claire; Verschuren, Franck – Advances in Health Sciences Education, 2022
The present study explores two rating methods for peer assessment (analytical rating using criteria and comparative judgement) in light of concurrent validity, reliability and insufficient diagnosticity (i.e. the degree to which substandard work is recognised by the peer raters). During a second-year undergraduate course, students wrote a one-page…
Descriptors: Evaluation Methods, Peer Evaluation, Accuracy, Evaluation Criteria
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Nie, Yanjiao; Luo, Heng; Sun, Di – Interactive Learning Environments, 2021
The proliferation of massive open online courses (MOOCs) highlights the necessity of developing accurate and diagnostic evaluation methods to assess the courses' quality and effectiveness. Hence, this study proposes a diagnostic MOOC evaluation (DME) method that combines the Analytic Hierarchy Process algorithm and learner review mining to…
Descriptors: Online Courses, Evaluation Methods, Course Evaluation, Mathematics
Francois, Isabelle; Lapka, Stefanie; Berstein Ratner, Nan; Mills, Monique T. – EBP Briefs (Evidence-based Practice Briefs), 2023
Clinical Question: For young AAE speakers (P), how useful is Developmental Sentence Scoring (DSS), compared with the Index of Productive Syntax (IPSyn), in identifying developmental language disorder (DLD) in the presence of African American English (AAE)? Method: Structured Review. Study Sources: PsycInfo®, Education Source, Education Resources…
Descriptors: Black Dialects, Language Impairments, Developmental Delays, Syntax
Atilla Ergin; Yelkin Diker Coskun – International Journal on Social and Education Sciences, 2024
This study aims to develop a scale to measure the design thinking process and to evaluate the reliability and validity of this scale. It fills a gap by introducing a 36-item scale specifically designed to measure design thinking abilities across the five key stages of the design thinking process: empathize, define, ideate, prototype, and test,…
Descriptors: Design, Thinking Skills, Likert Scales, Empathy
Guler, Gul; Cikrikci, Rahime Nukhet – International Journal of Assessment Tools in Education, 2022
The purpose of this study was to investigate the Type I error rates and power of the methods used to determine dimensionality in unidimensional and bidimensional psychological constructs under various conditions (distribution characteristics, sample size, test length, and interdimensional correlation) and to examine the joint…
Descriptors: Comparative Analysis, Error of Measurement, Decision Making, Factor Analysis
Elaine Chapman; Jian Zhao; Peyman G. P. Sabet – Education Research and Perspectives, 2024
Effective assessments guide student learning, refine teaching practices, ensure curriculum alignment, and foster workforce readiness. However, the emergence of generative artificial intelligence (GenAI) tools, such as ChatGPT, has significantly disrupted traditional assessment processes, raising concerns about academic integrity and necessitating…
Descriptors: Artificial Intelligence, Evaluation Methods, Influence of Technology, Integrity
Thapelo Ncube Whitfield – ProQuest LLC, 2021
Student Experience surveys are used to measure student attitudes towards their campus as well as to initiate conversations for institutional change. Validity evidence to support the interpretations of these surveys' results, however, is lacking. The first purpose of this study was to compare three Differential Item Functioning (DIF) methods on…
Descriptors: College Students, Student Surveys, Student Experience, Student Attitudes
Dalton, Sarah Grace; Stark, Brielle C.; Fromm, Davida; Apple, Kristen; MacWhinney, Brian; Rensch, Amanda; Rowedder, Madyson – Journal of Speech, Language, and Hearing Research, 2022
Purpose: The aim of this study was to advance the use of structured, monologic discourse analysis by validating an automated scoring procedure for core lexicon (CoreLex) using transcripts. Method: Forty-nine transcripts from persons with aphasia and 48 transcripts from persons with no brain injury were retrieved from the AphasiaBank database. Five…
Descriptors: Validity, Discourse Analysis, Databases, Scoring
Huang, Xiaoping; Hu, Zhongfeng – Higher Education Studies, 2015
The main problem with validity in educational evaluation is that it simply copies the conceptual framework of validity from educational measurement into its own conceptual system. A conceptual system of validity that fits the needs of educational evaluation theory and practice has not yet been established. According to the inherent attributive…
Descriptors: Test Validity, Educational Assessment, Evaluation Problems, Theory Practice Relationship
Bell, Courtney A.; Jones, Nathan D.; Qi, Yi; Lewis, Jennifer M. – Educational Assessment, 2018
All 50 states use observations to evaluate practicing teachers, but we know little about how administrators actually reason when they use those observation protocols. Drawing on think-aloud and stimulated recall data, this study describes the types of strategies and warrants practicing administrators used when rating with their district's…
Descriptors: Administrators, Observation, Validity, Logical Thinking
Khamboonruang, Apichat – rEFLections, 2022
Although much research has compared the functioning of analytic and holistic rating scales, little research has compared the functioning of binary rating scales with other types of rating scales. This quantitative study set out to preliminarily and comparatively validate binary and analytic rating scales intended for use in formative…
Descriptors: Writing Evaluation, Evaluation Methods, Second Language Learning, Second Language Instruction
National Centre for Vocational Education Research (NCVER), 2016
This work asks one simple question: "how reliable is the method used by the National Centre for Vocational Education Research (NCVER) to estimate projected rates of VET program completion?" In other words, how well do early projections align with actual completion rates some years later? Completion rates are simple to calculate with a…
Descriptors: Vocational Education, Graduation Rate, Predictive Measurement, Predictive Validity
Hamzeh, Joshua; Kaur, Navdeep; Bush, Paula; Hudon, Catherine; Schuster, Tibor; Vedel, Isabelle; Hong, Quan Nha; Pluye, Pierre – Education for Information, 2019
A questionnaire's origin (the sources from which its elements are derived) and initial development (the process of building the questionnaire from those elements) should be assessed before its measurement properties. There is no Critical Appraisal Tool (CAT) that comprehensively assesses the origin and initial development of questionnaires…
Descriptors: Questionnaires, Information Services, Users (Information), Academic Libraries