Showing 1 to 15 of 132 results
Peer reviewed
Jing Miao; Sandip Sinharay; Chris Kelbaugh; Yi Cao; Wei Wang – ETS Research Report Series, 2023
In a targeted double-scoring procedure for performance assessments that are used for licensure and certification purposes, a subset of responses receives an independent second rating if the first rating falls into a preidentified critical score range (CSR) where an additional rating would lead to considerably more reliable pass-fail decisions.…
Descriptors: Scoring, Performance Based Assessment, Licensing Examinations (Professions), Certification
Joshua B. Gilbert; James G. Soland; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025
Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may affect VA estimates is less studied. We examine the…
Descriptors: Value Added Models, Tests, Testing, Scoring
Peer reviewed
Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025
In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…
Descriptors: Automation, Grading, Computer Assisted Testing, Scoring
Peer reviewed
Beheshti, Shima; Safa, Mohammad Ahmadi – Iranian Journal of Language Teaching Research, 2023
The indefinite nature of test fairness and different interpretations and definitions of the concept have stirred a lot of controversy over the years, necessitating the reconceptualization of the concept. On this basis, this study aimed to explore the empirical validity of Kunnan's (2008) Test Fairness Framework (TFF) and revisit the established…
Descriptors: Test Bias, Equal Education, Grounded Theory, Test Construction
Peer reviewed
Marit Skarbø Solem; Anne Marie Dalby Landmark; Elizabeth Stokoe; Karianne Skovholt – Scandinavian Journal of Educational Research, 2024
How do examiners reach joint decisions when they grade oral examinations? While government and policymakers provide general frameworks about grading decisions, we know little about how they are actually accomplished in interaction, particularly when examiners initially disagree. We scrutinized 29 video-recorded grading conversations between…
Descriptors: Foreign Countries, Secondary School Teachers, Secondary Education, Speech Tests
Peer reviewed
Alyson Burnett; Katlyn Lee Milless; Michelle Bennett; Whitney Kozakowski; Sonia Alves; Christine Ross – Regional Educational Laboratory Mid-Atlantic, 2024
This study analyzed Pennsylvania School Climate Survey data from students and staff in the 2021/22 school year to assess the validity and reliability of the elementary school student version of the survey; approaches to scoring the survey in individual schools at all grade levels; and perceptions of school climate across student, staff, and school…
Descriptors: Educational Environment, Decision Making, Surveys, Validity
Peer reviewed
Makiko Kato – Journal of Education and Learning, 2025
This study aims to examine whether differences exist in the factors influencing the difficulty of scoring English summaries and determining scores based on the raters' attributes, and to collect candid opinions, considerations, and tentative suggestions for future improvements to the analytic rubric of summary writing for English learners. In this…
Descriptors: Writing Evaluation, Scoring, Writing Skills, English (Second Language)
Peer reviewed
Kunal Sareen – Innovations in Education and Teaching International, 2024
This study examines the proficiency of ChatGPT, an AI language model, in answering questions on the Situational Judgement Test (SJT), a widely used assessment tool for evaluating the fundamental competencies of medical graduates in the UK. A total of 252 SJT questions from the "Oxford Assess and Progress: Situational Judgement" Test…
Descriptors: Ethics, Decision Making, Artificial Intelligence, Computer Software
Peer reviewed
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
Peer reviewed
Regional Educational Laboratory Mid-Atlantic, 2024
These are the appendixes for the report, "Strengthening the Pennsylvania School Climate Survey to Inform School Decisionmaking." This study analyzed Pennsylvania School Climate Survey data from students and staff in the 2021/22 school year to assess the validity and reliability of the elementary school student version of the survey;…
Descriptors: Educational Environment, Surveys, Decision Making, School Personnel
Peer reviewed
Oudman, Sophie; van de Pol, Janneke; van Gog, Tamara – Metacognition and Learning, 2022
Preparing students to become self-regulated learners has become an important goal of primary education. Therefore, it is important to investigate how we can improve self-monitoring and self-regulation accuracy in primary school students. Focusing on mathematics problems, we investigated whether and how (1) high- and low-performing students…
Descriptors: Metacognition, Elementary School Students, Mathematics Instruction, Problem Solving
Peer reviewed
Yukhymenko-Lescroart, Mariya A.; Goldman, Susan R.; Lawless, Kimberly A.; Pellegrino, James W.; Shanahan, Cynthia R. – Educational Psychology, 2022
To extend the existing research examining multiple text comprehension and its assessment, we developed a verification task approach to assessing information that was explicitly and implicitly presented within and across nine texts. A nonparametric form of signal detection theory was used to analyse the…
Descriptors: Task Analysis, Reading Comprehension, Middle School Students, Nonparametric Statistics
Peer reviewed
Congning Ni; Bhashithe Abeysinghe; Juanita Hicks – International Electronic Journal of Elementary Education, 2025
The National Assessment of Educational Progress (NAEP), often referred to as The Nation's Report Card, offers a window into the state of the U.S. K-12 education system. Since 2017, NAEP has transitioned to digital assessments, opening new research opportunities that were previously impossible. Process data tracks students' interactions with the…
Descriptors: Reaction Time, Multiple Choice Tests, Behavior Change, National Competency Tests
Peer reviewed
Soland, James; Kuhfeld, Megan; Register, Brennan – Educational Assessment, 2023
Much of what we know about how children develop is based on survey data. In order to estimate growth across time and, thereby, better understand that development, short survey scales are typically administered at repeated timepoints. Before estimating growth, those repeated measures must be put onto the same scale. Yet, little research examines…
Descriptors: Comparative Analysis, Social Emotional Learning, Scaling, Effect Size
Peer reviewed
Peng, Yue; Yan, Wei; Cheng, Liying – Language Testing, 2021
This test review focuses on the current version (2009) of [Chinese characters omitted] (Hanyu Shuiping Kaoshi), literally translated as the Chinese Language Proficiency Test and abbreviated as HSK. Tailored to non-native speakers of the Chinese language, this test consists of six proficiency levels (Levels 1 and 2 as beginners, Levels 3 and 4 as…
Descriptors: Language Proficiency, Language Tests, Chinese, Decision Making