Showing 16 to 30 of 9,980 results
Peer reviewed
Burkhardt, Amy; Lottridge, Susan; Woolf, Sherri – Educational Measurement: Issues and Practice, 2021
For some students, standardized tests serve as a conduit to disclose sensitive issues of harm or distress that may otherwise go unreported. By detecting this writing, known as "crisis papers," testing programs have a unique opportunity to assist in mitigating the risk of harm to these students. The use of machine learning to…
Descriptors: Scoring Rubrics, Identification, At Risk Students, Standardized Tests
Peer reviewed
Olaghere, Ajima; Wilson, David B.; Kimbrell, Catherine – Research Synthesis Methods, 2023
A diversity of approaches to critically appraising qualitative and quantitative evidence exists, each emphasizing different aspects. These approaches lack clear processes to facilitate rating the overall quality of the evidence for aggregated findings that combine qualitative and quantitative evidence. We draw on a meta-aggregation of implementation…
Descriptors: Evidence, Synthesis, Scoring Rubrics, Standardized Tests
Peer reviewed
Rodgers, Emily; D'Agostino, Jerome V.; Berenbon, Rebecca; Johnson, Tracy; Winkler, Christa – Journal of Early Childhood Literacy, 2023
Running Records are thought to be an excellent formative assessment tool because they generate results that educators can use to make their teaching more responsive. Despite the technical nature of scoring Running Records and the kinds of important decisions that are attached to their analysis, few studies have investigated assessor accuracy. We…
Descriptors: Formative Evaluation, Scoring, Accuracy, Difficulty Level
Yun-Kyung Kim; Li Cai – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2025
This paper introduces an application of cross-classified item response theory (IRT) modeling to an assessment utilizing the embedded standard setting (ESS) method (Lewis & Cook). The cross-classified IRT model is used to treat both item and person effects as random, where the item effects are regressed on the target performance levels (target…
Descriptors: Standard Setting (Scoring), Item Response Theory, Test Items, Difficulty Level
Peer reviewed
Glory Tobiason; Adrienne Lavine – Change: The Magazine of Higher Learning, 2025
Current methods for evaluating faculty teaching fall short, and one way to address this is through campus-wide initiatives that focus on change at the level of academic units. The complex context of higher education makes meaningful teaching evaluation difficult; in particular, four sobering realities of this context must be taken into account in…
Descriptors: Teacher Evaluation, Evaluation Methods, Testing Problems, Educational Change
Peer reviewed
Roduta Roberts, Mary; Gotch, Chad M.; Cook, Megan; Werther, Karin; Chao, Iris C. I. – Measurement: Interdisciplinary Research and Perspectives, 2022
Performance-based assessment is a common approach to assess the development and acquisition of practice competencies among health professions students. Judgments related to the quality of performance are typically operationalized as ratings against success criteria specified within a rubric. The extent to which the rubric is understood,…
Descriptors: Protocol Analysis, Scoring Rubrics, Interviews, Performance Based Assessment
Peer reviewed
PDF on ERIC
Doewes, Afrizal; Pechenizkiy, Mykola – International Educational Data Mining Society, 2021
Scoring essays is generally an exhausting and time-consuming task for teachers. Automated Essay Scoring (AES) makes the scoring process faster and more consistent. The most logical way to assess the performance of an automated scorer is by measuring its score agreement with human raters. However, we provide empirical evidence that…
Descriptors: Man Machine Systems, Automation, Computer Assisted Testing, Scoring
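The entry above evaluates automated scorers by their agreement with human raters. A standard agreement metric in AES research is quadratic weighted kappa; the sketch below is an illustrative implementation of that general metric, not code from the cited paper.

```python
from collections import Counter

def quadratic_weighted_kappa(human, machine, min_rating, max_rating):
    """Quadratic weighted kappa between two lists of integer ratings.

    1.0 means perfect agreement; 0.0 means chance-level agreement;
    negative values mean systematic disagreement.
    """
    n_ratings = max_rating - min_rating + 1
    n = len(human)

    # Observed confusion matrix of human vs. machine ratings.
    obs = [[0] * n_ratings for _ in range(n_ratings)]
    for h, m in zip(human, machine):
        obs[h - min_rating][m - min_rating] += 1

    # Marginal rating histograms, used to build the chance-expected matrix.
    hist_h = Counter(h - min_rating for h in human)
    hist_m = Counter(m - min_rating for m in machine)

    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted chance-expected disagreement
    for i in range(n_ratings):
        for j in range(n_ratings):
            # Quadratic penalty grows with the distance between ratings.
            w = ((i - j) ** 2) / ((n_ratings - 1) ** 2)
            num += w * obs[i][j]
            den += w * hist_h[i] * hist_m[j] / n
    return 1.0 - num / den
```

For example, identical rating lists yield a kappa of 1.0, while two raters who always choose opposite ends of the scale yield -1.0.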
Peer reviewed
Bamdev, Pakhi; Grover, Manraj Singh; Singla, Yaman Kumar; Vafaee, Payman; Hama, Mika; Shah, Rajiv Ratn – International Journal of Artificial Intelligence in Education, 2023
English proficiency assessments have become a necessary metric for filtering and selecting prospective candidates in both academia and industry. With the rise in demand for such assessments, it has become increasingly necessary to produce automated, human-interpretable results to prevent inconsistencies and ensure meaningful feedback to the…
Descriptors: Language Proficiency, Automation, Scoring, Speech Tests
Peer reviewed
Culpepper, Dawn; White-Lewis, Damani; O'Meara, KerryAnn; Templeton, Lindsey; Anderson, Julia – Journal of Higher Education, 2023
Many colleges and universities now require faculty search committees to use rubrics when evaluating faculty job candidates, as proponents believe these "decision-support tools" can reduce the impact of bias in candidate evaluation. That is, rubrics are intended to ensure that candidates are evaluated more fairly, which is then thought to…
Descriptors: Scoring Rubrics, Bias, Personnel Selection, College Faculty
Peer reviewed
Meryssa Piper; Jessica Frankle; Sophia Owens; Blake Stubbins; Lancen Tully; Katherine Ryker – Journal of Geoscience Education, 2025
Rock and mineral labs are fundamental in traditional introductory geology courses. Successful implementation of these lab activities provides students opportunities to apply content knowledge. Inquiry-based instruction may be one way to increase student success. Prior examination of published STEM labs indicates that geology labs, particularly…
Descriptors: Geology, Introductory Courses, Laboratory Experiments, Inquiry
Peer reviewed
Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025
Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…
Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods
Peer reviewed
Alexandra Jackson; Cheryl Bodnar; Elise Barrella; Juan Cruz; Krista Kecskemety – Journal of STEM Education: Innovations and Research, 2025
Recent curricular interventions in engineering education have focused on encouraging students to develop an entrepreneurial mindset (EM) to equip them with the skills needed to generate innovative ideas and address complex global problems upon entering the workforce. Methods to evaluate these interventions have been inconsistent due to the lack of…
Descriptors: Engineering Education, Entrepreneurship, Concept Mapping, Student Evaluation
Peer reviewed
PDF on ERIC
Steven Holtzman; Jonathan Steinberg; Jonathan Weeks; Christopher Robertson; Jessica Findley; David Klieger – ETS Research Report Series, 2024
At a time when institutions of higher education are exploring alternatives to traditional admissions testing, institutions are also seeking to better support students and prepare them for academic success. Under such an engaged model, one may seek to measure not just the accumulated knowledge and skills that students would bring to a new academic…
Descriptors: Law Schools, College Applicants, Legal Education (Professions), College Entrance Examinations
Peer reviewed
Stefanie A. Wind; Yuan Ge – Measurement: Interdisciplinary Research and Perspectives, 2024
Mixed-format assessments made up of multiple-choice (MC) items and constructed response (CR) items that are scored using rater judgments include unique psychometric considerations. When these item types are combined to estimate examinee achievement, information about the psychometric quality of each component can depend on that of the other. For…
Descriptors: Interrater Reliability, Test Bias, Multiple Choice Tests, Responses
Peer reviewed
Benjamin Goecke; Paul V. DiStefano; Wolfgang Aschauer; Kurt Haim; Roger Beaty; Boris Forthmann – Journal of Creative Behavior, 2024
Automated scoring is a current hot topic in creativity research. However, most research has focused on the English language and popular verbal creative thinking tasks, such as the alternate uses task. Therefore, in this study, we present a large language model approach for automated scoring of a scientific creative thinking task that assesses…
Descriptors: Creativity, Creative Thinking, Scoring, Automation