Showing 1 to 15 of 10,088 results
Peer reviewed
Direct link
Mingfeng Xue; Yunting Liu; Xingyao Xiao; Mark Wilson – Journal of Educational Measurement, 2025
Prompts play a crucial role in eliciting accurate outputs from large language models (LLMs). This study examines the effectiveness of an automatic prompt engineering (APE) framework for automatic scoring in educational measurement. We collected constructed-response data from 930 students across 11 items and used human scores as the true labels. A…
Descriptors: Computer Assisted Testing, Prompting, Educational Assessment, Automation
Jiayi Deng – ProQuest LLC, 2024
Test score comparability in international large-scale assessments (LSA) is of utmost importance in measuring the effectiveness of education systems and understanding the impact of education on economic growth. To effectively compare test scores on an international scale, score linking is widely used to convert raw scores from different linguistic…
Descriptors: Item Response Theory, Scoring Rubrics, Scoring, Error of Measurement
Peer reviewed
Direct link
Louise Badham – Oxford Review of Education, 2025
Different sources of assessment evidence are reviewed during International Baccalaureate (IB) grade awarding to convert marks into grades and ensure fair results for students. Qualitative and quantitative evidence are analysed to determine grade boundaries, with statistical evidence weighed against examiner judgement and teachers' feedback on…
Descriptors: Advanced Placement Programs, Grading, Interrater Reliability, Evaluative Thinking
Peer reviewed
Direct link
Andreea Dutulescu; Stefan Ruseti; Mihai Dascalu; Danielle S. McNamara – Grantee Submission, 2025
The assessment of student responses to learning-strategy prompts, such as self-explanation, summarization, and paraphrasing, is essential for evaluating cognitive engagement and comprehension. However, manual scoring is resource-intensive, limiting its scalability in educational settings. This study investigates the use of Large Language Models…
Descriptors: Scoring, Computational Linguistics, Computer Software, Artificial Intelligence
Peer reviewed
PDF on ERIC
Andreea Dutulescu; Stefan Ruseti; Mihai Dascalu; Danielle McNamara – International Educational Data Mining Society, 2025
The assessment of student responses to learning-strategy prompts, such as self-explanation, summarization, and paraphrasing, is essential for evaluating cognitive engagement and comprehension. However, manual scoring is resource-intensive, limiting its scalability in educational settings. This study investigates the use of Large Language Models…
Descriptors: Scoring, Computational Linguistics, Computer Software, Artificial Intelligence
Peer reviewed
Direct link
Paul Leeming; Justin Harris – Language Teaching Research, 2025
Measurement of language learners' development in speaking proficiency is important for practicing language teachers, not only for assessment purposes, but also for evaluating the effectiveness of materials and approaches used. However, doing so effectively and efficiently presents challenges. Commercial speaking tests are often costly, and beyond…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, College Students
Peer reviewed
Direct link
Jae-Sang Han; Hyun-Joo Kim – Journal of Science Education and Technology, 2025
This study explores the potential to enhance the performance of convolutional neural networks (CNNs) for automated scoring of kinematic graph answers through data augmentation using Deep Convolutional Generative Adversarial Networks (DCGANs). By developing and fine-tuning a DCGAN model to generate high-quality graph images, we explored its…
Descriptors: Performance, Automation, Scoring, Models
Peer reviewed
PDF on ERIC
Naima Debbar – International Journal of Contemporary Educational Research, 2024
Intelligent essay grading systems constitute important tools for educational technologies. They can substantially reduce manual scoring effort and provide instructional feedback as well. These systems typically include two main parts: a feature extractor and an automatic grading model. The latter is generally based on computational and…
Descriptors: Test Scoring Machines, Computer Uses in Education, Artificial Intelligence, Essay Tests
Peer reviewed
Direct link
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e. rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
David Arron Saenz – Online Submission, 2023
There is a vast body of literature documenting the positive impact of rater training and calibration sessions on inter-rater reliability, with research indicating that several factors, including frequency and timing, play crucial roles in ensuring inter-rater reliability. Additionally, an increasing amount of research indicates possible links in…
Descriptors: Interrater Reliability, Scoring, Training, Scoring Rubrics
Peer reviewed
PDF on ERIC
Venessa F. Manna; Shuhong Li; Spiros Papageorgiou; Lixiong Gu – ETS Research Report Series, 2025
This technical manual describes the purpose and intended uses of the TOEFL iBT test, its target test-taker population, and relevant language use domains. The test design and scoring procedures are presented first, followed by a research agenda intended to support the interpretation and use of test scores. Given the updates to the test starting…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Test Construction
Peer reviewed
Direct link
Ikkyu Choi; Matthew S. Johnson – Journal of Educational Measurement, 2025
Automated scoring systems provide multiple benefits but also pose challenges, notably potential bias. Various methods exist to evaluate these algorithms and their outputs for bias. Upon detecting bias, the next logical step is to investigate its cause, often by examining feature distributions. Recently, Johnson and McCaffrey proposed an…
Descriptors: Prediction, Bias, Automation, Scoring
Peer reviewed
Direct link
Keshav Panray Jungbadoor; Xi Hong; Liu Liu; Yunan Zhu; Xinni Huang; Viraiyan Teeroovengadum; Gwilym Croucher; Angel Calderon; Sara Bice; Hamish Coates – Tertiary Education and Management, 2025
This paper reports on a multiyear program of international collaborative research delivered with the aim of conceptualising, validating and prototyping rubrics for evaluating and reporting university activities and outcomes relevant to the UN SDGs. The paper sets foundations by building on earlier analysis of research on university engagement with…
Descriptors: Higher Education, Universities, Sustainable Development, Scoring Rubrics
Peer reviewed
Direct link
David DiSabito; Lisa Hansen; Thomas Mennella; Josephine Rodriguez – New Directions for Teaching and Learning, 2025
This chapter investigates the integration of generative AI (GenAI), specifically ChatGPT, into institutional and course-level assessment at Western New England University. It explores the potential of GenAI to streamline the assessment process, making it more efficient, equitable, and objective. Through the development of a proprietary GenAI tool,…
Descriptors: Artificial Intelligence, Technology Uses in Education, Man Machine Systems, Educational Assessment
Peer reviewed
Direct link
Hui Jin; Cynthia Lima; Limin Wang – Educational Measurement: Issues and Practice, 2025
Although AI transformer models have demonstrated notable capability in automated scoring, it is difficult to examine how and why these models fall short in scoring some responses. This study investigated how transformer models' language processing and quantification processes can be leveraged to enhance the accuracy of automated scoring. Automated…
Descriptors: Automation, Scoring, Artificial Intelligence, Accuracy