Showing 1 to 15 of 10,081 results
Jiayi Deng – ProQuest LLC, 2024
Test score comparability in international large-scale assessments (LSA) is of utmost importance in measuring the effectiveness of education systems and understanding the impact of education on economic growth. To effectively compare test scores on an international scale, score linking is widely used to convert raw scores from different linguistic…
Descriptors: Item Response Theory, Scoring Rubrics, Scoring, Error of Measurement
Peer reviewed
Louise Badham – Oxford Review of Education, 2025
Different sources of assessment evidence are reviewed during International Baccalaureate (IB) grade awarding to convert marks into grades and ensure fair results for students. Qualitative and quantitative evidence are analysed to determine grade boundaries, with statistical evidence weighed against examiner judgement and teachers' feedback on…
Descriptors: Advanced Placement Programs, Grading, Interrater Reliability, Evaluative Thinking
Peer reviewed
Andreea Dutulescu; Stefan Ruseti; Mihai Dascalu; Danielle S. McNamara – Grantee Submission, 2025
The assessment of student responses to learning-strategy prompts, such as self-explanation, summarization, and paraphrasing, is essential for evaluating cognitive engagement and comprehension. However, manual scoring is resource-intensive, limiting its scalability in educational settings. This study investigates the use of Large Language Models…
Descriptors: Scoring, Computational Linguistics, Computer Software, Artificial Intelligence
Peer reviewed
Full-text PDF available on ERIC
Andreea Dutulescu; Stefan Ruseti; Mihai Dascalu; Danielle McNamara – International Educational Data Mining Society, 2025
The assessment of student responses to learning-strategy prompts, such as self-explanation, summarization, and paraphrasing, is essential for evaluating cognitive engagement and comprehension. However, manual scoring is resource-intensive, limiting its scalability in educational settings. This study investigates the use of Large Language Models…
Descriptors: Scoring, Computational Linguistics, Computer Software, Artificial Intelligence
Peer reviewed
Paul Leeming; Justin Harris – Language Teaching Research, 2025
Measurement of language learners' development in speaking proficiency is important for practicing language teachers, not only for assessment purposes, but also for evaluating the effectiveness of materials and approaches used. However, doing so effectively and efficiently presents challenges. Commercial speaking tests are often costly, and beyond…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, College Students
Peer reviewed
Jae-Sang Han; Hyun-Joo Kim – Journal of Science Education and Technology, 2025
This study explores the potential to enhance the performance of convolutional neural networks (CNNs) for automated scoring of kinematic graph answers through data augmentation using Deep Convolutional Generative Adversarial Networks (DCGANs). By developing and fine-tuning a DCGAN model to generate high-quality graph images, we explored its…
Descriptors: Performance, Automation, Scoring, Models
Peer reviewed
Full-text PDF available on ERIC
Naima Debbar – International Journal of Contemporary Educational Research, 2024
Intelligent essay grading systems are important tools in educational technology. They can replace much of the manual scoring effort and provide instructional feedback as well. These systems typically include two main parts: a feature extractor and an automatic grading model. The latter is generally based on computational and…
Descriptors: Test Scoring Machines, Computer Uses in Education, Artificial Intelligence, Essay Tests
Peer reviewed
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Saenz, David Arron – Online Submission, 2023
There is a vast body of literature documenting the positive impacts that rater training and calibration sessions have on inter-rater reliability, and research indicates that several factors, including frequency and timing, play crucial roles in ensuring inter-rater reliability. Additionally, an increasing amount of research indicates possible links in…
Descriptors: Interrater Reliability, Scoring, Training, Scoring Rubrics
Peer reviewed
Full-text PDF available on ERIC
Venessa F. Manna; Shuhong Li; Spiros Papageorgiou; Lixiong Gu – ETS Research Report Series, 2025
This technical manual describes the purpose and intended uses of the TOEFL iBT test, its target test-taker population, and relevant language use domains. The test design and scoring procedures are presented first, followed by a research agenda intended to support the interpretation and use of test scores. Given the updates to the test starting…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Test Construction
Peer reviewed
Keshav Panray Jungbadoor; Xi Hong; Liu Liu; Yunan Zhu; Xinni Huang; Viraiyan Teeroovengadum; Gwilym Croucher; Angel Calderon; Sara Bice; Hamish Coates – Tertiary Education and Management, 2025
This paper reports on a multiyear program of international collaborative research delivered with the aim of conceptualising, validating and prototyping rubrics for evaluating and reporting university activities and outcomes relevant to the UN SDGs. The paper sets foundations by building on earlier analysis of research on university engagement with…
Descriptors: Higher Education, Universities, Sustainable Development, Scoring Rubrics
Peer reviewed
David DiSabito; Lisa Hansen; Thomas Mennella; Josephine Rodriguez – New Directions for Teaching and Learning, 2025
This chapter investigates the integration of generative AI (GenAI), specifically ChatGPT, into institutional and course-level assessment at Western New England University. It explores the potential of GenAI to streamline the assessment process, making it more efficient, equitable, and objective. Through the development of a proprietary GenAI tool,…
Descriptors: Artificial Intelligence, Technology Uses in Education, Man Machine Systems, Educational Assessment
Peer reviewed
Hui Jin; Cynthia Lima; Limin Wang – Educational Measurement: Issues and Practice, 2025
Although AI transformer models have demonstrated notable capability in automated scoring, it is difficult to examine how and why these models fall short in scoring some responses. This study investigated how transformer models' language processing and quantification processes can be leveraged to enhance the accuracy of automated scoring. Automated…
Descriptors: Automation, Scoring, Artificial Intelligence, Accuracy
Peer reviewed
Casabianca, Jodi M.; Donoghue, John R.; Shin, Hyo Jeong; Chao, Szu-Fu; Choi, Ikkyu – Journal of Educational Measurement, 2023
Using item-response theory to model rater effects provides an alternative solution for rater monitoring and diagnosis, compared to using standard performance metrics. In order to fit such models, the ratings data must be sufficiently connected in order to estimate rater effects. Due to popular rating designs used in large-scale testing scenarios,…
Descriptors: Item Response Theory, Alternative Assessment, Evaluators, Research Problems
Peer reviewed
Ormerod, Christopher; Lottridge, Susan; Harris, Amy E.; Patel, Milan; van Wamelen, Paul; Kodeswaran, Balaji; Woolf, Sharon; Young, Mackenzie – International Journal of Artificial Intelligence in Education, 2023
We introduce a short answer scoring engine made up of an ensemble of deep neural networks and a Latent Semantic Analysis-based model to score short constructed responses for a large suite of questions from a national assessment program. We evaluate the performance of the engine and show that the engine achieves above-human-level performance on a…
Descriptors: Computer Assisted Testing, Scoring, Artificial Intelligence, Semantics