Showing all 7 results
Peer reviewed
Karoline A. Sachse; Sebastian Weirich; Nicole Mahler; Camilla Rjosk – International Journal of Testing, 2024
To ensure content validity by covering a broad range of content domains, the total testing time of some educational large-scale assessments runs to two hours or more. Performance decline over the course of taking the test has been extensively documented in the literature. It can occur due to increases in the numbers of: (a)…
Descriptors: Test Wiseness, Test Score Decline, Testing Problems, Foreign Countries
Peer reviewed
Man, Kaiwen; Harring, Jeffery R.; Ouyang, Yunbo; Thomas, Sarah L. – International Journal of Testing, 2018
Many important high-stakes decisions--college admission, academic performance evaluation, and even job promotion--depend on accurate and reliable scores from valid large-scale assessments. However, examinees sometimes cheat by copying answers from other test-takers or practicing with test items ahead of time, which can undermine the effectiveness…
Descriptors: Reaction Time, High Stakes Tests, Test Wiseness, Cheating
Peer reviewed
Kim, Yoon Jeon; Almond, Russell G.; Shute, Valerie J. – International Journal of Testing, 2016
Game-based assessment (GBA) is a specific use of educational games that employs game activities to elicit evidence for educationally valuable skills and knowledge. While this approach can provide individualized and diagnostic information about students, the design and development of assessment mechanics for a GBA is a nontrivial task. In this…
Descriptors: Design, Evidence Based Practice, Test Construction, Physics
Peer reviewed
Briggs, Derek C.; Circi, Ruhan – International Journal of Testing, 2017
Artificial Neural Networks (ANNs) have been proposed as a promising approach for the classification of students into different levels of a psychological attribute hierarchy. Unfortunately, because such classifications typically rely upon internally produced item response patterns that have not been externally validated, the instability of ANN…
Descriptors: Artificial Intelligence, Classification, Student Evaluation, Tests
Peer reviewed
Maeda, Hotaka; Zhang, Bo – International Journal of Testing, 2017
The omega (ω) statistic is reputed to be one of the best indices for detecting answer copying on multiple choice tests, but its performance relies on the accurate estimation of copier ability, which is challenging because responses from the copiers may have been contaminated. We propose an algorithm that aims to identify and delete the suspected…
Descriptors: Cheating, Test Items, Mathematics, Statistics
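
For context on the method this entry extends: Wollack's (1997) ω statistic (a hedged sketch; the notation below is assumed for illustration and is not quoted from the article) standardizes the observed number of matching answers between a suspected copier c and a source s against what an IRT model predicts for an independent test-taker of the copier's ability:

\[
\omega = \frac{h_{cs} - E(h_{cs} \mid \hat{\theta}_c)}{\sigma(h_{cs} \mid \hat{\theta}_c)},
\qquad
E(h_{cs} \mid \hat{\theta}_c) = \sum_{i=1}^{n} P_i(u_{is} \mid \hat{\theta}_c)
\]

Here h_cs is the number of items on which the two answer strings match, and P_i(u_is | θ̂_c) is the model-implied probability that the copier, answering independently, would select the source's observed response to item i. Under no copying, ω is approximately standard normal. The abstract's premise follows directly: if copied responses contaminate the ability estimate θ̂_c, the expected match count is biased, which is what the proposed identify-and-delete algorithm targets.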
Peer reviewed
Finch, W. Holmes; Hernández Finch, Maria E.; French, Brian F. – International Journal of Testing, 2016
Differential item functioning (DIF) assessment is key in score validation. When DIF is present, scores may not accurately reflect the construct of interest for some groups of examinees, leading to incorrect conclusions from the scores. Given rising immigration and the increased reliance of educational policymakers on cross-national assessments…
Descriptors: Test Bias, Scores, Native Language, Language Usage
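
To make the kind of screening this entry describes concrete, here is a minimal sketch of a logistic-regression DIF check in the spirit of Swaminathan and Rogers (1990). Everything below (the simulated data, variable names, and the choice of matching score) is an assumption for illustration, not the article's procedure or materials.

import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)            # 0 = reference, 1 = focal (e.g., language group)
theta = rng.normal(0.0, 1.0, n)          # latent proficiency
total = theta * 4 + 20                   # stand-in matching score
# Simulate one studied item with uniform DIF against the focal group
p = 1.0 / (1.0 + np.exp(-(theta - 0.5 * group)))
item = rng.binomial(1, p)

def fit_llf(X):
    # Log-likelihood of a logistic model predicting the studied item
    return sm.Logit(item, sm.add_constant(X)).fit(disp=0).llf

ll_base = fit_llf(np.column_stack([total]))                        # matching score only
ll_unif = fit_llf(np.column_stack([total, group]))                 # + group: uniform DIF
ll_nonu = fit_llf(np.column_stack([total, group, total * group]))  # + interaction: nonuniform DIF

# Likelihood-ratio chi-square tests, one degree of freedom each
print("uniform DIF    p =", chi2.sf(2 * (ll_unif - ll_base), df=1))
print("nonuniform DIF p =", chi2.sf(2 * (ll_nonu - ll_unif), df=1))

A significant group term flags uniform DIF; a significant interaction flags nonuniform DIF. Flagged items then go to content review, since DIF is statistical evidence of, not proof of, bias.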
Peer reviewed
Banks, Kathleen; Jeddeeni, Ahmad; Walker, Cindy M. – International Journal of Testing, 2016
Differential bundle functioning (DBF) analyses were conducted to determine whether seventh and eighth grade second language learners (SLLs) had lower probabilities of answering bundles of math word problems correctly that had heavy language demands, when compared to non-SLLs of equal math proficiency. Math word problems on each of four test forms…
Descriptors: Middle School Students, English Language Learners, Second Language Learning, Grade 7
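
DBF applies the item-level DIF logic above to a bundle of items scored together. As a hedged sketch of one common way to quantify it (a standardization-style contrast; this is an illustrative assumption, not necessarily the procedure used in the study above):

import numpy as np

def std_p_dbf(bundle_scores, total_scores, focal):
    # Standardized difference in mean bundle score between focal
    # and reference examinees matched on total test score; negative
    # values suggest the bundle disadvantages the focal group.
    bundle_scores = np.asarray(bundle_scores, dtype=float)
    total_scores = np.asarray(total_scores)
    focal = np.asarray(focal, dtype=bool)
    diff, n_focal = 0.0, focal.sum()
    for k in np.unique(total_scores[focal]):
        at_k = total_scores == k
        f, r = focal & at_k, ~focal & at_k
        if f.any() and r.any():
            weight = f.sum() / n_focal     # focal-group weight at score k
            diff += weight * (bundle_scores[f].mean() - bundle_scores[r].mean())
    return diff

Applied to a bundle of language-heavy word problems, a sizable negative value for SLLs matched on overall math proficiency would be consistent with the language-demand hypothesis the study tests.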