Showing 1 to 15 of 17 results
Peer reviewed
Samah AlKhuzaey; Floriana Grasso; Terry R. Payne; Valentina Tamma – International Journal of Artificial Intelligence in Education, 2024
Designing and constructing pedagogical tests whose items (i.e., questions) equitably measure various types of skills across different student levels is a challenging task. Teachers and item writers alike need to ensure that the quality of assessment materials is consistent if student evaluations are to be objective and effective.…
Descriptors: Test Items, Test Construction, Difficulty Level, Prediction
Peer reviewed
Villarroel, Verónica; Bloxham, Susan; Bruna, Daniela; Bruna, Carola; Herrera-Seda, Constanza – Assessment & Evaluation in Higher Education, 2018
Authenticity has been identified as a key characteristic of assessment design which promotes learning. Authentic assessment aims to replicate the tasks and performance standards typically found in the world of work, and has been found to have a positive impact on student learning, autonomy, motivation, self-regulation and metacognition; abilities…
Descriptors: Performance Based Assessment, Barriers, Higher Education, Models
Peer reviewed
Yang, Xuexue – International Multilingual Research Journal, 2020
Despite the importance of assessment accommodations, little is known about their use in the context of classroom assessments. To guide teachers in supporting their emergent bilinguals during classroom assessments, ideas from large-scale assessments may be adapted for classroom use. This article, a targeted…
Descriptors: Testing Accommodations, Measurement, Bilingualism, Second Language Learning
Peer reviewed
Nichols, Bryan E. – Update: Applications of Research in Music Education, 2016
The purpose of this review of literature was to identify research findings for designing assessments in singing accuracy. The aim was to specify the test construction variables that directly affect test performance to guide future design in singing accuracy assessment for research and classroom uses. Three pitch-matching tasks--single pitch,…
Descriptors: Singing, Accuracy, Music, Music Education
Peer reviewed
Gierl, Mark J.; Bulut, Okan; Guo, Qi; Zhang, Xinxin – Review of Educational Research, 2017
Multiple-choice testing is considered one of the most effective and enduring forms of educational assessment in practice today. This study presents a comprehensive review of the literature on multiple-choice testing in education, focused specifically on the development, analysis, and use of the incorrect options, which are also…
Descriptors: Multiple Choice Tests, Difficulty Level, Accuracy, Error Patterns
Peer reviewed
Haladyna, Tom; Roid, Gale – Journal of Educational Measurement, 1981
The rationale for use of instructional sensitivity in the empirical review of test items is examined, and the results of a study that distinguishes instructional sensitivity from other item concepts are presented. Research is reviewed which indicates the existence of instructional sensitivity as a unique criterion-referenced test item concept. (RL)
Descriptors: Criterion Referenced Tests, Difficulty Level, Evaluation Criteria, Pretests Posttests
Peer reviewed
Albanese, Mark A. – Educational Measurement: Issues and Practice, 1993
A comprehensive review is given of evidence bearing on the recommendation to avoid complex multiple-choice (CMC) items. Avoiding Type K items (four primary responses and five secondary choices) seems warranted, but the evidence against CMC in general is less clear. (SLD)
Descriptors: Cues, Difficulty Level, Multiple Choice Tests, Responses
Woldbeck, Tanya – 1998
This paper summarizes some of the basic concepts in test equating. Various types of equating methods, as well as data collection designs, are outlined, with the aim of providing insight into preferred methods and techniques. Test equating describes a group of methods that enable test constructors and users to compare scores from two different forms…
Descriptors: Comparative Analysis, Data Collection, Difficulty Level, Equated Scores
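To make the equating idea concrete, here is a minimal sketch of linear equating, one standard method covered in overviews like this one. The score means and standard deviations below are hypothetical, assumed only for illustration:

    # Linear equating sketch: map a Form X score onto the Form Y scale
    # by matching the two forms' means and standard deviations.
    # (Hypothetical numbers; not from the paper.)
    def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
        """Return the Form Y equivalent of a Form X score x."""
        return sd_y / sd_x * (x - mean_x) + mean_y

    # Example: Form X has mean 50 and SD 10; Form Y has mean 53 and SD 12.
    # A Form X score of 60 then corresponds to a Form Y score of 65.
    print(linear_equate(60, 50, 10, 53, 12))  # 65.0

Under this mapping, a score one standard deviation above the Form X mean is placed one standard deviation above the Form Y mean.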
Peer reviewed
Knowles, Susan L.; Welch, Cynthia A. – Educational and Psychological Measurement, 1992
A meta-analysis of the difficulty and discrimination of the "none-of-the-above" (NOTA) test option was conducted with 12 articles (20 effect sizes) for difficulty and 7 studies (11 effect sizes) for discrimination. Findings indicate that using the NOTA option does not result in items of lesser quality. (SLD)
Descriptors: Difficulty Level, Effect Size, Meta Analysis, Multiple Choice Tests
Peer reviewed
Alderson, J. Charles – Reading in a Foreign Language, 1990
Focuses on the performance levels of the Test of English for Educational Purposes (TEEP) and English Language Testing Service (ELTS) tests. It is concluded that more attention should be paid to the process underlying test performance. (15 references) (GLR)
Descriptors: Classification, Difficulty Level, English (Second Language), Language Tests
Peer reviewed
Frisbie, David A. – Educational Measurement: Issues and Practice, 1992
Literature related to the multiple true-false (MTF) item format is reviewed. Each answer cluster of an MTF item may contain several true statements, and the correctness of each is judged independently. MTF tests appear efficient and reliable, although they are somewhat harder than multiple-choice items for examinees. (SLD)
Descriptors: Achievement Tests, Difficulty Level, Literature Reviews, Multiple Choice Tests
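As an illustration of the format Frisbie describes (a hypothetical cluster, not an item from the review), each statement under a single stem is keyed true or false and scored independently:

    # Hypothetical multiple true-false (MTF) cluster: one stem, several
    # statements, each judged true/false and scored independently.
    stem = "Regarding classical test theory:"
    statements = [
        ("Reliability coefficients cannot exceed 1.0", True),
        ("Item difficulty p is the proportion answering correctly", True),
        ("A harder item always discriminates better", False),
    ]
    responses = [True, True, True]  # one examinee's answers

    # Each statement contributes one independent score point.
    score = sum(resp == key for (_, key), resp in zip(statements, responses))
    print(score, "of", len(statements), "correct")  # 2 of 3 correct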
Wainer, Howard; Thissen, David – 1994
When an examination consists in whole or part of constructed response test items, it is common practice to allow the examinee to choose a subset of the constructed response questions from a larger pool. It is sometimes argued that, if choice were not allowed, the limitations on domain coverage forced by the small number of items might unfairly…
Descriptors: Constructed Response, Difficulty Level, Educational Testing, Equated Scores
Rentz, R. Robert; Rentz, Charlotte C. – 1978
Issues of concern to test developers interested in applying the Rasch model are discussed. The current state of the art, recommendations for use of the model, further needs, and controversies are described for the three stages of test construction: (1) definition of the content of the test and item writing; (2) item analysis; and (3) test…
Descriptors: Ability, Achievement Tests, Difficulty Level, Goodness of Fit
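For reference, the Rasch model discussed here gives the probability that person $j$ answers item $i$ correctly as a function of a single ability parameter $\theta_j$ and a single item difficulty parameter $b_i$:

$$P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{e^{\theta_j - b_i}}{1 + e^{\theta_j - b_i}}$$

Item analysis under the model amounts to estimating the $b_i$ and checking that observed response patterns fit this form.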
Colton, Dean A. – 1993
Tables of specifications are used to guide test developers in sampling items and maintaining consistency from form to form. This paper is a generalizability study of the American College Testing Program (ACT) Achievement Program Mathematics Test (AAP), with the content areas of the table of specifications representing multiple dependent variables.…
Descriptors: Achievement Tests, Difficulty Level, Error of Measurement, Generalizability Theory
Haladyna, Tom; Roid, Gale – 1980
An empirical review of test items is described as an essential step in criterion-referenced test development. The concept of test items' instructional sensitivity is introduced, and research is briefly reviewed which describes four theoretical contexts in which instructional sensitivity indexes have been observed: criterion-referenced; classical…
Descriptors: Achievement Tests, Bayesian Statistics, Course Objectives, Criterion Referenced Tests
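One widely cited index of the kind reviewed here is the pretest-posttest difference index, which expresses an item's instructional sensitivity as the change in its difficulty (proportion correct) from before to after instruction:

$$D_i = p_{i,\text{post}} - p_{i,\text{pre}}$$

This is stated for orientation only; the paper itself surveys several such indexes across the theoretical contexts it identifies.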