Showing all 12 results
Peer reviewed
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of…
Descriptors: Measures (Individuals), Scoring, Equated Scores, Test Bias
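For context on the equating designs this abstract compares: under a nonequivalent-groups-with-anchor-test (NEAT) design, one common approach is chained linear equating through the anchor. A minimal sketch in Python, illustrative only; the function names and the choice of linear (rather than equipercentile) equating are assumptions, not the authors' procedure:

```python
import numpy as np

def linear_link(x, mu_from, sd_from, mu_to, sd_to):
    """Linear linking: match the means and SDs of two score scales."""
    return mu_to + (sd_to / sd_from) * (x - mu_from)

def chained_linear_equate(x, form_x, anchor_1, form_y, anchor_2):
    """Chained linear equating for a NEAT design (illustrative sketch).

    form_x, anchor_1: group 1's scores on new form X and the anchor V.
    form_y, anchor_2: group 2's scores on old form Y and the anchor V.
    """
    # Step 1: link form X to the anchor scale using group 1.
    v = linear_link(x, np.mean(form_x), np.std(form_x, ddof=1),
                       np.mean(anchor_1), np.std(anchor_1, ddof=1))
    # Step 2: link the anchor scale to form Y using group 2.
    return linear_link(v, np.mean(anchor_2), np.std(anchor_2, ddof=1),
                          np.mean(form_y), np.std(form_y, ddof=1))
```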
Xu, Zeyu; Nichols, Austin – National Center for Analysis of Longitudinal Data in Education Research, 2010
The gold standard in making causal inference on program effects is a randomized trial. Most randomization designs in education randomize classrooms or schools rather than individual students. Such "clustered randomization" designs have one principal drawback: They tend to have limited statistical power or precision. This study aims to…
Descriptors: Test Format, Reading Tests, Norm Referenced Tests, Research Design
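For context on the power limitation noted above: the precision loss from randomizing clusters rather than individuals is commonly summarized by the design effect, DEFF = 1 + (m - 1) x ICC, where m is cluster size and ICC the intraclass correlation. A minimal sketch with hypothetical numbers, not the study's data or code:

```python
import math
from scipy.stats import norm

def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters instead of students."""
    return 1 + (cluster_size - 1) * icc

def mde(n_clusters, cluster_size, icc, sd=1.0, alpha=0.05, power=0.8):
    """Minimum detectable effect for a two-arm cluster-randomized trial
    (equal allocation, normal approximation)."""
    n_per_arm = (n_clusters / 2) * cluster_size
    se = sd * math.sqrt((2 / n_per_arm) * design_effect(cluster_size, icc))
    return (norm.ppf(1 - alpha / 2) + norm.ppf(power)) * se

# Example: 40 schools of 25 students each, ICC = 0.15 (hypothetical values)
# gives a minimum detectable effect of about 0.38 SD.
print(round(mde(40, 25, 0.15), 3))
```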
Peer reviewed
Craig, Pippa; Gordon, Jill; Clarke, Rufus; Oldmeadow, Wendy – Assessment & Evaluation in Higher Education, 2009
This study aimed to provide evidence to guide decisions on the type and timing of assessments in a graduate medical programme, by identifying whether students from particular degree backgrounds face greater difficulty in satisfying the current assessment requirements. We examined the performance rank of students in three types of assessments and…
Descriptors: Student Evaluation, Medical Education, Student Characteristics, Correlation
Zheng, Ying; Cheng, Liying; Klinger, Don A. – TESL Canada Journal, 2007
Large-scale testing in English affects second-language students not only strongly but also differently from first-language learners. The research literature reports that confounding factors in such testing, such as varying test formats, may differentially affect the performance of students from diverse backgrounds. An investigation of…
Descriptors: Reading Comprehension, Reading Tests, Test Format, Educational Testing
Peer reviewed
Saunders, Phillip – Journal of Economic Education, 1991
Discusses the content and cognitive specification of the third edition of the Test of Understanding in College Economics. Presents examples of the construction and sampling criteria employed in the latest and previous versions of the test. Explains that the test emphasizes recognition and understanding of basic terms, concepts, and principles with…
Descriptors: Economics Education, Educational Testing, Higher Education, Student Evaluation
Peer reviewed
Deville, Craig; O'Neill, Thomas; Wright, Benjamin D.; Woodcock, Richard W.; Munoz-Sandoval, Ana; Gershon, Richard C.; Bergstrom, Betty – Popular Measurement, 1998
Articles in this special section consider (1) flow in test taking (Craig Deville); (2) testwiseness (Thomas O'Neill); (3) test length (Benjamin Wright); (4) cross-language test equating (Richard W. Woodcock and Ana Munoz-Sandoval); (5) computer-assisted testing and testwiseness (Richard Gershon and Betty Bergstrom); and (6) Web-enhanced testing…
Descriptors: Computer Assisted Testing, Educational Testing, Equated Scores, Measurement Techniques
Peer reviewed
Wiliam, Dylan – Review of Research in Education, 2010
The idea that validity should be considered a property of inferences, rather than of assessments, has developed slowly over the past century. In early writings about the validity of educational assessments, validity was defined as a property of an assessment. The most common definition was that an assessment was valid to the extent that it…
Descriptors: Educational Assessment, Validity, Inferences, Construct Validity
Wainer, Howard; Thissen, David – 1994
When an examination consists, in whole or in part, of constructed-response test items, it is common practice to allow the examinee to choose a subset of the constructed-response questions from a larger pool. It is sometimes argued that, if choice were not allowed, the limitations on domain coverage forced by the small number of items might unfairly…
Descriptors: Constructed Response, Difficulty Level, Educational Testing, Equated Scores
Terwilliger, James S. – 1991
This paper clarifies important distinctions between item writing and item scoring and considers the implications of these distinctions for test-construction guidelines in teacher training. The terminology used to describe and classify paper-and-pencil test questions frequently confuses two distinct features of questions:…
Descriptors: Classroom Techniques, Educational Testing, Higher Education, Measurement Techniques
Gulliksen, Harold – 1985
This article presents the perspective that the quality of small-scale, teacher-made classroom tests has not improved and may have declined in recent years. This decline may stem from teachers coming to believe that the kinds of objective items used in national standardized tests are the only item types appropriate for classroom use.…
Descriptors: Adults, Classroom Techniques, Educational Testing, Educational Trends
Peer reviewed
Curren, Randall R. – Theory and Research in Education, 2004
This article addresses the capacity of high stakes tests to measure the most significant kinds of learning. It begins by examining a set of philosophical arguments pertaining to construct validity and alleged conceptual obstacles to attributing specific knowledge and skills to learners. The arguments invoke philosophical doctrines of holism and…
Descriptors: Test Items, Educational Testing, Construct Validity, High Stakes Tests
Peer reviewed
Ketterlin-Geller, Leanne R. – Journal of Technology, Learning, and Assessment, 2005
Universal design for assessment (UDA) is intended to increase participation of students with disabilities and English-language learners in general education assessments by addressing student needs through customized testing platforms. Computer-based testing provides an optimal format for creating individually tailored tests. However, although a…
Descriptors: Student Needs, Disabilities, Grade 3, Second Language Learning