NotesFAQContact Us
Collection
Advanced
Search Tips
Showing all 13 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017
In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…
Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability
Peer reviewed Peer reviewed
Direct linkDirect link
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
Peer reviewed Peer reviewed
Direct linkDirect link
Wan, Lei; Henly, George A. – Applied Measurement in Education, 2012
Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…
Descriptors: Test Items, Test Format, Computer Assisted Testing, Measurement
Peer reviewed Peer reviewed
Direct linkDirect link
Wise, Lauress L. – Applied Measurement in Education, 2010
The articles in this special issue make two important contributions to our understanding of the impact of accommodations on test score validity. First, they illustrate a variety of methods for collection and rigorous analyses of empirical data that can supplant expert judgment of the impact of accommodations. These methods range from internal…
Descriptors: Reading Achievement, Educational Assessment, Test Reliability, Learning Disabilities
Peer reviewed Peer reviewed
Direct linkDirect link
Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2010
Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…
Descriptors: Measures (Individuals), Item Response Theory, Robustness (Statistics), Item Analysis
Peer reviewed Peer reviewed
Direct linkDirect link
Johnson, Robert L.; Penny, Jim; Fisher, Steve; Kuhs, Therese – Applied Measurement in Education, 2003
When raters assign different scores to a performance task, a method for resolving rating differences is required to report a single score to the examinee. Recent studies indicate that decisions about examinees, such as pass/fail decisions, differ across resolution methods. Previous studies also investigated the interrater reliability of…
Descriptors: Test Reliability, Test Validity, Scores, Interrater Reliability
Peer reviewed Peer reviewed
Direct linkDirect link
Wise, Steven L. – Applied Measurement in Education, 2006
In low-stakes testing, the motivation levels of examinees are often a matter of concern to test givers because a lack of examinee effort represents a direct threat to the validity of the test data. This study investigated the use of response time to assess the amount of examinee effort received by individual test items. In 2 studies, it was found…
Descriptors: Computer Assisted Testing, Motivation, Test Validity, Item Response Theory
Peer reviewed Peer reviewed
Hambleton, Ronald K.; Slater, Sharon C. – Applied Measurement in Education, 1997
A brief history of developments in the assessment of the reliability of credentialing examinations is presented, and some new results are outlined that highlight the interactions among scoring, standard setting, and the reliability and validity of pass-fail decisions. Decision consistency is an important concept in evaluating credentialing…
Descriptors: Certification, Credentials, Decision Making, Interaction
Peer reviewed Peer reviewed
Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991
Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)
Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques
Peer reviewed Peer reviewed
Millman, Jason – Applied Measurement in Education, 1991
Alternatives to multiple-choice tests for teacher licensing examinations are described, and their advantages are cited. Concerns are expressed in the areas of cost and practicality, reliability, corruptibility, and validity. A suggestion for reducing costs using multiple-choice responses calibrated to constructed-response tasks is proposed. (SLD)
Descriptors: Beginning Teachers, Constructed Response, Cost Effectiveness, Educational Assessment
Peer reviewed Peer reviewed
Royer, James M.; Carlo, Maria S. – Applied Measurement in Education, 1991
Measures of linguistic competence for limited-English-proficient students are discussed. The results for 134 students in grades 3 through 6 from a study of the reliability and validity of the Sentence Verification Technique tests as measures of listening and reading comprehension performance in native languages and English are reported. (TJH)
Descriptors: Bilingual Education, Comparative Testing, Elementary Education, Elementary School Students
Peer reviewed Peer reviewed
Vispoel, Walter P.; Coffman, Don D. – Applied Measurement in Education, 1994
Computerized-adaptive (CAT) and self-adapted (SAT) music listening tests were compared for efficiency, reliability, validity, and motivational benefits with 53 junior high school students. Results demonstrate trade-offs, with greater potential motivational benefits for SAT and greater efficiency for CAT. SAT elicited more favorable responses from…
Descriptors: Adaptive Testing, Computer Assisted Testing, Efficiency, Item Response Theory
Peer reviewed Peer reviewed
Mehrens, William A.; Popham, W. James – Applied Measurement in Education, 1992
This paper discusses how to determine whether a test was developed in a legally defensible manner, reviewing general issues, specific cases bearing on different types of test use, some evaluative dimensions, and evidence of test quality. Tests constructed and used according to existing standards will generally stand legal scrutiny. (SLD)
Descriptors: College Entrance Examinations, Compliance (Legal), Constitutional Law, Court Litigation