ERIC - Search Results

Showing all 13 results

Of Small Beauties and Large Beasts: The Quality of Distractors on Multiple-Choice Tests Is More Important than Their Quantity

Peer reviewed

Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017

In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…

Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability

Impact of Design Effects in Large-Scale District and State Assessments

Peer reviewed

Phillips, Gary W. – Applied Measurement in Education, 2015

This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…

Descriptors: State Programs, Sampling, Research Design, Error of Measurement

Measurement Properties of Two Innovative Item Formats in a Computer-Based Test

Peer reviewed

Wan, Lei; Henly, George A. – Applied Measurement in Education, 2012

Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…

Descriptors: Test Items, Test Format, Computer Assisted Testing, Measurement

Accessible Reading Assessments for Students with Disabilities: Summary and Conclusions

Peer reviewed

Wise, Lauress L. – Applied Measurement in Education, 2010

The articles in this special issue make two important contributions to our understanding of the impact of accommodations on test score validity. First, they illustrate a variety of methods for collection and rigorous analyses of empirical data that can supplant expert judgment of the impact of accommodations. These methods range from internal…

Descriptors: Reading Achievement, Educational Assessment, Test Reliability, Learning Disabilities

Stability of Rasch Scales over Time

Peer reviewed

Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2010

Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…

Descriptors: Measures (Individuals), Item Response Theory, Robustness (Statistics), Item Analysis

Score Resolution: An Investigation of the Reliability and Validity of Resolved Scores

Peer reviewed

Johnson, Robert L.; Penny, Jim; Fisher, Steve; Kuhs, Therese – Applied Measurement in Education, 2003

When raters assign different scores to a performance task, a method for resolving rating differences is required to report a single score to the examinee. Recent studies indicate that decisions about examinees, such as pass/fail decisions, differ across resolution methods. Previous studies also investigated the interrater reliability of…

Descriptors: Test Reliability, Test Validity, Scores, Interrater Reliability

An Investigation of the Differential Effort Received by Items on a Low-Stakes Computer-Based Test

Peer reviewed

Wise, Steven L. – Applied Measurement in Education, 2006

In low-stakes testing, the motivation levels of examinees are often a matter of concern to test givers because a lack of examinee effort represents a direct threat to the validity of the test data. This study investigated the use of response time to assess the amount of examinee effort received by individual test items. In 2 studies, it was found…

Descriptors: Computer Assisted Testing, Motivation, Test Validity, Item Response Theory

Reliability of Credentialing Examinations and the Impact of Scoring Models and Standard-Setting Policies.

Peer reviewed

Hambleton, Ronald K.; Slater, Sharon C. – Applied Measurement in Education, 1997

A brief history of developments in the assessment of the reliability of credentialing examinations is presented, and some new results are outlined that highlight the interactions among scoring, standard setting, and the reliability and validity of pass-fail decisions. Decision consistency is an important concept in evaluating credentialing…

Descriptors: Certification, Credentials, Decision Making, Interaction

Quality Control in the Development and Use of Performance Assessments.

Peer reviewed

Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991

Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)

Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques

Teacher Licensing and the New Assessment Methodologies.

Peer reviewed

Millman, Jason – Applied Measurement in Education, 1991

Alternatives to multiple-choice tests for teacher licensing examinations are described, and their advantages are cited. Concerns are expressed in the areas of cost and practicality, reliability, corruptibility, and validity. A suggestion for reducing costs using multiple-choice responses calibrated to constructed-response tasks is proposed. (SLD)

Descriptors: Beginning Teachers, Constructed Response, Cost Effectiveness, Educational Assessment

Assessing the Language Acquisition Progress of Limited English Proficient Students: Problems and a New Alternative.

Peer reviewed

Royer, James M.; Carlo, Maria S. – Applied Measurement in Education, 1991

Measures of linguistic competence for limited-English-proficient students are discussed. The results for 134 students in grades 3 through 6 from a study of the reliability and validity of the Sentence Verification Technique tests as measures of listening and reading comprehension performance in native languages and English are reported. (TJH)

Descriptors: Bilingual Education, Comparative Testing, Elementary Education, Elementary School Students

Computerized-Adaptive and Self-Adapted Music-Listening Tests: Psychometric Features and Motivational Benefits.

Peer reviewed

Vispoel, Walter P.; Coffman, Don D. – Applied Measurement in Education, 1994

Computerized-adaptive (CAT) and self-adapted (SAT) music listening tests were compared for efficiency, reliability, validity, and motivational benefits with 53 junior high school students. Results demonstrate trade-offs, with greater potential motivational benefits for SAT and greater efficiency for CAT. SAT elicited more favorable responses from…

Descriptors: Adaptive Testing, Computer Assisted Testing, Efficiency, Item Response Theory

How to Evaluate the Legal Defensibility of High-Stakes Tests.

Peer reviewed

Mehrens, William A.; Popham, W. James – Applied Measurement in Education, 1992

This paper discusses how to determine whether a test was developed in a legally defensible manner, reviewing general issues, specific cases bearing on different types of test use, some evaluative dimensions, and evidence of test quality. Tests constructed and used according to existing standards will generally stand legal scrutiny. (SLD)

Descriptors: College Entrance Examinations, Compliance (Legal), Constitutional Law, Court Litigation
