Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 0 |
Since 2006 (last 20 years) | 4 |
Descriptor
Test Construction | 21 |
Reliability | 11 |
Test Reliability | 8 |
Scores | 7 |
Multiple Choice Tests | 5 |
Test Items | 5 |
Test Validity | 5 |
Validity | 5 |
Item Response Theory | 4 |
Test Use | 4 |
Constructed Response | 3 |
More ▼ |
Source
Applied Measurement in… | 21 |
Author
Publication Type
Journal Articles | 21 |
Reports - Evaluative | 13 |
Reports - Research | 6 |
Reports - Descriptive | 2 |
Speeches/Meeting Papers | 2 |
Education Level
Elementary Education | 1 |
Elementary Secondary Education | 1 |
Grade 10 | 1 |
Grade 5 | 1 |
Grade 8 | 1 |
High Schools | 1 |
Intermediate Grades | 1 |
Junior High Schools | 1 |
Middle Schools | 1 |
Secondary Education | 1 |
Audience
Location
Laws, Policies, & Programs
Assessments and Surveys
Texas Assessment of Academic… | 1 |
What Works Clearinghouse Rating
Taylor, Melinda Ann; Pastor, Dena A. – Applied Measurement in Education, 2013
Although federal regulations require testing students with severe cognitive disabilities, there is little guidance regarding how technical quality should be established. It is known that challenges exist with documentation of the reliability of scores for alternate assessments. Typical measures of reliability do little in modeling multiple sources…
Descriptors: Generalizability Theory, Alternative Assessment, Test Reliability, Scores
Stone, Clement A.; Ye, Feifei; Zhu, Xiaowen; Lane, Suzanne – Applied Measurement in Education, 2010
Although reliability of subscale scores may be suspect, subscale scores are the most common type of diagnostic information included in student score reports. This research compared methods for augmenting the reliability of subscale scores for an 8th-grade mathematics assessment. Yen's Objective Performance Index, Wainer et al.'s augmented scores,…
Descriptors: Item Response Theory, Case Studies, Reliability, Scores

Myford, Carol M. – Applied Measurement in Education, 2002
Studied the use of descriptive graphic rating scales by 11 raters to evaluate students' work, exploring different design features. Used a Rasch-model based rating scale analysis to determine that all the continuous scales could be considered to have at least five points, and that defined midpoints did not result in higher student separation…
Descriptors: Evaluators, Rating Scales, Reliability, Test Construction
Hogan, Thomas P.; Murphy, Gavin – Applied Measurement in Education, 2007
We determined the recommendations for preparing and scoring constructed-response (CR) test items in 25 sources (textbooks and chapters) on educational and psychological measurement. The project was similar to Haladyna's (2004) analysis for multiple-choice items. We identified 12 recommendations for preparing CR items given by multiple sources,…
Descriptors: Test Items, Scoring, Test Construction, Educational Indicators

Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991
Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)
Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques

Feldt, Leonard S. – Applied Measurement in Education, 1997
It has often been asserted that the reliability of a measure places an upper limit on its validity. This article demonstrates in theory that validity can rise when reliability declines, even when validity evidence is a correlation with an acceptable criterion. Whether empirical examples can actually be found is an open question. (SLD)
Descriptors: Correlation, Criteria, Reliability, Test Construction

Enders, Craig K.; Bandalos, Deborah L. – Applied Measurement in Education, 1999
Examined the degree to which coefficient alpha is affected by including items with different distribution shapes within a unidimensional scale. Computer simulation results indicate that reliability does not increase dramatically as a result of using differentially shaped items within a scale. Discusses implications for test construction. (SLD)
Descriptors: Computer Simulation, Reliability, Scaling, Statistical Distributions

Frary, Robert B. – Applied Measurement in Education, 2000
Characterizes the circumstances under which validity changes may occur as a result of the deletion of a predictor test segment. Equations show that, for a positive outcome, one should seek a relatively large correlation between the scores from the deleted segment and the remaining items, with a relatively low correlation between scores from the…
Descriptors: Equations (Mathematics), Prediction, Reliability, Scores

Feldt, Leonard S. – Applied Measurement in Education, 2002
Considers the situation in which content or administrative considerations limit the way in which a test can be partitioned to estimate the internal consistency reliability of the total test score. Demonstrates that a single-valued estimate of the total score reliability is possible only if an assumption is made about the comparative size of the…
Descriptors: Error of Measurement, Reliability, Scores, Test Construction

Mehrens, William A. – Applied Measurement in Education, 2000
Presents conclusions of an independent measurement expert that the Texas Assessment of Academic Skills (TAAS) was constructed according to acceptable professional standards and tests curricular material considered by the Texas Board of Education important for graduates to have mastered. Also supports the validity and reliability of the TAAS and…
Descriptors: Curriculum, Psychometrics, Reliability, Standards

Webb, Noreen M.; Schlackman, Jonah; Sugrue, Brenda – Applied Measurement in Education, 2000
Studied the importance of occasion as a source of error variance in estimates of the dependability (generalizability) of science assessment scores and the Interchangeability of science test formats. Junior high school students (n=662) took hands-on and paper-and-pencil tests twice. Results show that recognizing occasion as a facet of error alters…
Descriptors: Generalization, Junior High School Students, Junior High Schools, Reliability

Fitzpatrick, Anne R.; Yen, Wendy M. – Applied Measurement in Education, 2001
Examined the effects of test length and sample size on the alternate forms reliability and equating of simulated mathematics tests composed of constructed response items scaled using the two-parameter partial credit model. Results suggest that, to obtain acceptable reliabilities and accurate equated scores, tests should have at least 8 6-point…
Descriptors: Constructed Response, Equated Scores, Mathematics Tests, Reliability

Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989
A taxonomy of 43 rules for writing multiple-choice test items is presented, based on a consensus of 46 textbooks. These guidelines are presented as complete and authoritative, with solid consensus apparent for 33 of the rules. Four rules lack consensus, and 5 rules were cited fewer than 10 times. (SLD)
Descriptors: Classification, Interrater Reliability, Multiple Choice Tests, Objective Tests

Shapley, Kelly S.; Bush, M. Joan – Applied Measurement in Education, 1999
Examined the validity and reliability of the 1995-96 reading/language arts portfolio assessment developed in the Dallas (Texas) public schools for prekindergarten through second grade. Ratings by 42 teachers show that portfolio contents do not provide a valid sample of student work and the assessment reliability is low. (SLD)
Descriptors: Language Arts, Portfolio Assessment, Portfolios (Background Materials), Primary Education

Feldt, Leonard S. – Applied Measurement in Education, 1993
The recommendation that the reliability of multiple-choice tests will be enhanced if the distribution of item difficulties is concentrated at approximately 0.50 is reinforced and extended in this article by viewing the 0/1 item scoring as a dichotomization of an underlying normally distributed ability score. (SLD)
Descriptors: Ability, Difficulty Level, Guessing (Tests), Mathematical Models
Previous Page | Next Page ยป
Pages: 1 | 2