ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	0
Since 2006 (last 20 years)	4

Descriptor

Test Construction	21
Reliability	11
Test Reliability	8
Scores	7
Multiple Choice Tests	5
Test Items	5
Test Validity	5
Validity	5
Item Response Theory	4
Test Use	4
Constructed Response	3
Educational Assessment	3
Interrater Reliability	3
Mathematics Tests	3
Statistical Distributions	3
Computer Assisted Testing	2
Correlation	2
Decision Making	2
Error of Measurement	2
Generalization	2
Guessing (Tests)	2
Item Analysis	2
Junior High School Students	2
Junior High Schools	2
Mathematical Models	2

Source

Applied Measurement in…

Publication Type

Journal Articles	21
Reports - Evaluative	13
Reports - Research	6
Reports - Descriptive	2
Speeches/Meeting Papers	2

Education Level

Elementary Education	1
Elementary Secondary Education	1
Grade 10	1
Grade 5	1
Grade 8	1
High Schools	1
Intermediate Grades	1
Junior High Schools	1
Middle Schools	1
Secondary Education	1

Assessments and Surveys

Texas Assessment of Academic…

Showing 1 to 15 of 21 results

An Application of Generalizability Theory to Evaluate the Technical Quality of an Alternate Assessment

Peer reviewed

Taylor, Melinda Ann; Pastor, Dena A. – Applied Measurement in Education, 2013

Although federal regulations require testing students with severe cognitive disabilities, there is little guidance regarding how technical quality should be established. Documenting the reliability of scores for alternate assessments is known to be challenging. Typical measures of reliability do little to model multiple sources…

Descriptors: Generalizability Theory, Alternative Assessment, Test Reliability, Scores

Providing Subscale Scores for Diagnostic Information: A Case Study when the Test Is Essentially Unidimensional

Peer reviewed

Stone, Clement A.; Ye, Feifei; Zhu, Xiaowen; Lane, Suzanne – Applied Measurement in Education, 2010

Although reliability of subscale scores may be suspect, subscale scores are the most common type of diagnostic information included in student score reports. This research compared methods for augmenting the reliability of subscale scores for an 8th-grade mathematics assessment. Yen's Objective Performance Index, Wainer et al.'s augmented scores,…

Descriptors: Item Response Theory, Case Studies, Reliability, Scores

Investigating Design Features of Descriptive Graphic Rating Scales.

Peer reviewed

Myford, Carol M. – Applied Measurement in Education, 2002

Studied the use of descriptive graphic rating scales by 11 raters to evaluate students' work, exploring different design features. Used a Rasch-model-based rating scale analysis to determine that all the continuous scales could be considered to have at least five points, and that defined midpoints did not result in higher student separation…

Descriptors: Evaluators, Rating Scales, Reliability, Test Construction

Recommendations for Preparing and Scoring Constructed-Response Items: What the Experts Say

Peer reviewed

Hogan, Thomas P.; Murphy, Gavin – Applied Measurement in Education, 2007

We determined the recommendations for preparing and scoring constructed-response (CR) test items in 25 sources (textbooks and chapters) on educational and psychological measurement. The project was similar to Haladyna's (2004) analysis for multiple-choice items. We identified 12 recommendations for preparing CR items given by multiple sources,…

Descriptors: Test Items, Scoring, Test Construction, Educational Indicators

Quality Control in the Development and Use of Performance Assessments.

Peer reviewed

Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991

Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)

Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques

Can Validity Rise When Reliability Declines?

Peer reviewed

Feldt, Leonard S. – Applied Measurement in Education, 1997

It has often been asserted that the reliability of a measure places an upper limit on its validity. This article demonstrates in theory that validity can rise when reliability declines, even when validity evidence is a correlation with an acceptable criterion. Whether empirical examples can actually be found is an open question. (SLD)

Descriptors: Correlation, Criteria, Reliability, Test Construction

The Effects of Heterogeneous Item Distributions on Reliability.

Peer reviewed

Enders, Craig K.; Bandalos, Deborah L. – Applied Measurement in Education, 1999

Examined the degree to which coefficient alpha is affected by including items with different distribution shapes within a unidimensional scale. Computer simulation results indicate that reliability does not increase dramatically as a result of using differentially shaped items within a scale. Discusses implications for test construction. (SLD)

Descriptors: Computer Simulation, Reliability, Scaling, Statistical Distributions

Higher Validity in the Face of Lower Reliability: Another Look.

Peer reviewed

Frary, Robert B. – Applied Measurement in Education, 2000

Characterizes the circumstances under which validity changes may occur as a result of the deletion of a predictor test segment. Equations show that, for a positive outcome, one should seek a relatively large correlation between the scores from the deleted segment and the remaining items, with a relatively low correlation between scores from the…

Descriptors: Equations (Mathematics), Prediction, Reliability, Scores

Reliability Estimation When a Test Is Split into Two Parts of Unknown Effective Length.

Peer reviewed

Feldt, Leonard S. – Applied Measurement in Education, 2002

Considers the situation in which content or administrative considerations limit the way in which a test can be partitioned to estimate the internal consistency reliability of the total test score. Demonstrates that a single-valued estimate of the total score reliability is possible only if an assumption is made about the comparative size of the…

Descriptors: Error of Measurement, Reliability, Scores, Test Construction

Defending a State Graduation Test: "GI Forum v. Texas Education Agency." Measurement Perspectives from an External Evaluator.

Peer reviewed

Mehrens, William A. – Applied Measurement in Education, 2000

Presents conclusions of an independent measurement expert that the Texas Assessment of Academic Skills (TAAS) was constructed according to acceptable professional standards and tests curricular material considered by the Texas Board of Education important for graduates to have mastered. Also supports the validity and reliability of the TAAS and…

Descriptors: Curriculum, Psychometrics, Reliability, Standards

The Dependability and Interchangeability of Assessment Methods in Science.

Peer reviewed

Webb, Noreen M.; Schlackman, Jonah; Sugrue, Brenda – Applied Measurement in Education, 2000

Studied the importance of occasion as a source of error variance in estimates of the dependability (generalizability) of science assessment scores and the interchangeability of science test formats. Junior high school students (n=662) took hands-on and paper-and-pencil tests twice. Results show that recognizing occasion as a facet of error alters…

Descriptors: Generalization, Junior High School Students, Junior High Schools, Reliability

The Effects of Test Length and Sample Size on the Reliability and Equating of Tests Composed of Constructed-Response Items.

Peer reviewed

Fitzpatrick, Anne R.; Yen, Wendy M. – Applied Measurement in Education, 2001

Examined the effects of test length and sample size on the alternate-forms reliability and equating of simulated mathematics tests composed of constructed-response items scaled using the two-parameter partial credit model. Results suggest that, to obtain acceptable reliabilities and accurate equated scores, tests should have at least 8 6-point…

Descriptors: Constructed Response, Equated Scores, Mathematics Tests, Reliability

A Taxonomy of Multiple-Choice Item-Writing Rules.

Peer reviewed

Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989

A taxonomy of 43 rules for writing multiple-choice test items is presented, based on a consensus of 46 textbooks. These guidelines are presented as complete and authoritative, with solid consensus apparent for 33 of the rules. Four rules lack consensus, and 5 rules were cited fewer than 10 times. (SLD)

Descriptors: Classification, Interrater Reliability, Multiple Choice Tests, Objective Tests

Developing a Valid and Reliable Portfolio Assessment in the Primary Grades: Building on Practical Experience.

Peer reviewed

Shapley, Kelly S.; Bush, M. Joan – Applied Measurement in Education, 1999

Examined the validity and reliability of the 1995-96 reading/language arts portfolio assessment developed in the Dallas (Texas) public schools for prekindergarten through second grade. Ratings by 42 teachers show that portfolio contents do not provide a valid sample of student work and the assessment reliability is low. (SLD)

Descriptors: Language Arts, Portfolio Assessment, Portfolios (Background Materials), Primary Education

The Relationship between the Distribution of Item Difficulties and Test Reliability.

Peer reviewed

Feldt, Leonard S. – Applied Measurement in Education, 1993

The recommendation that the reliability of multiple-choice tests will be enhanced if the distribution of item difficulties is concentrated at approximately 0.50 is reinforced and extended in this article by viewing the 0/1 item scoring as a dichotomization of an underlying normally distributed ability score. (SLD)

Descriptors: Ability, Difficulty Level, Guessing (Tests), Mathematical Models

Author

Feldt, Leonard S.	3
Mehrens, William A.	2
Bandalos, Deborah L.	1
Bush, M. Joan	1
Coffman, Don D.	1
Downing, Steven M.	1
Dunbar, Stephen B.	1
Enders, Craig K.	1
Fitzpatrick, Anne R.	1
Frary, Robert B.	1
Haladyna, Thomas M.	1
Hogan, Thomas P.	1
Lane, Suzanne	1
Millman, Jason	1
Murphy, Gavin	1
Myford, Carol M.	1
Pastor, Dena A.	1
Popham, W. James	1
Schiel, Jeffrey L.	1
Schlackman, Jonah	1
Shapley, Kelly S.	1
Shaw, Dale G.	1
Stone, Clement A.	1
Sugrue, Brenda	1