Showing all 15 results
Peer reviewed
Thompson, W. Jake; Clark, Amy K.; Nash, Brooke – Applied Measurement in Education, 2019
As the use of diagnostic assessment systems transitions from research applications to large-scale assessments for accountability purposes, reliability methods that provide evidence at each level of reporting are needed. The purpose of this paper is to summarize one simulation-based method for estimating and reporting reliability for an…
Descriptors: Test Reliability, Diagnostic Tests, Classification, Computation
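The abstract is cut off, but the core idea of simulation-based reliability evidence for classification-level reporting can be sketched: simulate examinees' true mastery states, generate two independent classifications under an assumed accuracy rate, and report how often the decisions agree. A minimal illustration (the base rate and accuracy values are assumptions for the example, not figures from the article):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_classification_consistency(n_examinees=10_000,
                                        base_rate=0.6,
                                        accuracy=0.85):
    """Estimate classification consistency by simulation.

    Illustrative only: true mastery states are Bernoulli(base_rate),
    and each of two parallel administrations recovers the true state
    with probability `accuracy`, independently.
    """
    true_state = rng.random(n_examinees) < base_rate
    obs1 = np.where(rng.random(n_examinees) < accuracy, true_state, ~true_state)
    obs2 = np.where(rng.random(n_examinees) < accuracy, true_state, ~true_state)
    return np.mean(obs1 == obs2)  # proportion classified the same twice

print(f"simulated consistency: {simulate_classification_consistency():.3f}")
```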
Peer reviewed
Wan, Lei; Henly, George A. – Applied Measurement in Education, 2012
Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…
Descriptors: Test Items, Test Format, Computer Assisted Testing, Measurement
Peer reviewed
Nichols, Paul; Kuehl, Barbara Jean – Applied Measurement in Education, 1999
An approach is presented that can predict internal consistency of cognitively complex assessments on two dimensions, those of adding tasks with similar or different solution strategies and adding test takers with different solution strategies. Data from the 1992 National Assessment of Educational Progress mathematics assessment are used to…
Descriptors: Cognitive Tests, Mathematics Tests, Prediction, Test Reliability
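The authors' prediction method rests on a cognitive analysis of solution strategies, which the truncated abstract only hints at. As a baseline for the first dimension, adding tasks, the classical Spearman-Brown prophecy formula predicts internal consistency after lengthening a test with parallel tasks; a sketch of that generic formula (not the article's strategy-sensitive approach):

```python
def spearman_brown(rho: float, k: float) -> float:
    """Predicted reliability after changing test length by factor k,
    assuming the added tasks are parallel to the existing ones."""
    return k * rho / (1 + (k - 1) * rho)

# e.g., doubling a test whose current alpha is .70
print(f"{spearman_brown(0.70, 2):.3f}")  # ~0.824
```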
Peer reviewed
Gierl, Mark J.; Gotzmann, Andrea; Boughton, Keith A. – Applied Measurement in Education, 2004
Differential item functioning (DIF) analyses are used to identify items that operate differently between two groups, after controlling for ability. The Simultaneous Item Bias Test (SIBTEST) is a popular DIF detection method that matches examinees on a true score estimate of ability. However, in some testing situations, such as test translation and…
Descriptors: True Scores, Simulation, Test Bias, Student Evaluation
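As a rough illustration of the matching idea behind SIBTEST: condition on a matching-subtest score, compare the two groups' mean scores on the studied item at each score level, and pool the differences. The sketch below omits SIBTEST's regression correction, which adjusts the matching scores toward true-score estimates, so it is a simplified stand-in for the published procedure:

```python
import numpy as np

def simple_beta_uni(item, match, group):
    """Crude SIBTEST-style DIF index.

    item  : scores on the studied item
    match : raw scores on the matching subtest
    group : 0 = reference, 1 = focal

    Pools the reference-minus-focal difference in studied-item means
    across matching-score levels.  Omits SIBTEST's regression
    correction, so this is an illustration, not the published method.
    """
    item, match, group = (np.asarray(a) for a in (item, match, group))
    num = den = 0.0
    for s in np.unique(match):
        ref = item[(match == s) & (group == 0)]
        foc = item[(match == s) & (group == 1)]
        if len(ref) and len(foc):
            w = len(ref) + len(foc)            # weight by examinees at level s
            num += w * (ref.mean() - foc.mean())
            den += w
    return num / den if den else float("nan")
```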
Peer reviewed
Qualls, Audrey L. – Applied Measurement in Education, 1995
Classically parallel, tau-equivalently parallel, and congenerically parallel models representing various degrees of part-test parallelism and their appropriateness for tests composed of multiple item formats are discussed. An appropriate reliability estimate for a test with multiple item formats is presented and illustrated. (SLD)
Descriptors: Achievement Tests, Estimation (Mathematics), Measurement Techniques, Test Format
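The abstract does not reproduce the estimator, but a standard reliability estimate for a composite of distinct item-format parts is stratified coefficient alpha, which pools each part's error variance; whether this is the exact statistic Qualls illustrates is an assumption here:

```python
import numpy as np

def stratified_alpha(part_vars, part_alphas, total_var):
    """Stratified coefficient alpha for a composite of item-format parts.

    part_vars   : observed score variance of each part
    part_alphas : reliability (e.g., alpha) of each part
    total_var   : variance of the composite total score
    """
    part_vars = np.asarray(part_vars, dtype=float)
    part_alphas = np.asarray(part_alphas, dtype=float)
    error = np.sum(part_vars * (1.0 - part_alphas))
    return 1.0 - error / total_var

# e.g., an MC part and a CR part; total_var includes their covariance
print(f"{stratified_alpha([40.0, 25.0], [0.85, 0.75], 90.0):.3f}")
```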
Peer reviewed
Feldt, Leonard S. – Applied Measurement in Education, 1993
The recommendation that the reliability of multiple-choice tests will be enhanced if the distribution of item difficulties is concentrated at approximately 0.50 is reinforced and extended in this article by viewing the 0/1 item scoring as a dichotomization of an underlying normally distributed ability score. (SLD)
Descriptors: Ability, Difficulty Level, Guessing (Tests), Mathematical Models
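Feldt's point can be checked numerically: generate a normally distributed ability, dichotomize item-level continua at thresholds that produce various difficulties, and observe that coefficient alpha peaks when difficulty is near .50. The one-factor model and loading below are assumptions for the demonstration, not the article's derivation:

```python
import numpy as np

rng = np.random.default_rng(0)

def alpha(scores):
    """Cronbach's alpha for an examinees x items score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

n, k = 5_000, 30
theta = rng.normal(size=(n, 1))                     # latent ability
noise = rng.normal(size=(n, k))
latent = 0.7 * theta + noise * (1 - 0.7**2) ** 0.5  # item-level continuum

for p in (0.50, 0.70, 0.90):
    cut = np.quantile(latent, 1 - p)                # threshold giving difficulty p
    print(p, round(alpha((latent > cut).astype(float)), 3))
```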
Peer reviewed
Wise, Steven L. – Applied Measurement in Education, 2006
In low-stakes testing, the motivation levels of examinees are often a matter of concern to test givers because a lack of examinee effort represents a direct threat to the validity of the test data. This study investigated the use of response time to assess the amount of examinee effort received by individual test items. In 2 studies, it was found…
Descriptors: Computer Assisted Testing, Motivation, Test Validity, Item Response Theory
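Wise's response-time approach classifies responses faster than an item-specific threshold as rapid guesses; a per-examinee effort index is then the proportion of items answered with solution behavior. A minimal sketch, where the threshold rule (a fixed fraction of each item's median time) is one simple choice among several explored in this line of work:

```python
import numpy as np

def response_time_effort(rt, threshold_frac=0.10):
    """Per-examinee effort index: share of items whose response time
    exceeds a rapid-guessing threshold.

    rt : examinees x items matrix of response times (seconds)
    threshold_frac : threshold as a fraction of each item's median
        response time -- an assumed, simple rule for illustration.
    """
    rt = np.asarray(rt, dtype=float)
    thresholds = threshold_frac * np.median(rt, axis=0)  # one per item
    solution_behavior = rt > thresholds                  # True = effortful
    return solution_behavior.mean(axis=1)                # effort per examinee
```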
Peer reviewed
Wainer, Howard; Thissen, David – Applied Measurement in Education, 1993
Because assessment instruments of the future may well be composed of a combination of question types, a way to combine those scores effectively is discussed. Two new graphic tools are presented that show it may not be practical to equalize the reliability of different components. (SLD)
Descriptors: Constructed Response, Educational Assessment, Graphs, Item Response Theory
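The trade-off the authors examine can be made concrete with the reliability of a weighted composite, which depends on the weights and on the components' variances, reliabilities, and covariance. The standard composite-reliability identity below is offered as context, not as the article's graphic tools:

```python
import numpy as np

def composite_reliability(weights, cov, rels):
    """Reliability of a weighted composite of test components.

    weights : component weights w
    cov     : observed covariance matrix of the components
    rels    : reliability of each component
    Error variances add: rel_C = 1 - sum(w_i^2 var_i (1 - rel_i)) / var_C.
    """
    w = np.asarray(weights, float)
    cov = np.asarray(cov, float)
    rels = np.asarray(rels, float)
    var_c = w @ cov @ w                          # composite variance
    err = np.sum(w**2 * np.diag(cov) * (1 - rels))
    return 1 - err / var_c

# e.g., multiple-choice and constructed-response sections
cov = [[36.0, 12.0], [12.0, 25.0]]
print(f"{composite_reliability([0.6, 0.4], cov, [0.90, 0.70]):.3f}")
```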
Peer reviewed
Klein, Stephen P.; And Others – Applied Measurement in Education, 1995
Portfolios are the centerpiece of Vermont's statewide assessment program in mathematics. Portfolio scores in the first two years were not reliable enough to permit the reporting of student-level results, but increasing the number of readers or the number of portfolio pieces is not operationally feasible. (SLD)
Descriptors: Educational Assessment, Elementary Secondary Education, Mathematics Tests, Performance Based Assessment
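The feasibility question (more readers or more portfolio pieces) is usually explored with decision-study projections from estimated variance components. A sketch under an assumed person x task x reader decomposition; the component values are invented to mirror the pattern the authors report, where large person-by-task variance blunts the payoff from extra readers:

```python
def projected_reliability(var_person, var_px_task, var_residual,
                          n_tasks, n_readers):
    """D-study style projection: score reliability when averaging over
    n_tasks portfolio pieces and n_readers per piece.

    Variance components are assumed known (e.g., from a G study); the
    decomposition is a simplified person x task x reader model.
    """
    error = var_px_task / n_tasks + var_residual / (n_tasks * n_readers)
    return var_person / (var_person + error)

for readers in (1, 2, 4):
    print(readers, round(projected_reliability(0.30, 0.45, 0.25, 5, readers), 3))
```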
Peer reviewed
Linn, Robert L.; Kiplinger, Vonda L. – Applied Measurement in Education, 1995
The adequacy of linking statewide standardized test results to the National Assessment of Educational Progress by using equipercentile equating procedures was investigated using statewide mathematics data from four states. Results suggest that the linkings are not sufficiently trustworthy to make comparisons based on the tails of the distribution.…
Descriptors: Comparative Analysis, Educational Assessment, Equated Scores, Mathematics Tests
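Equipercentile linking maps a score on one test to the score with the same percentile rank on the other. A bare-bones sketch without the presmoothing used in practice; note that percentile estimates in the tails rest on few examinees, which is exactly where the authors find the linkings least trustworthy:

```python
import numpy as np

def equipercentile_link(state_scores, naep_scores):
    """Return a function mapping a state-test score to the NAEP score
    with the same percentile rank (no smoothing; operational work
    typically presmooths both distributions first).
    """
    state_scores = np.sort(np.asarray(state_scores, float))
    naep_scores = np.asarray(naep_scores, float)

    def link(x):
        # percentile rank of x in the state-test distribution
        pr = np.searchsorted(state_scores, x, side="right") / len(state_scores)
        # NAEP score at the same percentile
        return np.quantile(naep_scores, pr)

    return link
```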
Peer reviewed
Hambleton, Ronald K.; Slater, Sharon C. – Applied Measurement in Education, 1997
A brief history of developments in the assessment of the reliability of credentialing examinations is presented, and some new results are outlined that highlight the interactions among scoring, standard setting, and the reliability and validity of pass-fail decisions. Decision consistency is an important concept in evaluating credentialing…
Descriptors: Certification, Credentials, Decision Making, Interaction
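Decision consistency asks how often the same pass-fail decision would be made on retesting. With two parallel forms it can be computed directly, as sketched below; single-administration estimates (e.g., Subkoviak's method) need stronger assumptions:

```python
import numpy as np

def decision_consistency(form1, form2, cut):
    """Proportion of examinees receiving the same pass/fail decision
    on two parallel forms, plus the chance-corrected kappa."""
    p1 = np.asarray(form1) >= cut
    p2 = np.asarray(form2) >= cut
    agree = np.mean(p1 == p2)
    # chance agreement from the marginal pass rates
    chance = p1.mean() * p2.mean() + (1 - p1.mean()) * (1 - p2.mean())
    kappa = (agree - chance) / (1 - chance)
    return agree, kappa
```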
Peer reviewed
Stone, Clement A.; Lane, Suzanne – Applied Measurement in Education, 1991
A model-testing approach for evaluating the stability of item response theory item parameter estimates (IPEs) in a pretest-posttest design is illustrated. Nineteen items from the Head Start Measures Battery were used. A moderately high degree of stability in the IPEs for 5,510 children assessed on 2 occasions was found. (TJH)
Descriptors: Comparative Testing, Compensatory Education, Computer Assisted Testing, Early Childhood Education
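A simple descriptive screen for the stability question: place the second occasion's difficulty estimates on the first occasion's scale with a mean-sigma transformation, then examine the correlation and root-mean-square difference. This is far weaker than the model-testing approach the authors illustrate and is offered only as a quick check:

```python
import numpy as np

def parameter_stability(b_time1, b_time2):
    """Descriptive stability check for IRT item difficulty estimates
    from two occasions: mean-sigma linking, then correlation and RMSD."""
    b1, b2 = np.asarray(b_time1, float), np.asarray(b_time2, float)
    a = b1.std(ddof=1) / b2.std(ddof=1)          # mean-sigma slope
    b2_linked = a * (b2 - b2.mean()) + b1.mean() # put b2 on the b1 scale
    r = np.corrcoef(b1, b2_linked)[0, 1]
    rmsd = np.sqrt(np.mean((b1 - b2_linked) ** 2))
    return r, rmsd
```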
Peer reviewed
Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991
Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)
Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques
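The limited generalizability across tasks that the authors emphasize is quantified with a generalizability coefficient. For a crossed persons-by-tasks design with one score per cell, the variance components follow from the usual random-effects ANOVA identities; a sketch (a full G study would also separate a rater facet):

```python
import numpy as np

def g_coefficient(scores, n_tasks_d=None):
    """Generalizability coefficient for a crossed persons x tasks design
    with one score per cell.

    Uses the identity E[MS_p] = var_pt,e + n_t * var_p, so
    var_p = (MS_p - MS_pt) / n_t.
    """
    x = np.asarray(scores, float)
    n_p, n_t = x.shape
    grand = x.mean()
    ms_p = n_t * np.sum((x.mean(axis=1) - grand) ** 2) / (n_p - 1)
    resid = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0) + grand
    ms_pt = np.sum(resid ** 2) / ((n_p - 1) * (n_t - 1))
    var_p = max((ms_p - ms_pt) / n_t, 0.0)  # person (universe score) variance
    k = n_tasks_d or n_t                    # number of tasks in the D study
    return var_p / (var_p + ms_pt / k)      # relative error: var_pt,e / k
```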
Peer reviewed
Millman, Jason – Applied Measurement in Education, 1991
Alternatives to multiple-choice tests for teacher licensing examinations are described, and their advantages are cited. Concerns are expressed in the areas of cost and practicality, reliability, corruptibility, and validity. A suggestion for reducing costs using multiple-choice responses calibrated to constructed-response tasks is proposed. (SLD)
Descriptors: Beginning Teachers, Constructed Response, Cost Effectiveness, Educational Assessment
Peer reviewed
Mehrens, William A.; Popham, W. James – Applied Measurement in Education, 1992
This paper discusses how to determine whether a test was developed in a legally defensible manner, reviewing general issues, specific cases bearing on different types of test use, some evaluative dimensions, and evidence of test quality. Tests constructed and used according to existing standards will generally stand legal scrutiny. (SLD)
Descriptors: College Entrance Examinations, Compliance (Legal), Constitutional Law, Court Litigation