ERIC - Search Results

Showing all 14 results

Measuring the Reliability of Diagnostic Mastery Classifications at Multiple Levels of Reporting

Peer reviewed

Thompson, W. Jake; Clark, Amy K.; Nash, Brooke – Applied Measurement in Education, 2019

As the use of diagnostic assessment systems transitions from research applications to large-scale assessments for accountability purposes, reliability methods that provide evidence at each level of reporting are needed. The purpose of this paper is to summarize one simulation-based method for estimating and reporting reliability for an…

Descriptors: Test Reliability, Diagnostic Tests, Classification, Computation

Evidence-Centered Assessment Design as a Foundation for Achievement-Level Descriptor Development and for Standard Setting

Peer reviewed

Plake, Barbara S.; Huff, Kristen; Reshetar, Rosemary – Applied Measurement in Education, 2010

In many large-scale assessment programs, achievement level descriptors (ALDs) provide a critical role in communicating what scores on the assessment mean and in interpreting what examinees know and are able to do based on their test performance. Based on their test performance, examinees are often classified into performance categories. The…

Descriptors: Evidence, Test Construction, Measurement, Standard Setting

Using a Taxonomy of Differential Step Functioning to Improve the Interpretation of DIF in Polytomous Items: An Illustration

Peer reviewed

Penfield, Randall D.; Alvarez, Karina; Lee, Okhee – Applied Measurement in Education, 2009

The assessment of differential item functioning (DIF) in polytomous items addresses between-group differences in measurement properties at the item level, but typically does not inform which score levels may be involved in the DIF effect. The framework of differential step functioning (DSF) addresses this issue by examining between-group…

Descriptors: Test Bias, Classification, Test Items, Criteria

Evaluating Type I Error and Power Rates Using an Effect Size Measure with the Logistic Regression Procedure for DIF Detection.

Peer reviewed

Jodoin, Michael G.; Gierl, Mark J. – Applied Measurement in Education, 2001

Developed a new classification method for the logistic regression (LR) procedure for differential item functioning (DIF) based on methods used in the Simultaneous Item Bias test and conducted a simulation study to determine if the effect size measure affects the Type I error and power rates for the LR DIF procedure. Results show that inclusion of…

Descriptors: Classification, Effect Size, Item Bias, Power (Statistics)

Validity of a Taxonomy of Multiple-Choice Item-Writing Rules.

Peer reviewed

Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989

Results of 96 theoretical/empirical studies were reviewed to see if they support a taxonomy of 43 rules for writing multiple-choice test items. The taxonomy is the result of an analysis of 46 textbooks dealing with multiple-choice item writing. For nearly half of the rules, no research was found. (SLD)

Descriptors: Classification, Literature Reviews, Multiple Choice Tests, Test Construction

Methodological Approaches to the Validation of Academic Self-Concept: The Construct and Its Measures.

Peer reviewed

Byrne, Barbara M. – Applied Measurement in Education, 1990

Methodological procedures used in validating the theoretical structure of academic self-concept and validating associated measurement instruments are reviewed. Substantive findings from research related to modes of inquiry are summarized, and recommendations for future research are outlined. (TJH)

Descriptors: Classification, Construct Validity, Evaluation Methods, Literature Reviews

Consistency and Predictive Nature of Vertically Moderated Standards for South Carolina's 1999 Palmetto Achievement Challenge Tests of Language Arts and Mathematics

Peer reviewed

Huynh, Huynh; Barton, Karen E.; Meyer, J. Patrick; Porchea, Sameano; Gallant, Dorinda – Applied Measurement in Education, 2005

This article reports on the consistency of the achievement-level classifications (below basic, basic, proficient, and advanced) established in 1999 for the South Carolina Palmetto Achievement Challenge Tests (PACT; Huynh, Meyer, & Barton, 2000) of English language arts and mathematics. It also utilizes the PACT longitudinal data files of…

Descriptors: Student Records, Language Arts, Accountability, Academic Achievement

Differential Item Functioning for a Test with a Cutoff Score: Use of Limited Closed-Interval Measures.

Peer reviewed

Oshima, T. C.; And Others – Applied Measurement in Education, 1994

A procedure to detect differential item functioning (DIF) is introduced that is suitable for tests with a cutoff score. DIF is assessed on a limited closed interval of thetas in which a cutoff score falls. How this approach affects the identification of DIF items is demonstrated with real data sets. (SLD)

Descriptors: Ability, Classification, Cutting Scores, Identification

Use of Generalized Person-Fit Indexes, Zetas for Statistical Pattern Classification.

Peer reviewed

Tatsuoka, Kikumi – Applied Measurement in Education, 1996

Application of person-fit statistics to cognitive diagnosis requires special efforts to detect normal and usual response patterns resulting from sources of misconception that are frequently observed among students. This study shows a solution for the problem by introducing an extension of a person-fit statistic developed by K. Tatsuoka (1985).…

Descriptors: Classification, Cognitive Tests, Diagnostic Tests, Item Response Theory

A Taxonomy of Multiple-Choice Item-Writing Rules.

Peer reviewed

Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989

A taxonomy of 43 rules for writing multiple-choice test items is presented, based on a consensus of 46 textbooks. These guidelines are presented as complete and authoritative, with solid consensus apparent for 33 of the rules. Four rules lack consensus, and 5 rules were cited fewer than 10 times. (SLD)

Descriptors: Classification, Interrater Reliability, Multiple Choice Tests, Objective Tests

Nonparametric Person-Fit Research: Some Theoretical Issues and an Empirical Example.

Peer reviewed

Meijer, Rob R.; And Others – Applied Measurement in Education, 1996

Several existing group-based statistics to detect improbable item score patterns are discussed, along with the cut scores proposed in the literature to classify an item score pattern as aberrant. A simulation study and an empirical study are used to compare the statistics and their use and to investigate the practical use of cut scores. (SLD)

Descriptors: Achievement Tests, Classification, Cutting Scores, Identification

Vertically Articulated Performance Standards: Logic, Procedures, and Likely Classification Accuracy

Peer reviewed

Ferrara, Steve; Johnson, Eugene; Chen, Wen-Hung – Applied Measurement in Education, 2005

Psychometricians continue to develop and evaluate methods for linking test scores, both horizontally and vertically. This article describes a social moderation process for articulating (i.e., linking) performance standards across grade levels for an operational state assessment program. The researchers used generated data to evaluate the likely…

Descriptors: Grade 2, Grade 3, Scores, Error of Measurement

Trace Lines for Classification Decisions.

Peer reviewed

Schwarz, Richard D. – Applied Measurement in Education, 1998

Referral, placement, and retention decisions were analyzed using item response theory (IRT) to study whether classification decisions could be placed on the latent continuum of ability normally associated with test items and to study the existence of classification differential item functioning. Results with 352 kindergarten children demonstrate…

Descriptors: Ability, Classification, Decision Making, Grade Repetition

Combining Multiple-Choice and Constructed-Response Test Scores: An Economist's View.

Peer reviewed

Kennedy, Peter; Walstad, William B. – Applied Measurement in Education, 1997

The consequences in terms of misclassifications of students that would occur by replacing the constructed-response portion of the Advanced Placement (AP) examinations in economics with more multiple-choice items were studied. The 1991 AP examinations in micro- and macroeconomics were used. Computer simulation found that a small but statistically…

Descriptors: Classification, College Entrance Examinations, Computer Simulation, Constructed Response
