Thompson, W. Jake; Clark, Amy K.; Nash, Brooke – Applied Measurement in Education, 2019
As the use of diagnostic assessment systems transitions from research applications to large-scale assessments for accountability purposes, reliability methods that provide evidence at each level of reporting are needed. The purpose of this paper is to summarize one simulation-based method for estimating and reporting reliability for an…
Descriptors: Test Reliability, Diagnostic Tests, Classification, Computation

Plake, Barbara S.; Huff, Kristen; Reshetar, Rosemary – Applied Measurement in Education, 2010
In many large-scale assessment programs, achievement level descriptors (ALDs) play a critical role in communicating what scores on the assessment mean and in interpreting what examinees know and are able to do based on their test performance. Based on their test performance, examinees are often classified into performance categories. The…
Descriptors: Evidence, Test Construction, Measurement, Standard Setting

Penfield, Randall D.; Alvarez, Karina; Lee, Okhee – Applied Measurement in Education, 2009
The assessment of differential item functioning (DIF) in polytomous items addresses between-group differences in measurement properties at the item level, but typically does not inform which score levels may be involved in the DIF effect. The framework of differential step functioning (DSF) addresses this issue by examining between-group…
Descriptors: Test Bias, Classification, Test Items, Criteria

Jodoin, Michael G.; Gierl, Mark J. – Applied Measurement in Education, 2001
Developed a new classification method for the logistic regression (LR) procedure for differential item functioning (DIF) based on methods used in the Simultaneous Item Bias test and conducted a simulation study to determine if the effect size measure affects the Type I error and power rates for the LR DIF procedure. Results show that inclusion of…
Descriptors: Classification, Effect Size, Item Bias, Power (Statistics)

Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989
Results of 96 theoretical/empirical studies were reviewed to see if they support a taxonomy of 43 rules for writing multiple-choice test items. The taxonomy is the result of an analysis of 46 textbooks dealing with multiple-choice item writing. For nearly half of the rules, no research was found. (SLD)
Descriptors: Classification, Literature Reviews, Multiple Choice Tests, Test Construction

Byrne, Barbara M. – Applied Measurement in Education, 1990
Methodological procedures used in validating the theoretical structure of academic self-concept and validating associated measurement instruments are reviewed. Substantive findings from research related to modes of inquiry are summarized, and recommendations for future research are outlined. (TJH)
Descriptors: Classification, Construct Validity, Evaluation Methods, Literature Reviews

Huynh, Huynh; Barton, Karen E.; Meyer, J. Patrick; Porchea, Sameano; Gallant, Dorinda – Applied Measurement in Education, 2005
This article reports on the consistency of the achievement-level classifications (below basic, basic, proficient, and advanced) established in 1999 for the South Carolina Palmetto Achievement Challenge Tests (PACT; Huynh, Meyer, & Barton, 2000) of English language arts and mathematics. It also utilizes the PACT longitudinal data files of…
Descriptors: Student Records, Language Arts, Accountability, Academic Achievement

Oshima, T. C.; And Others – Applied Measurement in Education, 1994
A procedure to detect differential item functioning (DIF) is introduced that is suitable for tests with a cutoff score. DIF is assessed on a limited closed interval of thetas in which a cutoff score falls. How this approach affects the identification of DIF items is demonstrated with real data sets. (SLD)
Descriptors: Ability, Classification, Cutting Scores, Identification

Tatsuoka, Kikumi – Applied Measurement in Education, 1996
Application of person-fit statistics to cognitive diagnosis requires special efforts to detect normal and usual response patterns resulting from sources of misconception that are frequently observed among students. This study shows a solution for the problem by introducing an extension of a person-fit statistic developed by K. Tatsuoka (1985).…
Descriptors: Classification, Cognitive Tests, Diagnostic Tests, Item Response Theory

Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989
A taxonomy of 43 rules for writing multiple-choice test items is presented, based on a consensus of 46 textbooks. These guidelines are presented as complete and authoritative, with solid consensus apparent for 33 of the rules. Four rules lack consensus, and 5 rules were cited fewer than 10 times. (SLD)
Descriptors: Classification, Interrater Reliability, Multiple Choice Tests, Objective Tests

Meijer, Rob R.; And Others – Applied Measurement in Education, 1996
Several existing group-based statistics to detect improbable item score patterns are discussed, along with the cut scores proposed in the literature to classify an item score pattern as aberrant. A simulation study and an empirical study are used to compare the statistics and their use and to investigate the practical use of cut scores. (SLD)
Descriptors: Achievement Tests, Classification, Cutting Scores, Identification

Ferrara, Steve; Johnson, Eugene; Chen, Wen-Hung – Applied Measurement in Education, 2005
Psychometricians continue to develop and evaluate methods for linking test scores, both horizontally and vertically. This article describes a social moderation process for articulating (i.e., linking) performance standards across grade levels for an operational state assessment program. The researchers used generated data to evaluate the likely…
Descriptors: Grade 2, Grade 3, Scores, Error of Measurement

Schwarz, Richard D. – Applied Measurement in Education, 1998
Referral, placement, and retention decisions were analyzed using item response theory (IRT) to study whether classification decisions could be placed on the latent continuum of ability normally associated with test items and to study the existence of classification differential item functioning. Results with 352 kindergarten children demonstrate…
Descriptors: Ability, Classification, Decision Making, Grade Repetition

Kennedy, Peter; Walstad, William B. – Applied Measurement in Education, 1997
The consequences in terms of misclassifications of students that would occur by replacing the constructed-response portion of the Advanced Placement (AP) examinations in economics with more multiple-choice items were studied. The 1991 AP examinations in micro- and macroeconomics were used. Computer simulation found that a small but statistically…
Descriptors: Classification, College Entrance Examinations, Computer Simulation, Constructed Response