Showing all 15 results
Yoo Jeong Jang – ProQuest LLC, 2022
Despite the increasing demand for diagnostic information, observed subscores have often been reported to lack adequate psychometric qualities such as reliability, distinctiveness, and validity. Therefore, several statistical techniques based on classical test theory (CTT) and item response theory (IRT) frameworks have been proposed to improve the quality of subscores. More recently, the diagnostic classification model (DCM) has…
Descriptors: Classification, Accuracy, Item Response Theory, Correlation
Peer reviewed
Direct link
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods were investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores
Peer reviewed
PDF on ERIC Download full text
Ozarkan, Hatun Betul; Dogan, Celal Deha – Eurasian Journal of Educational Research, 2020
Purpose: This study aimed to compare the cut scores obtained by the Extended Angoff and Contrasting Groups methods for an achievement test consisting of constructed-response items. Research Methods: The study was based on a survey research design. For data collection, the study group consisted of eight mathematics teachers for…
Descriptors: Standard Setting (Scoring), Responses, Test Items, Cutting Scores
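The two standard-setting methods compared in the entry above can be illustrated in miniature. This is a hypothetical sketch with made-up data, not the study's procedure: in one common form of the Extended Angoff method, each judge estimates the points a borderline examinee would earn per item and the cut score is the mean of the judges' item-sum estimates; one simple Contrasting Groups variant places the cut midway between the mean scores of examinees pre-classified as masters and non-masters.

```python
from statistics import mean

def extended_angoff_cut(ratings):
    """Extended Angoff sketch: ratings[j][i] is the score judge j
    expects a borderline examinee to earn on constructed-response
    item i; the cut is the mean over judges of their item sums."""
    return mean(sum(judge) for judge in ratings)

def contrasting_groups_cut(master_scores, nonmaster_scores):
    """Contrasting Groups sketch: midpoint of the two group means.
    (Other variants locate the intersection of the two score
    distributions or minimize misclassification instead.)"""
    return (mean(master_scores) + mean(nonmaster_scores)) / 2

# Two judges, three items; and two teacher-classified examinee groups.
angoff_cut = extended_angoff_cut([[2, 3, 1], [3, 3, 2]])        # 7
cg_cut = contrasting_groups_cut([80, 90], [40, 50])             # 65.0
```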
Peer reviewed
Direct link
Kaya, Elif; O'Grady, Stefan; Kalender, Ilker – Language Testing, 2022
Language proficiency testing serves an important function of classifying examinees into different categories of ability. However, misclassification is to some extent inevitable and may have important consequences for stakeholders. Recent research suggests that classification efficacy may be enhanced substantially using computerized adaptive…
Descriptors: Item Response Theory, Test Items, Language Tests, Classification
Regan, Blake B. – ProQuest LLC, 2012
This study examined the relationship between high school exit exams and mathematical proficiency. With the No Child Left Behind (NCLB) Act requiring all students to be proficient in mathematics by 2014, it is imperative that high-stakes assessments accurately evaluate all aspects of student achievement, appropriately set the yardstick by which…
Descriptors: Exit Examinations, Mathematics Achievement, High School Students, Test Items
Peer reviewed
Direct link
Gnambs, Timo; Batinic, Bernad – Educational and Psychological Measurement, 2011
Computer-adaptive classification tests focus on classifying respondents in different proficiency groups (e.g., for pass/fail decisions). To date, adaptive classification testing has been dominated by research on dichotomous response formats and classifications in two groups. This article extends this line of research to polytomous classification…
Descriptors: Test Length, Computer Assisted Testing, Classification, Test Items
Peer reviewed
Direct link
Wang, Wen-Chung; Liu, Chen-Wei – Educational and Psychological Measurement, 2011
The generalized graded unfolding model (GGUM) has recently been developed to describe item responses to Likert items (agree-disagree) in attitude measurement. In this study, the authors (a) developed two item selection methods in computerized classification testing under the GGUM, the current estimate/ability confidence interval method and the cut…
Descriptors: Computer Assisted Testing, Adaptive Testing, Classification, Item Response Theory
Peer reviewed
Direct link
Wang, Wen-Chung; Huang, Sheng-Yun – Educational and Psychological Measurement, 2011
The one-parameter logistic model with ability-based guessing (1PL-AG) has recently been developed to account for the effect of ability on guessing behavior in multiple-choice items. In this study, the authors developed algorithms for computerized classification testing under the 1PL-AG and conducted a series of simulations to evaluate their…
Descriptors: Computer Assisted Testing, Classification, Item Analysis, Probability
Peer reviewed
Direct link
Hall, John D.; Howerton, D. Lynn; Jones, Craig H. – Research in the Schools, 2008
The No Child Left Behind Act and the accountability movement in public education caused many states to develop criterion-referenced academic achievement tests. Scores from these tests are often used to make high stakes decisions. Even so, these tests typically do not receive independent psychometric scrutiny. We evaluated the 2005 Arkansas…
Descriptors: Criterion Referenced Tests, Achievement Tests, High Stakes Tests, Public Education
Lin, Chuan-Ju; Spray, Judith – 2000
This paper presents comparisons among three item-selection criteria for the sequential probability ratio test. The criteria were compared in terms of their efficiency in selecting items, as indicated by average test length and the percentage of correct decisions. The item-selection criteria applied in this study were the Fisher information…
Descriptors: Classification, Criteria, Cutting Scores, Selection
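The sequential probability ratio test (SPRT) underlying the entry above classifies an examinee as above or below a cut score by accumulating a log-likelihood ratio across administered items until a decision boundary is crossed. A minimal sketch under a Rasch model, with hypothetical indifference-region endpoints `theta_low` and `theta_high` around the cut (the study itself compares item-selection criteria, which are not reproduced here):

```python
import math

def sprt_classify(responses, difficulties, theta_low, theta_high,
                  alpha=0.05, beta=0.05):
    """Wald SPRT for pass/fail classification under a Rasch model.
    H0: theta = theta_low (fail) vs H1: theta = theta_high (pass).
    Returns 'pass', 'fail', or 'continue' (administer more items)."""
    def p(theta, b):
        # Rasch probability of a correct response to an item
        # with difficulty b
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    log_lr = 0.0
    for u, b in zip(responses, difficulties):
        p1, p0 = p(theta_high, b), p(theta_low, b)
        log_lr += u * math.log(p1 / p0) + (1 - u) * math.log((1 - p1) / (1 - p0))

    upper = math.log((1 - beta) / alpha)   # cross above -> classify as pass
    lower = math.log(beta / (1 - alpha))   # cross below -> classify as fail
    if log_lr >= upper:
        return "pass"
    if log_lr <= lower:
        return "fail"
    return "continue"

# 20 correct answers on items of middling difficulty: a clear pass.
decision = sprt_classify([1] * 20, [0.0] * 20, -0.5, 0.5)  # "pass"
```

Average test length falls when items carry more information near the cut, which is why the choice of item-selection criterion matters for the SPRT's efficiency.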
Peer reviewed
Oshima, T. C.; And Others – Applied Measurement in Education, 1994
A procedure to detect differential item functioning (DIF) is introduced that is suitable for tests with a cutoff score. DIF is assessed on a limited closed interval of thetas in which a cutoff score falls. How this approach affects the identification of DIF items is demonstrated with real data sets. (SLD)
Descriptors: Ability, Classification, Cutting Scores, Identification
Peer reviewed
Meijer, Rob R.; And Others – Applied Measurement in Education, 1996
Several existing group-based statistics to detect improbable item score patterns are discussed, along with the cut scores proposed in the literature to classify an item score pattern as aberrant. A simulation study and an empirical study are used to compare the statistics and to investigate the practical use of cut scores. (SLD)
Descriptors: Achievement Tests, Classification, Cutting Scores, Identification
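Many group-based person-fit statistics of the kind discussed above build on counts of Guttman errors: cases where an examinee answers a harder item correctly but an easier one incorrectly, judged against the group's proportion-correct values. A minimal sketch of that building block (the specific statistics and cut scores compared in the study are not reproduced here):

```python
def guttman_errors(responses, p_values):
    """Count Guttman errors in a 0/1 response pattern: pairs where
    a harder item (lower group proportion-correct) was answered
    correctly while an easier item was answered incorrectly."""
    # Order the responses from easiest to hardest item.
    ordered = [u for _, u in sorted(zip(p_values, responses), reverse=True)]
    errors = 0
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            # easier item wrong (0) but harder item right (1)
            if ordered[i] == 0 and ordered[j] == 1:
                errors += 1
    return errors

# A perfect Guttman pattern has no errors; a reversed one does.
guttman_errors([1, 1, 0], [0.9, 0.6, 0.3])  # 0
guttman_errors([0, 0, 1], [0.9, 0.6, 0.3])  # 2
```

An unusually high error count relative to the group flags the score pattern as potentially aberrant; the cited studies examine where to place that cut.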
Karkee, Thakur B.; Wright, Karen R. – Online Submission, 2004
Different item response theory (IRT) models may be employed for item calibration. Change of testing vendors, for example, may result in the adoption of a different model than that previously used with a testing program. To provide scale continuity and preserve cut score integrity, item parameter estimates from the new model must be linked to the…
Descriptors: Measures (Individuals), Evaluation Criteria, Testing, Integrity
Sykes, Robert C.; Fitzpatrick, Anne R. – 1990
The results of classifying test items on the basis of their Mantel-Haenszel (MH) alpha estimates were compared to the results of classifying those items with an item response theory (IRT) based procedure that compares item difficulties, with the aim of identifying the alpha value that maximized the decision concordance between the…
Descriptors: Classification, Cutting Scores, Difficulty Level, Ethnic Groups
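The Mantel-Haenszel alpha used for item classification above is a common odds ratio pooled over 2x2 tables, one per matched score level; items are often flagged on the derived MH delta scale. A minimal sketch, using the widely cited ETS A/B/C thresholds on |delta| and omitting the accompanying significance tests:

```python
import math

def mh_alpha(tables):
    """Mantel-Haenszel common odds ratio. Each table is
    (A, B, C, D) for one matching score level: A = reference group
    correct, B = reference incorrect, C = focal correct,
    D = focal incorrect."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def ets_dif_category(alpha):
    """Classify an item on the MH delta scale,
    delta = -2.35 * ln(alpha). ETS-style rule of thumb:
    |delta| < 1 -> A (negligible DIF), |delta| >= 1.5 -> C (large),
    otherwise B (significance tests omitted in this sketch)."""
    delta = -2.35 * math.log(alpha)
    if abs(delta) < 1.0:
        return "A"
    if abs(delta) >= 1.5:
        return "C"
    return "B"

# Identical odds in both groups -> alpha = 1, negligible DIF.
category = ets_dif_category(mh_alpha([(50, 50, 50, 50)]))  # "A"
```

The study's question, in these terms, is which alpha cut best reproduces the classifications an IRT difficulty-comparison procedure would make.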
Meijer, Rob R. – 1994
In person-fit analysis, the object is to investigate whether an item score pattern is improbable given the item score patterns of the other persons in the group or given what is expected on the basis of a test model. In this study, several existing group-based statistics to detect such improbable score patterns were investigated, along with the…
Descriptors: Achievement Tests, Classification, College Students, Cutting Scores