Showing all 15 results
Yoo Jeong Jang – ProQuest LLC, 2022
Despite the increasing demand for diagnostic information, observed subscores have often been reported to lack adequate psychometric qualities such as reliability, distinctiveness, and validity. Therefore, several statistical techniques based on classical test theory (CTT) and item response theory (IRT) frameworks have been proposed to improve the quality of subscores. More recently, the diagnostic classification model (DCM) has…
Descriptors: Classification, Accuracy, Item Response Theory, Correlation
Peer reviewed
Direct link
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods were investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores
Peer reviewed
PDF on ERIC Download full text
Ozarkan, Hatun Betul; Dogan, Celal Deha – Eurasian Journal of Educational Research, 2020
Purpose: This study aimed to compare the cut scores obtained by the Extended Angoff and Contrasting Groups methods for an achievement test consisting of constructed-response items. Research Methods: The study was based on a survey research design. For data collection, the study group consisted of eight mathematics teachers for…
Descriptors: Standard Setting (Scoring), Responses, Test Items, Cutting Scores
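The two standard-setting methods compared in the entry above can be illustrated in miniature. This is a hypothetical sketch with made-up data, not the study's procedure: in one common form of the Extended Angoff method, each judge estimates the points a borderline examinee would earn per item and the cut score is the mean of the judges' item-sum estimates; one simple Contrasting Groups variant places the cut midway between the mean scores of examinees pre-classified as masters and non-masters.

```python
from statistics import mean

def extended_angoff_cut(ratings):
    """Extended Angoff sketch: ratings[j][i] is the score judge j
    expects a borderline examinee to earn on constructed-response
    item i; the cut is the mean over judges of their item sums."""
    return mean(sum(judge) for judge in ratings)

def contrasting_groups_cut(master_scores, nonmaster_scores):
    """Contrasting Groups sketch: midpoint of the two group means.
    (Other variants locate the intersection of the two score
    distributions or minimize misclassification instead.)"""
    return (mean(master_scores) + mean(nonmaster_scores)) / 2

# Two judges, three items; and two teacher-classified examinee groups.
angoff_cut = extended_angoff_cut([[2, 3, 1], [3, 3, 2]])        # 7
cg_cut = contrasting_groups_cut([80, 90], [40, 50])             # 65.0
```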
Peer reviewed
Direct link
Kaya, Elif; O'Grady, Stefan; Kalender, Ilker – Language Testing, 2022
Language proficiency testing serves an important function of classifying examinees into different categories of ability. However, misclassification is to some extent inevitable and may have important consequences for stakeholders. Recent research suggests that classification efficacy may be enhanced substantially using computerized adaptive…
Descriptors: Item Response Theory, Test Items, Language Tests, Classification
Regan, Blake B. – ProQuest LLC, 2012
This study examined the relationship between high school exit exams and mathematical proficiency. With the No Child Left Behind (NCLB) Act requiring all students to be proficient in mathematics by 2014, it is imperative that high-stakes assessments accurately evaluate all aspects of student achievement, appropriately set the yardstick by which…
Descriptors: Exit Examinations, Mathematics Achievement, High School Students, Test Items
Peer reviewed
Direct link
Gnambs, Timo; Batinic, Bernad – Educational and Psychological Measurement, 2011
Computer-adaptive classification tests focus on classifying respondents in different proficiency groups (e.g., for pass/fail decisions). To date, adaptive classification testing has been dominated by research on dichotomous response formats and classifications in two groups. This article extends this line of research to polytomous classification…
Descriptors: Test Length, Computer Assisted Testing, Classification, Test Items
Peer reviewed
Direct link
Wang, Wen-Chung; Liu, Chen-Wei – Educational and Psychological Measurement, 2011
The generalized graded unfolding model (GGUM) has recently been developed to describe item responses to Likert items (agree-disagree) in attitude measurement. In this study, the authors (a) developed two item selection methods in computerized classification testing under the GGUM, the current estimate/ability confidence interval method and the cut…
Descriptors: Computer Assisted Testing, Adaptive Testing, Classification, Item Response Theory
Peer reviewed
Direct link
Wang, Wen-Chung; Huang, Sheng-Yun – Educational and Psychological Measurement, 2011
The one-parameter logistic model with ability-based guessing (1PL-AG) has recently been developed to account for the effect of ability on guessing behavior in multiple-choice items. In this study, the authors developed algorithms for computerized classification testing under the 1PL-AG and conducted a series of simulations to evaluate their…
Descriptors: Computer Assisted Testing, Classification, Item Analysis, Probability
Peer reviewed
Direct link
Hall, John D.; Howerton, D. Lynn; Jones, Craig H. – Research in the Schools, 2008
The No Child Left Behind Act and the accountability movement in public education caused many states to develop criterion-referenced academic achievement tests. Scores from these tests are often used to make high stakes decisions. Even so, these tests typically do not receive independent psychometric scrutiny. We evaluated the 2005 Arkansas…
Descriptors: Criterion Referenced Tests, Achievement Tests, High Stakes Tests, Public Education
Lin, Chuan-Ju; Spray, Judith – 2000
This paper presents comparisons among three item-selection criteria for the sequential probability ratio test. The criteria were compared in terms of their efficiency in selecting items, as indicated by average test length and the percentage of correct decisions. The item-selection criteria applied in this study were the Fisher information…
Descriptors: Classification, Criteria, Cutting Scores, Selection
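The sequential probability ratio test (SPRT) underlying the entry above classifies an examinee as above or below a cut score by accumulating a log-likelihood ratio across administered items until a decision boundary is crossed. A minimal sketch under a Rasch model, with hypothetical indifference-region endpoints `theta_low` and `theta_high` around the cut (the study itself compares item-selection criteria, which are not reproduced here):

```python
import math

def sprt_classify(responses, difficulties, theta_low, theta_high,
                  alpha=0.05, beta=0.05):
    """Wald SPRT for pass/fail classification under a Rasch model.
    H0: theta = theta_low (fail) vs H1: theta = theta_high (pass).
    Returns 'pass', 'fail', or 'continue' (administer more items)."""
    def p(theta, b):
        # Rasch probability of a correct response to an item
        # with difficulty b
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    log_lr = 0.0
    for u, b in zip(responses, difficulties):
        p1, p0 = p(theta_high, b), p(theta_low, b)
        log_lr += u * math.log(p1 / p0) + (1 - u) * math.log((1 - p1) / (1 - p0))

    upper = math.log((1 - beta) / alpha)   # cross above -> classify as pass
    lower = math.log(beta / (1 - alpha))   # cross below -> classify as fail
    if log_lr >= upper:
        return "pass"
    if log_lr <= lower:
        return "fail"
    return "continue"

# 20 correct answers on items of middling difficulty: a clear pass.
decision = sprt_classify([1] * 20, [0.0] * 20, -0.5, 0.5)  # "pass"
```

Average test length falls when items carry more information near the cut, which is why the choice of item-selection criterion matters for the SPRT's efficiency.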
Peer reviewed
Oshima, T. C.; And Others – Applied Measurement in Education, 1994
A procedure to detect differential item functioning (DIF) is introduced that is suitable for tests with a cutoff score. DIF is assessed on a limited closed interval of thetas in which a cutoff score falls. How this approach affects the identification of DIF items is demonstrated with real data sets. (SLD)
Descriptors: Ability, Classification, Cutting Scores, Identification
Peer reviewed
Meijer, Rob R.; And Others – Applied Measurement in Education, 1996
Several existing group-based statistics to detect improbable item score patterns are discussed, along with the cut scores proposed in the literature to classify an item score pattern as aberrant. A simulation study and an empirical study are used to compare the statistics and to investigate the practical use of cut scores. (SLD)
Descriptors: Achievement Tests, Classification, Cutting Scores, Identification
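Many group-based person-fit statistics of the kind discussed above build on counts of Guttman errors: cases where an examinee answers a harder item correctly but an easier one incorrectly, judged against the group's proportion-correct values. A minimal sketch of that building block (the specific statistics and cut scores compared in the study are not reproduced here):

```python
def guttman_errors(responses, p_values):
    """Count Guttman errors in a 0/1 response pattern: pairs where
    a harder item (lower group proportion-correct) was answered
    correctly while an easier item was answered incorrectly."""
    # Order the responses from easiest to hardest item.
    ordered = [u for _, u in sorted(zip(p_values, responses), reverse=True)]
    errors = 0
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            # easier item wrong (0) but harder item right (1)
            if ordered[i] == 0 and ordered[j] == 1:
                errors += 1
    return errors

# A perfect Guttman pattern has no errors; a reversed one does.
guttman_errors([1, 1, 0], [0.9, 0.6, 0.3])  # 0
guttman_errors([0, 0, 1], [0.9, 0.6, 0.3])  # 2
```

An unusually high error count relative to the group flags the score pattern as potentially aberrant; the cited studies examine where to place that cut.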
Karkee, Thakur B.; Wright, Karen R. – Online Submission, 2004
Different item response theory (IRT) models may be employed for item calibration. Change of testing vendors, for example, may result in the adoption of a different model than that previously used with a testing program. To provide scale continuity and preserve cut score integrity, item parameter estimates from the new model must be linked to the…
Descriptors: Measures (Individuals), Evaluation Criteria, Testing, Integrity
Sykes, Robert C.; Fitzpatrick, Anne R. – 1990
The results of classifying test items on the basis of their Mantel-Haenszel (MH) alpha estimates were compared to the results of classifying those items with an item response theory (IRT) based procedure that compares item difficulties, with the aim of identifying the alpha value that maximized the decision concordance between the…
Descriptors: Classification, Cutting Scores, Difficulty Level, Ethnic Groups
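The Mantel-Haenszel alpha used for item classification above is a common odds ratio pooled over 2x2 tables, one per matched score level; items are often flagged on the derived MH delta scale. A minimal sketch, using the widely cited ETS A/B/C thresholds on |delta| and omitting the accompanying significance tests:

```python
import math

def mh_alpha(tables):
    """Mantel-Haenszel common odds ratio. Each table is
    (A, B, C, D) for one matching score level: A = reference group
    correct, B = reference incorrect, C = focal correct,
    D = focal incorrect."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def ets_dif_category(alpha):
    """Classify an item on the MH delta scale,
    delta = -2.35 * ln(alpha). ETS-style rule of thumb:
    |delta| < 1 -> A (negligible DIF), |delta| >= 1.5 -> C (large),
    otherwise B (significance tests omitted in this sketch)."""
    delta = -2.35 * math.log(alpha)
    if abs(delta) < 1.0:
        return "A"
    if abs(delta) >= 1.5:
        return "C"
    return "B"

# Identical odds in both groups -> alpha = 1, negligible DIF.
category = ets_dif_category(mh_alpha([(50, 50, 50, 50)]))  # "A"
```

The study's question, in these terms, is which alpha cut best reproduces the classifications an IRT difficulty-comparison procedure would make.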
Meijer, Rob R. – 1994
In person-fit analysis, the object is to investigate whether an item score pattern is improbable given the item score patterns of the other persons in the group or given what is expected on the basis of a test model. In this study, several existing group-based statistics to detect such improbable score patterns were investigated, along with the…
Descriptors: Achievement Tests, Classification, College Students, Cutting Scores