Sen, Sedat; Cohen, Allan S. – Educational and Psychological Measurement, 2024
A Monte Carlo simulation study was conducted to compare fit indices used for detecting the correct latent class in three dichotomous mixture item response theory (IRT) models. Ten indices were considered: Akaike's information criterion (AIC), the corrected AIC (AICc), Bayesian information criterion (BIC), consistent AIC (CAIC), Draper's…
Descriptors: Goodness of Fit, Item Response Theory, Sample Size, Classification
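The fit indices named in this abstract are simple functions of the maximized log-likelihood, the number of estimated parameters, and the sample size. A minimal sketch of the standard formulas, using hypothetical log-likelihoods and parameter counts for one-, two-, and three-class solutions (the indices cut off in the truncated list are omitted):

```python
import math

def information_criteria(log_lik, n_params, n_obs):
    """Standard fit indices used to pick the number of latent classes.

    log_lik  -- maximized log-likelihood of the fitted mixture IRT model
    n_params -- number of freely estimated parameters
    n_obs    -- number of examinees
    """
    aic = -2 * log_lik + 2 * n_params
    aicc = aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    bic = -2 * log_lik + n_params * math.log(n_obs)
    caic = -2 * log_lik + n_params * (math.log(n_obs) + 1)
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "CAIC": caic}

# Hypothetical log-likelihoods and parameter counts for 1-, 2-, and 3-class
# solutions fitted to 1,000 examinees (illustration only):
solutions = {1: (-5210.4, 20), 2: (-5105.7, 41), 3: (-5098.2, 62)}
fits = {c: information_criteria(ll, k, 1000) for c, (ll, k) in solutions.items()}
best = {index: min(fits, key=lambda c: fits[c][index])
        for index in ("AIC", "AICc", "BIC", "CAIC")}
print(best)  # the class count each index selects (smaller is better)
```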
Liu, Ren; Huggins-Manley, Anne Corinne; Bradshaw, Laine – Educational and Psychological Measurement, 2017
There is an increasing demand for assessments that can provide more fine-grained information about examinees. In response to this demand, diagnostic measurement provides students with feedback on their strengths and weaknesses on specific skills by classifying them into mastery or nonmastery attribute categories. These attributes often form a…
Descriptors: Matrices, Classification, Accuracy, Diagnostic Tests
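Attribute-level classification of this kind is typically organized through a Q-matrix that maps items to the attributes they require. A minimal sketch, assuming a hypothetical Q-matrix and a conjunctive (DINA-style) ideal-response rule rather than the specific model fitted in the study:

```python
import numpy as np

# Hypothetical Q-matrix: rows are items, columns are attributes.
# A 1 means the item requires that attribute.
Q = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
])

def ideal_response(alpha, Q):
    """Conjunctive (DINA-style) ideal response: an item is answered
    correctly only if every attribute it requires has been mastered."""
    alpha = np.asarray(alpha)
    return (Q @ alpha == Q.sum(axis=1)).astype(int)

# An examinee who has mastered the first two attributes but not the third:
print(ideal_response([1, 1, 0], Q))  # -> [1 1 1 0 0]
```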
Lee, Jihyun; Paek, Insu – Journal of Psychoeducational Assessment, 2014
Likert-type rating scales are still the most widely used method for measuring psychoeducational constructs. The present study investigates the long-standing issue of identifying the optimal number of response categories. A special emphasis is given to categorical data, which were generated by the Item Response Theory (IRT) Graded-Response Modeling…
Descriptors: Likert Scales, Responses, Item Response Theory, Classification
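The Graded Response Model referenced here obtains the probability of each ordered Likert category from differences between cumulative two-parameter logistic curves. A minimal sketch with hypothetical item parameters; the generating values used in the study are not shown in the snippet:

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Samejima's Graded Response Model: probability of each of the
    K = len(thresholds) + 1 ordered categories at ability theta.

    a is the item discrimination; thresholds must be increasing.
    """
    # Cumulative probabilities P(X >= k) for k = 1..K-1, padded with 1 and 0.
    cum = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(thresholds))))
    cum = np.concatenate(([1.0], cum, [0.0]))
    return cum[:-1] - cum[1:]

# Hypothetical 5-category Likert item (four thresholds):
print(grm_category_probs(theta=0.5, a=1.2, thresholds=[-1.5, -0.5, 0.5, 1.5]))
```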
Eggen, Theo J. H. M. – Educational Research and Evaluation, 2011
If classification in a limited number of categories is the purpose of testing, computerized adaptive tests (CATs) with algorithms based on sequential statistical testing perform better than estimation-based CATs (e.g., Eggen & Straetmans, 2000). In these computerized classification tests (CCTs), the Sequential Probability Ratio Test (SPRT) (Wald,…
Descriptors: Test Length, Adaptive Testing, Classification, Item Analysis
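Wald's SPRT, as used in these computerized classification tests, accumulates a log-likelihood ratio between an ability just below and just above the cut point and stops testing once the ratio crosses error-rate bounds. A minimal sketch under the Rasch model, with hypothetical item difficulties, responses, and cut values:

```python
import math

def sprt_decision(responses, difficulties, theta0, theta1, alpha=0.05, beta=0.05):
    """Wald's SPRT for a pass/fail decision under the Rasch model.

    theta0/theta1 bracket the cut score; responses are 0/1 item scores
    paired with item difficulties. Returns 'pass', 'fail', or 'continue'.
    """
    def p(theta, b):
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    log_lr = sum(
        x * math.log(p(theta1, b) / p(theta0, b))
        + (1 - x) * math.log((1 - p(theta1, b)) / (1 - p(theta0, b)))
        for x, b in zip(responses, difficulties)
    )
    upper = math.log((1 - beta) / alpha)   # accept theta1: pass
    lower = math.log(beta / (1 - alpha))   # accept theta0: fail
    if log_lr >= upper:
        return "pass"
    if log_lr <= lower:
        return "fail"
    return "continue"  # administer another item

# Hypothetical short test around a cut score bracketed by theta = -0.5 and 0.5:
print(sprt_decision([1, 1, 0, 1, 1], [-0.5, 0.0, 0.3, -0.2, 0.1],
                    theta0=-0.5, theta1=0.5))
```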
Paek, Insu; Wilson, Mark – Educational and Psychological Measurement, 2011
This study elaborates the Rasch differential item functioning (DIF) model formulation under the marginal maximum likelihood estimation context. Also, the Rasch DIF model performance was examined and compared with the Mantel-Haenszel (MH) procedure in small sample and short test length conditions through simulations. The theoretically known…
Descriptors: Test Bias, Test Length, Statistical Inference, Geometric Concepts
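The Mantel-Haenszel procedure used as the comparison condition pools 2x2 correct/incorrect-by-group tables across total-score strata into a common odds ratio, usually reported on the ETS delta scale. A minimal sketch with hypothetical counts:

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel DIF effect size for one studied item.

    strata is a list of (ref_correct, ref_wrong, focal_correct, focal_wrong)
    tuples, one per total-score level. Returns the common odds ratio and the
    ETS delta-scale value; negative delta means the item favors the reference group.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    odds_ratio = num / den
    delta = -2.35 * math.log(odds_ratio)
    return odds_ratio, delta

# Hypothetical counts at three score levels:
print(mantel_haenszel_dif([(40, 10, 30, 20), (50, 5, 45, 15), (60, 2, 55, 8)]))
```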
Deng, Nina – ProQuest LLC, 2011
Three decision consistency and accuracy (DC/DA) methods were evaluated: the Livingston and Lewis (LL) method, the Lee method, and the Hambleton and Han (HH) method. The purposes of the study were: (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied, (2) to investigate the "true"…
Descriptors: Item Response Theory, Test Theory, Computation, Classification
Kim, Jiseon – ProQuest LLC, 2010
Classification testing has been widely used to make categorical decisions by determining whether an examinee has a certain degree of ability required by established standards. As computer technologies have developed, classification testing has become more computerized. Several approaches have been proposed and investigated in the context of…
Descriptors: Test Length, Computer Assisted Testing, Classification, Probability
Reckase, Mark D. – 1981
This report describes a study comparing the classification results obtained from one-parameter and three-parameter logistic-based tailored testing procedures used in conjunction with Wald's sequential probability ratio test (SPRT). Eighty-eight college students were classified into four grade categories using achievement test results obtained…
Descriptors: Adaptive Testing, Classification, Comparative Analysis, Computer Assisted Testing
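The contrast between the two tailored-testing procedures comes down to the item response functions they assume: the one-parameter model fixes discrimination and has no lower asymptote, while the three-parameter model estimates both. A minimal sketch of the two curves with hypothetical item parameters:

```python
import math

def logistic_icc(theta, b, a=1.0, c=0.0):
    """Item characteristic curve for the 3PL model; with a=1 and c=0 it
    reduces to a one-parameter (Rasch-type) curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# Hypothetical item (b = 0.2): the 3PL guessing parameter raises the lower
# asymptote, which mainly changes classifications for low-ability examinees.
for theta in (-2.0, 0.0, 2.0):
    print(theta,
          round(logistic_icc(theta, b=0.2), 3),                 # 1PL
          round(logistic_icc(theta, b=0.2, a=1.3, c=0.2), 3))   # 3PL
```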
Haladyna, Tom; Roid, Gale – 1980
The problems associated with misclassifying students when pass-fail decisions are based on test scores are discussed. One protection against misclassification is to set a confidence interval around the cutting score. Those whose scores fall above the interval are passed; those whose scores fall below the interval are failed; and those whose scores…
Descriptors: Bayesian Statistics, Classification, Comparative Analysis, Criterion Referenced Tests
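The confidence-interval safeguard described here widens the cut score by a multiple of the standard error of measurement so that borderline examinees are neither passed nor failed outright. A minimal sketch with hypothetical test statistics:

```python
def cut_score_decision(observed, cut, sem, z=1.645):
    """Pass/fail decision with a band of +/- z*SEM around the cut score;
    scores inside the band are flagged as too close to call.

    sem is the standard error of measurement, SEM = SD * sqrt(1 - reliability).
    """
    half_width = z * sem
    if observed > cut + half_width:
        return "pass"
    if observed < cut - half_width:
        return "fail"
    return "no decision (retest or gather more evidence)"

# Hypothetical test: SD = 8, reliability = 0.85 -> SEM of about 3.1
sem = 8 * (1 - 0.85) ** 0.5
print(cut_score_decision(observed=62, cut=60, sem=sem))
```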
Frick, Theodore W. – 1986
The sequential probability ratio test (SPRT), developed by Abraham Wald, is one statistical model available for making mastery decisions during computer-based criterion referenced tests. The predictive validity of the SPRT was empirically investigated with two different and relatively large item pools with heterogeneous item parameters. Graduate…
Descriptors: Achievement Tests, Adaptive Testing, Classification, Comparative Analysis