The Reliability of the Posterior Probability of Skill Attainment in Diagnostic Classification Models
Johnson, Matthew S.; Sinharay, Sandip – Journal of Educational and Behavioral Statistics, 2020
One common score reported from diagnostic classification assessments is the vector of posterior means of the skill mastery indicators. As with any assessment, it is important to derive and report estimates of the reliability of the reported scores. After reviewing a reliability measure suggested by Templin and Bradshaw, this article suggests three…
Descriptors: Reliability, Probability, Skill Development, Classification
Sen, Sedat; Cohen, Allan S. – Educational and Psychological Measurement, 2024
A Monte Carlo simulation study was conducted to compare fit indices used for detecting the correct latent class in three dichotomous mixture item response theory (IRT) models. Ten indices were considered: Akaike's information criterion (AIC), the corrected AIC (AICc), Bayesian information criterion (BIC), consistent AIC (CAIC), Draper's…
Descriptors: Goodness of Fit, Item Response Theory, Sample Size, Classification
Alallo, Hajir Mahmood Ibrahim; Mohammed, Aisha; Hamid, Zayad Khalaf; Hassan, Aalaa Yaseen; Kadhim, Qasim Khlaif – International Journal of Language Testing, 2023
Diagnostic classification models (DCMs) have recently become very popular both for research purposes and for real testing endeavors for student assessment. A plethora of DCM models give researchers and practitioners a wide range of options for student diagnosis and classification. One intriguing option that some DCM models offer is the possibility…
Descriptors: Language Tests, Diagnostic Tests, Classification, Clinical Diagnosis
Jang, Yoo Jeong – ProQuest LLC, 2022
Despite the increasing demand for diagnostic information, observed subscores have been often reported to lack adequate psychometric qualities such as reliability, distinctiveness, and validity. Therefore, several statistical techniques based on CTT and IRT frameworks have been proposed to improve the quality of subscores. More recently, DCM has…
Descriptors: Classification, Accuracy, Item Response Theory, Correlation
Saatcioglu, Fatima Munevver; Atar, Hakan Yavuz – International Journal of Assessment Tools in Education, 2022
This study aims to examine the effects of mixture item response theory (IRT) models on item parameter estimation and classification accuracy under different conditions. The manipulated variables of the simulation study are set as mixture IRT models (Rasch, 2PL, 3PL); sample size (600, 1000); the number of items (10, 30); the number of latent…
Descriptors: Accuracy, Classification, Item Response Theory, Programming Languages
Aksu Dunya, Beyza – International Journal of Testing, 2018
This study was conducted to analyze potential item parameter drift (IPD) impact on person ability estimates and classification accuracy when drift affects an examinee subgroup. Using a series of simulations, three factors were manipulated: (a) percentage of IPD items in the CAT exam, (b) percentage of examinees affected by IPD, and (c) item pool…
Descriptors: Adaptive Testing, Classification, Accuracy, Computer Assisted Testing
Malec, Wojciech; Krzeminska-Adamek, Malgorzata – Practical Assessment, Research & Evaluation, 2020
The main objective of the article is to compare several methods of evaluating multiple-choice options through classical item analysis. The methods subjected to examination include the tabulation of choice distribution, the interpretation of trace lines, the point-biserial correlation, the categorical analysis of trace lines, and the investigation…
Descriptors: Comparative Analysis, Evaluation Methods, Multiple Choice Tests, Item Analysis
Smith, J. Alexander; Dickinson, John R. – International Journal for Business Education, 2017
Published banks of multiple-choice questions are ubiquitous, the questions in those banks often being classified into levels of difficulty. The specific level of difficulty into which a question is classified might or should be a function of the question's substance. Possibly, though, insubstantive aspects of the question, such as the incidence of…
Descriptors: Correlation, Multiple Choice Tests, Difficulty Level, Classification
Rakes, Christopher R.; Ronau, Robert N. – International Journal of Research in Education and Science, 2019
The present study examined the ability of content domain (algebra, geometry, rational number, probability) to classify mathematics misconceptions. The study was conducted with 1,133 students in 53 algebra and geometry classes taught by 17 teachers from three high schools and one middle school across three school districts in a Midwestern state.…
Descriptors: Mathematics Instruction, Secondary School Teachers, Middle School Teachers, Misconceptions
Demir, Ergul – Eurasian Journal of Educational Research, 2018
Purpose: The answer-copying tendency has the potential to detect suspicious answer patterns for prior distributions of statistical detection techniques. The aim of this study is to develop a valid and reliable measurement tool as a scale in order to observe the tendency of university students' copying of answers. Also, it is aimed to provide…
Descriptors: College Students, Cheating, Test Construction, Student Behavior
Chen, Fu; Zhang, Shanshan; Guo, Yanfang; Xin, Tao – Research in Science Education, 2017
We used the Rule Space Model, a cognitive diagnostic model, to measure the learning progression for thermochemistry for senior high school students. We extracted five attributes and proposed their hierarchical relationships to model the construct of thermochemistry at four levels using a hypothesized learning progression. For this study, we…
Descriptors: Chemistry, High School Students, Secondary School Science, Correlation
Dahlke, Katie; Yang, Rui; Martínez, Carmen; Chavez, Suzette; Martin, Alejandra; Hawkinson, Laura; Shields, Joseph; Garland, Marshall; Carle, Jill – Regional Educational Laboratory Southwest, 2017
The New Mexico Public Education Department developed the Kindergarten Observation Tool (KOT) as a multidimensional observational measure of students' knowledge and skills at kindergarten entry. The primary purpose of the KOT is to inform instruction, so that kindergarten teachers can use the information about their students' knowledge and skills…
Descriptors: Test Validity, Observation, Measures (Individuals), Kindergarten
Jurich, Daniel P.; Bradshaw, Laine P. – International Journal of Testing, 2014
The assessment of higher-education student learning outcomes is an important component in understanding the strengths and weaknesses of academic and general education programs. This study illustrates the application of diagnostic classification models, a burgeoning set of statistical models, in assessing student learning outcomes. To facilitate…
Descriptors: College Outcomes Assessment, Classification, Statistical Analysis, Models
Kim, Sooyeon; Moses, Tim – International Journal of Testing, 2013
The major purpose of this study is to assess the conditions under which single scoring for constructed-response (CR) items is as effective as double scoring in the licensure testing context. We used both empirical datasets of five mixed-format licensure tests collected in actual operational settings and simulated datasets that allowed for the…
Descriptors: Scoring, Test Format, Licensing Examinations (Professions), Test Items
Lesnov, Roman Olegovich – International Journal of Computer-Assisted Language Learning and Teaching, 2018
This article compares second language test-takers' performance on an academic listening test in an audio-only mode versus an audio-video mode. A new method of classifying video-based visuals was developed and piloted, which used L2 expert opinions to place the video on a continuum from being content-deficient (not helpful for answering…
Descriptors: Second Language Learning, Second Language Instruction, Video Technology, Classification
