Publication Date
| Date Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 10 |
| Since 2017 (last 10 years) | 17 |
| Since 2007 (last 20 years) | 40 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Classification | 42 |
| Comparative Analysis | 42 |
| Test Items | 42 |
| Item Response Theory | 18 |
| Foreign Countries | 16 |
| Item Analysis | 14 |
| Accuracy | 13 |
| Diagnostic Tests | 9 |
| Correlation | 8 |
| Difficulty Level | 8 |
| Simulation | 8 |
Publication Type
| Type | Count |
| --- | --- |
| Journal Articles | 33 |
| Reports - Research | 25 |
| Reports - Evaluative | 7 |
| Dissertations/Theses -… | 5 |
| Reports - Descriptive | 4 |
| Speeches/Meeting Papers | 3 |
| Information Analyses | 1 |
| Opinion Papers | 1 |
| Tests/Questionnaires | 1 |
Education Level
| Level | Count |
| --- | --- |
| Higher Education | 9 |
| Postsecondary Education | 8 |
| Secondary Education | 6 |
| Elementary Education | 4 |
| Elementary Secondary Education | 4 |
| Middle Schools | 3 |
| Grade 8 | 2 |
| High Schools | 2 |
| Junior High Schools | 2 |
| Grade 6 | 1 |
| Grade 7 | 1 |
Location
| Location | Count |
| --- | --- |
| Turkey | 3 |
| China | 2 |
| California | 1 |
| China (Shanghai) | 1 |
| Czech Republic | 1 |
| Florida | 1 |
| Greece | 1 |
| India | 1 |
| Israel | 1 |
| Massachusetts | 1 |
| New York | 1 |
Laws, Policies, & Programs
| Law / Program | Count |
| --- | --- |
| No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
| Assessment / Survey | Count |
| --- | --- |
| Program for International… | 2 |
| Eysenck Personality Inventory | 1 |
| Minnesota Multiphasic… | 1 |
| Trends in International… | 1 |
| United States Medical… | 1 |
| Wechsler Adult Intelligence… | 1 |
| Work Keys (ACT) | 1 |
Sedat Sen; Allan S. Cohen – Educational and Psychological Measurement, 2024
A Monte Carlo simulation study was conducted to compare fit indices used for detecting the correct latent class in three dichotomous mixture item response theory (IRT) models. Ten indices were considered: Akaike's information criterion (AIC), the corrected AIC (AICc), Bayesian information criterion (BIC), consistent AIC (CAIC), Draper's…
Descriptors: Goodness of Fit, Item Response Theory, Sample Size, Classification
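The information criteria named in this abstract have standard closed forms. As a generic illustration (not code or data from the study), the sketch below computes AIC, AICc, BIC, and CAIC from a model's maximized log-likelihood and picks the candidate number of latent classes with the lowest BIC; the log-likelihood values are hypothetical.

```python
import math

def aic(log_lik, k):
    """Akaike's information criterion: 2k - 2*lnL."""
    return 2 * k - 2 * log_lik

def aicc(log_lik, k, n):
    """Corrected AIC: AIC plus a small-sample penalty term."""
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)

def bic(log_lik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*lnL."""
    return k * math.log(n) - 2 * log_lik

def caic(log_lik, k, n):
    """Consistent AIC: k*(ln(n) + 1) - 2*lnL."""
    return k * (math.log(n) + 1) - 2 * log_lik

# Hypothetical fits: number of classes -> (maximized lnL, parameter count k).
fits = {1: (-5120.4, 20), 2: (-4980.7, 41), 3: (-4975.9, 62)}
n = 1000  # sample size

# Select the class solution with the smallest BIC.
best = min(fits, key=lambda c: bic(fits[c][0], fits[c][1], n))
print(best)  # -> 2: the extra fit of 3 classes does not justify its parameters
```

The same `min` call with `aic` or `caic` swapped in shows how the indices can disagree, which is exactly the kind of behavior the simulation study compares.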
Huang, Hung-Yu – Educational and Psychological Measurement, 2023
The forced-choice (FC) item formats used for noncognitive tests typically develop a set of response options that measure different traits and instruct respondents to make judgments among these options in terms of their preference to control the response biases that are commonly observed in normative tests. Diagnostic classification models (DCMs)…
Descriptors: Test Items, Classification, Bayesian Statistics, Decision Making
Yoo Jeong Jang – ProQuest LLC, 2022
Despite the increasing demand for diagnostic information, observed subscores have been often reported to lack adequate psychometric qualities such as reliability, distinctiveness, and validity. Therefore, several statistical techniques based on CTT and IRT frameworks have been proposed to improve the quality of subscores. More recently, DCM has…
Descriptors: Classification, Accuracy, Item Response Theory, Correlation
Comparison of Traditional Essay Questions versus Case Based Modified Essay Questions in Biochemistry
Bansal, Aastha; Dubey, Abhishek; Singh, Vijay Kumar; Goswami, Binita; Kaushik, Smita – Biochemistry and Molecular Biology Education, 2023
Adult learning involves the analysis and synthesis of knowledge to become competent, which cannot be assessed only by traditional assessment tools and didactic learning methods. Stimulation of higher domains of cognitive learning needs to be inculcated to reach a better understanding of the subject rather than traditional assessment tools that…
Descriptors: Biochemistry, Science Instruction, Alternative Assessment, Microbiology
Aleyna Altan; Zehra Taspinar Sener – Online Submission, 2023
This research aimed to develop a valid and reliable test to be used to detect sixth grade students' misconceptions and errors regarding the subject of fractions. A misconception diagnostic test has been developed that includes the concept of fractions, different representations of fractions, ordering and comparing fractions, equivalence of…
Descriptors: Diagnostic Tests, Mathematics Tests, Fractions, Misconceptions
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
Wanxue Zhang; Lingling Meng; Bilan Liang – Interactive Learning Environments, 2023
With the continuous development of education, personalized learning has attracted great attention. How to evaluate students' learning effects has become increasingly important. In information technology courses, the traditional academic evaluation focuses on the student's learning outcomes, such as "scores" or "right/wrong,"…
Descriptors: Information Technology, Computer Science Education, High School Students, Scoring
von Davier, Matthias; Tyack, Lillian; Khorramdel, Lale – Educational and Psychological Measurement, 2023
Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We are comparing classification accuracy of convolutional and feed-forward approaches. Our…
Descriptors: Scoring, Networks, Artificial Intelligence, Elementary Secondary Education
Min, Shangchao; Cai, Hongwen; He, Lianzhen – Language Assessment Quarterly, 2022
The present study examined the performance of the bi-factor multidimensional item response theory (MIRT) model and higher-order (HO) cognitive diagnostic models (CDM) in providing diagnostic information and general ability estimation simultaneously in a listening test. The data used were 1,611 examinees' item-level responses to an in-house EFL…
Descriptors: Listening Comprehension Tests, English (Second Language), Second Language Learning, Foreign Countries
Ozarkan, Hatun Betul; Dogan, Celal Deha – Eurasian Journal of Educational Research, 2020
Purpose: This study aimed to compare the cut scores obtained by the Extended Angoff and Contrasting Groups methods for an achievement test consisting of constructed-response items. Research Methods: This study was based on survey research design. In the collection of data, the study group of the research consisted of eight mathematics teachers for…
Descriptors: Standard Setting (Scoring), Responses, Test Items, Cutting Scores
Malec, Wojciech; Krzeminska-Adamek, Malgorzata – Practical Assessment, Research & Evaluation, 2020
The main objective of the article is to compare several methods of evaluating multiple-choice options through classical item analysis. The methods subjected to examination include the tabulation of choice distribution, the interpretation of trace lines, the point-biserial correlation, the categorical analysis of trace lines, and the investigation…
Descriptors: Comparative Analysis, Evaluation Methods, Multiple Choice Tests, Item Analysis
Liu, Ren; Huggins-Manley, Anne Corinne; Bradshaw, Laine – Educational and Psychological Measurement, 2017
There is an increasing demand for assessments that can provide more fine-grained information about examinees. In response to the demand, diagnostic measurement provides students with feedback on their strengths and weaknesses on specific skills by classifying them into mastery or nonmastery attribute categories. These attributes often form a…
Descriptors: Matrices, Classification, Accuracy, Diagnostic Tests
Yunxiao Chen; Xiaoou Li; Jingchen Liu; Gongjun Xu; Zhiliang Ying – Grantee Submission, 2017
Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class…
Descriptors: Item Analysis, Classification, Graphs, Test Items
Kaya, Elif; O'Grady, Stefan; Kalender, Ilker – Language Testing, 2022
Language proficiency testing serves an important function of classifying examinees into different categories of ability. However, misclassification is to some extent inevitable and may have important consequences for stakeholders. Recent research suggests that classification efficacy may be enhanced substantially using computerized adaptive…
Descriptors: Item Response Theory, Test Items, Language Tests, Classification
Clauser, Jerome C.; Hambleton, Ronald K.; Baldwin, Peter – Educational and Psychological Measurement, 2017
The Angoff standard setting method relies on content experts to review exam items and make judgments about the performance of the minimally proficient examinee. Unfortunately, at times content experts may have gaps in their understanding of specific exam content. These gaps are particularly likely to occur when the content domain is broad and/or…
Descriptors: Scores, Item Analysis, Classification, Decision Making

