Yun-Kyung Kim; Li Cai – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2025
This paper introduces an application of cross-classified item response theory (IRT) modeling to an assessment utilizing the embedded standard setting (ESS) method (Lewis & Cook). The cross-classified IRT model treats both item and person effects as random, with the item effects regressed on the target performance levels (target…
Descriptors: Standard Setting (Scoring), Item Response Theory, Test Items, Difficulty Level
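The structure described in the snippet admits a compact sketch (notation assumed here, not drawn from the paper): person ability and item difficulty are both random effects, with item difficulty regressed on the item's target performance level T_i:

    P(X_{pi} = 1) = \mathrm{logit}^{-1}(\theta_p - b_i)
    \theta_p \sim N(0, \sigma_\theta^2)
    b_i = \gamma_0 + \gamma_1 T_i + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma_b^2)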
Patrik Havan; Michal Kohút; Peter Halama – International Journal of Testing, 2025
Acquiescence is the tendency of participants to shift their responses toward agreement. Lechner et al. (2019) introduced the following mechanisms of acquiescence: social deference and cognitive processing. We added their interaction to this theoretical framework. The sample consisted of 557 participants. We found a significant, moderately strong relationship…
Descriptors: Cognitive Processes, Attention, Difficulty Level, Reflection
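One common way to quantify acquiescence, offered here only as an illustrative sketch (the study's own measure is not shown in the snippet), is the mean raw response across a balanced set of positively and negatively keyed items, where item content cancels out and the agreement tendency remains:

    import numpy as np

    def acquiescence_index(responses, pos_items, neg_items):
        # responses: respondents x items matrix of Likert ratings.
        # pos_items / neg_items: column indices of a balanced set of
        # positively and negatively keyed items.
        r = np.asarray(responses, dtype=float)
        both = np.concatenate([pos_items, neg_items])
        # Averaging raw (unreversed) ratings over both keyings cancels
        # item content, leaving each respondent's agreement tendency.
        return r[:, both].mean(axis=1)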
Aiman Mohammad Freihat; Omar Saleh Bani Yassin – Educational Process: International Journal, 2025
Background/purpose: This study aimed to determine how accurately multiple-choice test item parameters are estimated under item response theory models. Materials/methods: The researchers relied on measurement accuracy indicators, which express the absolute difference between the estimated and actual values of the…
Descriptors: Accuracy, Computation, Multiple Choice Tests, Test Items
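The accuracy indicator described, the absolute difference between estimated and actual parameter values, is straightforward to compute; a minimal sketch for a simulation in which the generating values are known:

    import numpy as np

    def mean_absolute_error(true_params, est_params):
        # Average absolute difference between the actual (generating)
        # and estimated item parameter values, e.g. IRT difficulties.
        t = np.asarray(true_params, dtype=float)
        e = np.asarray(est_params, dtype=float)
        return np.mean(np.abs(t - e))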
Sherwin E. Balbuena – Online Submission, 2024
This study introduces a new chi-square test statistic for testing the equality of response frequencies among distracters in multiple-choice tests. The formula uses the numbers of correct and wrong answers as the basis for calculating the expected response frequencies per distracter. The method was…
Descriptors: Multiple Choice Tests, Statistics, Test Validity, Testing
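The paper's exact formula is not shown in the snippet, but the description suggests the conventional setup: under the null hypothesis of equally attractive distracters, each distracter is expected to draw an equal share of the item's wrong answers. A sketch under that assumption:

    import numpy as np
    from scipy.stats import chi2

    def distracter_chi_square(distracter_counts):
        # distracter_counts: observed frequency of each wrong option
        # chosen for one item (correct answers excluded).
        obs = np.asarray(distracter_counts, dtype=float)
        n_wrong = obs.sum()                      # total wrong answers
        exp = np.full_like(obs, n_wrong / obs.size)
        stat = ((obs - exp) ** 2 / exp).sum()    # Pearson chi-square
        p = chi2.sf(stat, df=obs.size - 1)
        return stat, p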
Tia M. Fechter; Heeyeon Yoon – Language Testing, 2024
This study evaluated the efficacy of two proposed methods in an operational standard-setting study conducted for a high-stakes language proficiency test of the U.S. government. The goal was to seek low-cost modifications to the existing Yes/No Angoff method to increase the validity and reliability of the recommended cut scores using a convergent…
Descriptors: Standard Setting, Language Proficiency, Language Tests, Evaluation Methods
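In the standard Yes/No Angoff method (the study's modifications are not shown in the snippet), each panelist judges for every item whether a minimally competent examinee would answer it correctly; a panelist's recommended raw cut score is their count of "yes" judgments, aggregated across the panel. A minimal sketch:

    import numpy as np

    def yes_no_angoff_cut(judgments):
        # judgments: panelists x items matrix of 0/1 answers to
        # "would a minimally competent examinee get this item right?"
        j = np.asarray(judgments)
        per_panelist = j.sum(axis=1)   # each panelist's raw cut score
        return per_panelist.mean()     # panel-recommended cut score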
Samah AlKhuzaey; Floriana Grasso; Terry R. Payne; Valentina Tamma – International Journal of Artificial Intelligence in Education, 2024
Designing and constructing pedagogical tests that contain items (i.e., questions) measuring various types of skills equitably for students at different levels is a challenging task. Teachers and item writers alike need to ensure that the quality of assessment materials is consistent if student evaluations are to be objective and effective.…
Descriptors: Test Items, Test Construction, Difficulty Level, Prediction
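As an illustrative baseline only (the paper's own approach is not shown in the snippet), item difficulty can be predicted from item text with a simple regression over lexical features; the data names below are assumptions:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline

    # Predict an empirical difficulty index (e.g., proportion correct)
    # from the wording of each item.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge())
    # model.fit(item_texts, item_difficulties)    # hypothetical data
    # predictions = model.predict(new_item_texts)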
Kuan-Yu Jin; Thomas Eckes – Educational and Psychological Measurement, 2024
Insufficient effort responding (IER) refers to a lack of effort when answering survey or questionnaire items. Such items typically offer more than two ordered response categories, with Likert-type scales as the most prominent example. The underlying assumption is that the successive categories reflect increasing levels of the latent variable…
Descriptors: Item Response Theory, Test Items, Test Wiseness, Surveys
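Model-based treatment of IER is what this paper develops; as a simpler point of reference, one widely used screening index is the longstring, the longest run of identical consecutive answers per respondent. A minimal sketch:

    import numpy as np

    def longstring(responses):
        # responses: respondents x items matrix of raw answers.
        r = np.asarray(responses)
        out = np.ones(r.shape[0], dtype=int)
        for i in range(r.shape[0]):
            run = best = 1
            for j in range(1, r.shape[1]):
                run = run + 1 if r[i, j] == r[i, j - 1] else 1
                best = max(best, run)
            out[i] = best
        return out   # large values suggest insufficient effort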
Wuji Lin; Chenxi Lv; Jiejie Liao; Yuan Hu; Yutong Liu; Jingyuan Lin – npj Science of Learning, 2024
The debate about whether the capacity of working memory (WM) varies with the complexity of memory items continues. This study employed novel experimental materials to investigate the role of complexity in WM capacity. Across seven experiments, we explored the relationship between complexity and WM capacity. The results indicated that the…
Descriptors: Short Term Memory, Difficulty Level, Retention (Psychology), Test Items
Andreea Dutulescu; Stefan Ruseti; Mihai Dascalu; Danielle S. McNamara – Grantee Submission, 2024
Assessing the difficulty of reading comprehension questions is crucial to educational methodologies and language understanding technologies. Traditional methods of assessing question difficulty frequently rely on human judgments or shallow metrics, often failing to capture the intricate cognitive demands of answering a question. This…
Descriptors: Difficulty Level, Reading Tests, Test Items, Reading Comprehension
Neda Kianinezhad; Mohsen Kianinezhad – Language Education & Assessment, 2025
This study presents a comparative analysis of classical reliability measures, including Cronbach's alpha, test-retest, and parallel forms reliability, alongside modern psychometric methods such as the Rasch model and Mokken scaling, to evaluate the reliability of C-tests in language proficiency assessment. Utilizing data from 150 participants…
Descriptors: Psychometrics, Test Reliability, Language Proficiency, Language Tests
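Of the classical measures compared, Cronbach's alpha is the most compact to state: with k items, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). A minimal sketch:

    import numpy as np

    def cronbach_alpha(scores):
        # scores: test takers x items matrix of item scores.
        x = np.asarray(scores, dtype=float)
        k = x.shape[1]
        item_var = x.var(axis=0, ddof=1).sum()   # sum of item variances
        total_var = x.sum(axis=1).var(ddof=1)    # variance of sum scores
        return k / (k - 1) * (1 - item_var / total_var)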
Hojung Kim; Changkyung Song; Jiyoung Kim; Hyeyun Jeong; Jisoo Park – Language Testing in Asia, 2024
This study presents a modified version of the Korean Elicited Imitation (EI) test, designed to resemble natural spoken language, and validates its reliability as a measure of proficiency. The study assesses the correlation between average test scores and Test of Proficiency in Korean (TOPIK) levels, examining score distributions among beginner,…
Descriptors: Korean, Test Validity, Test Reliability, Imitation
Jerin Kim; Kent McIntosh – Journal of Positive Behavior Interventions, 2025
We aimed to identify empirically valid cut scores on the positive behavioral interventions and supports (PBIS) Tiered Fidelity Inventory (TFI) through an expert panel process known as bookmarking. The TFI is a measurement tool to evaluate the fidelity of implementation of PBIS. In the bookmark method, experts reviewed all TFI items and item scores…
Descriptors: Positive Behavior Supports, Cutting Scores, Fidelity, Program Evaluation
Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2025
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e.,…
Descriptors: Scoring, Guessing (Tests), Reaction Time, Item Response Theory
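In its simplest form, effort-moderated scoring flags a response as a rapid guess when its response time falls below an item-specific threshold and excludes it from scoring; the operational version embeds the same idea in an IRT likelihood. A proportion-correct sketch:

    import numpy as np

    def effort_moderated_score(correct, rts, thresholds):
        # correct: 0/1 item responses; rts: response times;
        # thresholds: per-item rapid-guessing time cutoffs.
        correct = np.asarray(correct, dtype=float)
        effortful = np.asarray(rts) >= np.asarray(thresholds)
        if not effortful.any():
            return np.nan              # nothing effortful to score
        return correct[effortful].mean()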
Katrin Schuessler; Vanessa Fischer; Maik Walpuski – Instructional Science: An International Journal of the Learning Sciences, 2025
Cognitive load studies mostly center on information about perceived cognitive load. Single-item subjective rating scales are the dominant measurement practice for investigating overall cognitive load. Usually, either invested mental effort or perceived task difficulty is used as an overall cognitive load measure. However, the extent to which the…
Descriptors: Cognitive Processes, Difficulty Level, Rating Scales, Construct Validity
Aditya Shah; Ajay Devmane; Mehul Ranka; Prathamesh Churi – Education and Information Technologies, 2024
Online learning has grown owing to advances in technology and its flexibility. Online examinations measure students' knowledge and skills. Traditional question papers suffer from inconsistent difficulty levels, arbitrary question allocation, and poor grading. The proposed model calibrates question paper difficulty based on student performance to…
Descriptors: Computer Assisted Testing, Difficulty Level, Grading, Test Construction
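A natural building block for such calibration is the classical difficulty index, the proportion of students answering an item correctly; the helper below for checking a paper against a target difficulty is purely hypothetical:

    import numpy as np

    def item_difficulty(responses):
        # responses: students x items matrix scored 0/1.
        # Higher p means an easier item.
        return np.asarray(responses, dtype=float).mean(axis=0)

    def paper_within_target(p_values, target=0.6, tol=0.05):
        # Hypothetical check: does the paper's mean difficulty sit
        # within a tolerance band around the target?
        mean_p = float(np.mean(p_values))
        return abs(mean_p - target) <= tol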