Kelsey Nason; Christine DeMars – Journal of Educational Measurement, 2025
This study examined the widely used threshold of 0.2 for Yen's Q3, an index for detecting violations of local independence. Specifically, a simulation was conducted to investigate whether Q3 values were related to the magnitude of bias in estimates of reliability, item parameters, and examinee ability. Results showed that Q3 values below the typical cut-off…
Descriptors: Item Response Theory, Statistical Bias, Test Reliability, Test Items
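A minimal sketch of how Yen's Q3 is computed, assuming a Rasch calibration has already produced ability and difficulty estimates (the `theta`, `b`, and response matrix below are simulated stand-ins): each Q3 entry is the correlation between two items' residuals.

```python
# Yen's Q3 sketch: correlate item residuals after a Rasch fit.
# theta, b, and X are simulated stand-ins for calibrated values.
import numpy as np

def rasch_prob(theta, b):
    """P(correct) under the Rasch model, persons x items."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def q3_matrix(X, theta, b):
    """Pairwise correlations of item residuals (Yen's Q3)."""
    residuals = X - rasch_prob(theta, b)
    return np.corrcoef(residuals, rowvar=False)

rng = np.random.default_rng(0)
theta = rng.normal(size=500)                       # person abilities
b = rng.normal(size=10)                            # item difficulties
X = (rng.random((500, 10)) < rasch_prob(theta, b)).astype(int)

Q3 = q3_matrix(X, theta, b)
# Flag off-diagonal pairs above the conventional 0.2 cut-off.
flags = np.argwhere((np.abs(Q3) > 0.2) & ~np.eye(10, dtype=bool))
```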
Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2025
This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly…
Descriptors: Artificial Intelligence, Test Items, Automation, Test Format
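The reviewed pipelines differ in models and prompts, but the prompt-construction step they share can be sketched as plain string templating; the fields and wording below are hypothetical, not drawn from any one study.

```python
# Hypothetical prompt template for LLM-based automatic item generation.
def build_item_prompt(construct: str, difficulty: str, n_options: int = 4) -> str:
    return (
        f"Write one multiple-choice item assessing {construct}.\n"
        f"Target difficulty: {difficulty}.\n"
        f"Provide a stem, {n_options} answer options, mark the key, "
        f"and give a one-sentence rationale for each distractor."
    )

prompt = build_item_prompt("fraction addition", "medium")
# `prompt` would then be sent to whichever LLM the pipeline uses.
```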
Abdullah Faruk Kiliç; Meltem Acar Güvendir; Gül Güler; Tugay Kaçak – Measurement: Interdisciplinary Research and Perspectives, 2025
In this study, the extent to which wording effects impact factor structure, factor loadings, internal consistency, and measurement invariance was examined. The modified form, which includes semantically reversed items, explains 21.5% more variance than the original form. The reversed items' factor loadings are also higher. As a result of the CFA, indexes…
Descriptors: Test Items, Factor Structure, Test Reliability, Semantics
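As a rough illustration of the comparison made here, the sketch below reverse-scores negatively worded items on a 5-point scale and reports the variance share carried by the first principal component; the item indices and scale bounds are assumptions.

```python
# Reverse-score negatively worded items, then compare the variance
# carried by the first principal component before and after recoding.
import numpy as np

def reverse_score(X, items, lo=1, hi=5):
    """Recode the given item columns so all items point the same way."""
    X = X.astype(float).copy()
    X[:, items] = (hi + lo) - X[:, items]
    return X

def first_factor_variance(X):
    """Share of total variance on the first principal component."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return eigvals[-1] / eigvals.sum()
```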
Christopher J. Anthony; Stephen N. Elliott – School Mental Health, 2025
Stress is a complex construct that is related to resilience and general health starting in childhood. Despite its importance for student health and well-being, there are few measures of stress designed for school-based applications. In this study, we developed and initially validated a Stress Indicators Scale using five samples of teachers,…
Descriptors: Test Construction, Stress Variables, Test Validity, Test Items
Kuan-Yu Jin; Wai-Lok Siu – Journal of Educational Measurement, 2025
Educational tests often have a cluster of items linked by a common stimulus ("testlet"). In such a design, the dependencies induced between items are called "testlet effects." In particular, the directional testlet effect (DTE) refers to a recursive influence whereby responses to earlier items can positively or negatively affect…
Descriptors: Models, Test Items, Educational Assessment, Scores
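A DTE can be made concrete with a toy generator in which a correct answer to the preceding testlet item shifts the next item's logit by a carry-over parameter; the Rasch parameterization and gamma below are illustrative assumptions, not the authors' model.

```python
# Toy directional-testlet-effect generator: the response to the previous
# item in the testlet shifts the next item's logit by +/- gamma.
import numpy as np

rng = np.random.default_rng(1)

def simulate_testlet(theta, b, gamma):
    """Responses of one examinee to one testlet with a DTE carry-over."""
    x = np.zeros(len(b), dtype=int)
    carry = 0.0
    for j, bj in enumerate(b):
        p = 1.0 / (1.0 + np.exp(-(theta - bj + carry)))
        x[j] = rng.random() < p
        carry = gamma if x[j] else -gamma
    return x

print(simulate_testlet(theta=0.5, b=np.array([-0.5, 0.0, 0.5]), gamma=0.4))
```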
Mingfeng Xue; Ping Chen – Journal of Educational Measurement, 2025
Response styles pose serious threats to psychological measurement. This research compares IRTree models and anchoring vignettes in addressing response styles and estimating the target traits. It also explores the potential of combining them at the item level and the total-score level (ratios of extreme and middle responses to vignettes). Four models…
Descriptors: Item Response Theory, Models, Comparative Analysis, Vignettes
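One common IRTree decomposition for a 5-point scale splits each response into midpoint, direction, and extremity pseudo-items; the coding below is that standard convention, offered as an assumption rather than the authors' exact specification.

```python
# Map a 5-point Likert response (1-5) to three IRTree pseudo-items:
# midpoint, direction, extremity. np.nan marks unreached nodes.
import numpy as np

def irtree_nodes(resp):
    mid = 1 if resp == 3 else 0
    direction = np.nan if resp == 3 else int(resp > 3)
    extreme = np.nan if resp == 3 else int(resp in (1, 5))
    return mid, direction, extreme

# e.g. irtree_nodes(5) -> (0, 1, 1); irtree_nodes(3) -> (1, nan, nan)
```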
Joshua B. Gilbert; Zachary Himmelsbach; Luke W. Miratrix; Andrew D. Ho; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025
Value added models (VAMs) attempt to estimate the causal effects of teachers and schools on student test scores. We apply Generalizability Theory to show how estimated VA effects depend upon the selection of test items. Standard VAMs estimate causal effects on the items that are included on the test. Generalizability demands consideration of how…
Descriptors: Value Added Models, Reliability, Effect Size, Test Items
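The item-sampling point can be seen in a toy simulation: when teacher effects vary across items, value-added estimates computed from two different item subsets disagree. All numbers below are made up for illustration.

```python
# Toy check: teacher VA estimated on two disjoint item subsets
# correlates below 1 when teacher effects are item-specific.
import numpy as np

rng = np.random.default_rng(2)
n_teachers, n_students, n_items = 50, 30, 20
general = rng.normal(0, 0.5, n_teachers)                # stable VA
by_item = rng.normal(0, 0.3, (n_teachers, n_items))     # item-specific VA

scores = (general[:, None, None] + by_item[:, None, :]
          + rng.normal(0, 1, (n_teachers, n_students, n_items)))

va_a = scores[:, :, :10].mean(axis=(1, 2))   # VA from first ten items
va_b = scores[:, :, 10:].mean(axis=(1, 2))   # VA from last ten items
print(np.corrcoef(va_a, va_b)[0, 1])         # < 1: depends on item draw
```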
Atakan Yalcin; Cennet Sanli; Adnan Pinar – Journal of Theoretical Educational Science, 2025
This study aimed to develop a test to measure university students' spatial thinking skills. The research was conducted using a survey design, with a sample of 260 undergraduate students from geography teaching and geography departments. GIS software was used to incorporate maps and satellite images, enhancing the spatial representation in the…
Descriptors: Spatial Ability, Thinking Skills, Geography, Undergraduate Students
Hung Tan Ha; Duyen Thi Bich Nguyen; Tim Stoeckel – Language Assessment Quarterly, 2025
This article compares two methods for detecting local item dependence (LID): residual correlation examination and Rasch testlet modeling (RTM), in a commonly used 3:6 matching format and an extended matching test (EMT) format. The two formats are hypothesized to facilitate different levels of item dependency due to differences in the number of…
Descriptors: Comparative Analysis, Language Tests, Test Items, Item Analysis
Neda Kianinezhad; Mohsen Kianinezhad – Language Education & Assessment, 2025
This study presents a comparative analysis of classical reliability measures, including Cronbach's alpha, test-retest, and parallel forms reliability, alongside modern psychometric methods such as the Rasch model and Mokken scaling, to evaluate the reliability of C-tests in language proficiency assessment. Utilizing data from 150 participants…
Descriptors: Psychometrics, Test Reliability, Language Proficiency, Language Tests
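The classical estimates being compared are short formulas; a minimal sketch, assuming a persons-by-items score matrix and total scores from two administrations:

```python
# Classical reliability estimates from raw scores.
import numpy as np

def cronbach_alpha(X):
    """Internal consistency from a persons-by-items matrix."""
    k = X.shape[1]
    return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum()
                          / X.sum(axis=1).var(ddof=1))

def retest_reliability(totals_t1, totals_t2):
    """Test-retest (or parallel-forms) reliability as a Pearson r."""
    return np.corrcoef(totals_t1, totals_t2)[0, 1]
```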
Stephanie M. Bell; R. Philip Chalmers; David B. Flora – Educational and Psychological Measurement, 2024
Coefficient omega indices are model-based composite reliability estimates that have become increasingly popular. A coefficient omega index estimates how reliably an observed composite score measures a target construct as represented by a factor in a factor-analysis model; as such, the accuracy of omega estimates is likely to depend on correct…
Descriptors: Influences, Models, Measurement Techniques, Reliability
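For a one-factor model, omega total is (Σλ)² / ((Σλ)² + Σψ); the loadings below are invented stand-ins for output from a fitted factor analysis.

```python
# Coefficient omega from a one-factor solution; lam is a stand-in for
# standardized loadings estimated by a factor-analysis model.
import numpy as np

def omega_total(loadings, uniquenesses):
    common = loadings.sum() ** 2
    return common / (common + uniquenesses.sum())

lam = np.array([0.7, 0.6, 0.8, 0.5])   # illustrative loadings
psi = 1 - lam**2                       # uniquenesses (standardized items)
print(omega_total(lam, psi))
```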
Mustafa Ilhan; Nese Güler; Gülsen Tasdelen Teker; Ömer Ergenekon – International Journal of Assessment Tools in Education, 2024
This study aimed to examine the effects of reverse items created with different strategies on psychometric properties and respondents' scale scores. To this end, three versions of a 10-item scale in the research were developed: 10 positive items were integrated in the first form (Form-P) and five positive and five reverse items in the other two…
Descriptors: Test Items, Psychometrics, Scores, Measures (Individuals)
Jyun-Hong Chen; Hsiu-Yi Chao – Journal of Educational and Behavioral Statistics, 2024
To solve the attenuation paradox in computerized adaptive testing (CAT), this study proposes an item selection method, the integer programming approach based on real-time test data (IPRD), to improve test efficiency. The IPRD method turns information regarding the ability distribution of the population from real-time test data into feasible test…
Descriptors: Data Use, Computer Assisted Testing, Adaptive Testing, Design
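For context, the baseline that IPRD improves on is maximum-information selection; a sketch under a 2PL model is below (the integer-programming formulation itself requires a solver and is not reproduced here).

```python
# Baseline CAT item selection: maximum Fisher information at the
# current ability estimate under a 2PL model.
import numpy as np

def fisher_info_2pl(theta, a, b):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

def next_item(theta_hat, a, b, administered):
    info = fisher_info_2pl(theta_hat, a, b)
    info[list(administered)] = -np.inf      # never reselect used items
    return int(np.argmax(info))
```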
Hojung Kim; Changkyung Song; Jiyoung Kim; Hyeyun Jeong; Jisoo Park – Language Testing in Asia, 2024
This study presents a modified version of the Korean Elicited Imitation (EI) test, designed to resemble natural spoken language, and validates its reliability as a measure of proficiency. The study assesses the correlation between average test scores and Test of Proficiency in Korean (TOPIK) levels, examining score distributions among beginner,…
Descriptors: Korean, Test Validity, Test Reliability, Imitation
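The reported validity check is essentially a rank correlation between EI scores and TOPIK levels; a sketch with invented numbers:

```python
# Rank correlation between elicited-imitation scores and TOPIK levels;
# the data here are made up for illustration.
import numpy as np
from scipy.stats import spearmanr

ei_scores = np.array([42, 55, 61, 70, 48, 83, 90, 66])
topik_levels = np.array([1, 2, 3, 4, 2, 5, 6, 3])
rho, p = spearmanr(ei_scores, topik_levels)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```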
Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2025
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e.,…
Descriptors: Scoring, Guessing (Tests), Reaction Time, Item Response Theory
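Effort-moderated scoring in its unidimensional form drops responses whose response times fall below an item threshold; the thresholds and data below are assumptions for illustration.

```python
# Effort-moderated scoring sketch: responses faster than an item's RT
# threshold are treated as rapid guesses and excluded from the score.
import numpy as np

def em_score(responses, rts, thresholds):
    """Proportion correct over effortful responses only."""
    effortful = rts >= thresholds
    if not effortful.any():
        return np.nan                      # no effortful responses left
    return responses[effortful].mean()

resp = np.array([1, 0, 1, 1, 0])
rts = np.array([12.0, 1.1, 8.5, 0.9, 15.2])   # seconds (made up)
thr = np.full(5, 3.0)                          # illustrative thresholds
print(em_score(resp, rts, thr))
```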