Publication Date
In 2025 | 38 |
Since 2024 | 83 |
Since 2021 (last 5 years) | 271 |
Since 2016 (last 10 years) | 607 |
Since 2006 (last 20 years) | 955 |
Descriptor
Test Items | 955 |
Test Reliability | 734 |
Test Validity | 470 |
Test Construction | 377 |
Foreign Countries | 373 |
Item Response Theory | 236 |
Psychometrics | 227 |
Difficulty Level | 209 |
Scores | 188 |
Factor Analysis | 179 |
Reliability | 170 |
Author
Schoen, Robert C. | 12 |
Anderson, Daniel | 6 |
Guo, Hongwen | 6 |
Liu, Ou Lydia | 6 |
Alonzo, Julie | 5 |
LaVenia, Mark | 5 |
Baghaei, Purya | 4 |
Bauduin, Charity | 4 |
Brennan, Robert L. | 4 |
Farina, Kristy | 4 |
Lee, Won-Chan | 4 |
Audience
Teachers | 6 |
Administrators | 5 |
Support Staff | 3 |
Researchers | 2 |
Counselors | 1 |
Parents | 1 |
Policymakers | 1 |
Practitioners | 1 |
Location
Turkey | 80 |
Indonesia | 33 |
Germany | 24 |
China | 18 |
Florida | 17 |
Canada | 16 |
Australia | 15 |
California | 12 |
India | 12 |
Malaysia | 11 |
Taiwan | 11 |
Laws, Policies, & Programs
Individuals with Disabilities… | 4 |
No Child Left Behind Act 2001 | 4 |
Every Student Succeeds Act… | 3 |
Rehabilitation Act 1973… | 3 |
Head Start | 1 |
United Nations Convention on… | 1 |
What Works Clearinghouse Rating
Meets WWC Standards without Reservations | 1 |
Meets WWC Standards with or without Reservations | 1 |
Kelsey Nason; Christine DeMars – Journal of Educational Measurement, 2025
This study examined the widely used threshold of 0.2 for Yen's Q3, an index for violations of local independence. Specifically, a simulation was conducted to investigate whether Q3 values were related to the magnitude of bias in estimates of reliability, item parameters, and examinee ability. Results showed that Q3 values below the typical cut-off…
Descriptors: Item Response Theory, Statistical Bias, Test Reliability, Test Items
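The Q3 index examined above is the correlation between item-pair residuals after removing the fitted IRT model's expected scores. A minimal sketch, using simulated response data and placeholder expected scores rather than a fitted model or the study's data:

```python
import numpy as np

# Simulated stand-ins: `observed` would be real item responses and
# `expected` would come from a fitted IRT model (e.g., 2PL probabilities).
rng = np.random.default_rng(0)
n_persons, n_items = 500, 10
observed = rng.integers(0, 2, size=(n_persons, n_items)).astype(float)
expected = rng.uniform(0.2, 0.8, size=(n_persons, n_items))

# Yen's Q3: correlate the person-by-item residuals across item pairs.
residuals = observed - expected
q3 = np.corrcoef(residuals, rowvar=False)  # n_items x n_items matrix

# Flag item pairs exceeding the conventional 0.2 cut-off discussed above.
flagged = [(i, j) for i in range(n_items) for j in range(i + 1, n_items)
           if abs(q3[i, j]) > 0.2]
```

With independent simulated data, few or no pairs exceed the cut-off; dependence induced by a shared stimulus would inflate the off-diagonal Q3 values.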
Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2025
This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly…
Descriptors: Artificial Intelligence, Test Items, Automation, Test Format
Abdullah Faruk Kiliç; Meltem Acar Güvendir; Gül Güler; Tugay Kaçak – Measurement: Interdisciplinary Research and Perspectives, 2025
In this study, the extent to which wording effects impact factor structure and loadings, internal consistency, and measurement invariance was examined. The modified form, which includes semantically reversed items, explains 21.5% more variance than the original form. Also, the reversed items' factor loadings are higher. As a result of CFA, indexes…
Descriptors: Test Items, Factor Structure, Test Reliability, Semantics
Christopher J. Anthony; Stephen N. Elliott – School Mental Health, 2025
Stress is a complex construct that is related to resilience and general health starting in childhood. Despite its importance for student health and well-being, there are few measures of stress designed for school-based applications. In this study, we developed and initially validated a Stress Indicators Scale using five samples of teachers,…
Descriptors: Test Construction, Stress Variables, Test Validity, Test Items
Parker, Mark A. J.; Hedgeland, Holly; Jordan, Sally E.; Braithwaite, Nicholas St. J. – European Journal of Science and Mathematics Education, 2023
The study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI), which used automatically marked free-response questions. Data were collected over a period of three academic years from 611 participants who were taking physics classes at high school and university level. A total of…
Descriptors: Test Construction, Scientific Concepts, Physics, Test Reliability
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2023
Traditional estimators of reliability such as coefficients alpha, theta, omega, and rho (maximal reliability) are prone to give radical underestimates of reliability for the tests common when testing educational achievement. These tests are often structured by widely deviating item difficulties. This is a typical pattern where the traditional…
Descriptors: Test Reliability, Achievement Tests, Computation, Test Items
Kuan-Yu Jin; Wai-Lok Siu – Journal of Educational Measurement, 2025
Educational tests often have a cluster of items linked by a common stimulus ("testlet"). In such a design, the dependencies caused between items are called "testlet effects." In particular, the directional testlet effect (DTE) refers to a recursive influence whereby responses to earlier items can positively or negatively affect…
Descriptors: Models, Test Items, Educational Assessment, Scores
Mingfeng Xue; Ping Chen – Journal of Educational Measurement, 2025
Response styles pose great threats to psychological measurements. This research compares IRTree models and anchoring vignettes in addressing response styles and estimating the target traits. It also explores the potential of combining them at the item level and total-score level (ratios of extreme and middle responses to vignettes). Four models…
Descriptors: Item Response Theory, Models, Comparative Analysis, Vignettes
Joshua B. Gilbert; Zachary Himmelsbach; Luke W. Miratrix; Andrew D. Ho; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025
Value added models (VAMs) attempt to estimate the causal effects of teachers and schools on student test scores. We apply Generalizability Theory to show how estimated VA effects depend upon the selection of test items. Standard VAMs estimate causal effects on the items that are included on the test. Generalizability demands consideration of how…
Descriptors: Value Added Models, Reliability, Effect Size, Test Items
Ntumi, Simon; Agbenyo, Sheilla; Bulala, Tapela – Shanlax International Journal of Education, 2023
There is no need or point in testing the knowledge, attributes, traits, behaviours or abilities of an individual if the information obtained from the test is inaccurate. However, by and large, it seems the estimation of psychometric properties of test items in classrooms has been largely ignored, or is otherwise dying slowly, in most testing environments. In…
Descriptors: Psychometrics, Accuracy, Test Validity, Factor Analysis
Almehrizi, Rashid S. – Educational Measurement: Issues and Practice, 2022
Coefficient alpha reliability persists as the most common reliability coefficient reported in research. The assumptions for its use are, however, not well-understood. The current paper challenges the commonly used expressions of coefficient alpha and argues that while these expressions are correct when estimating reliability for summed scores,…
Descriptors: Reliability, Scores, Scaling, Statistical Analysis
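The standard expression of coefficient alpha for summed scores referenced above can be sketched in a few lines; the data here are simulated for illustration, not drawn from the paper:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a persons-by-items score matrix (summed scores)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Simulate 5 items sharing a common true score plus independent noise.
rng = np.random.default_rng(1)
true_score = rng.normal(size=(300, 1))
items = true_score + rng.normal(scale=1.0, size=(300, 5))
alpha = cronbach_alpha(items)
```

With equal true-score and noise variances as above, inter-item correlations are about 0.5, so alpha lands near 0.8; the paper's point is that this familiar expression is tied to summed scores and need not transfer to other scaling choices.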
Atakan Yalcin; Cennet Sanli; Adnan Pinar – Journal of Theoretical Educational Science, 2025
This study aimed to develop a test to measure university students' spatial thinking skills. The research was conducted using a survey design, with a sample of 260 undergraduate students from geography teaching and geography departments. GIS software was used to incorporate maps and satellite images, enhancing the spatial representation in the…
Descriptors: Spatial Ability, Thinking Skills, Geography, Undergraduate Students
Hung Tan Ha; Duyen Thi Bich Nguyen; Tim Stoeckel – Language Assessment Quarterly, 2025
This article compares two methods for detecting local item dependence (LID): residual correlation examination and Rasch testlet modeling (RTM), in a commonly used 3:6 matching format and an extended matching test (EMT) format. The two formats are hypothesized to facilitate different levels of item dependency due to differences in the number of…
Descriptors: Comparative Analysis, Language Tests, Test Items, Item Analysis
Steinmann, Isa; Sánchez, Daniel; van Laar, Saskia; Braeken, Johan – Assessment in Education: Principles, Policy & Practice, 2022
Questionnaire scales that are mixed-worded, i.e. include both positively and negatively worded items, often suffer from issues like low reliability and more complex latent structures than intended. Part of the problem might be that some responders fail to respond consistently to the mixed-worded items. We investigated the prevalence and impact of…
Descriptors: Response Style (Tests), Test Items, Achievement Tests, Foreign Countries
Neda Kianinezhad; Mohsen Kianinezhad – Language Education & Assessment, 2025
This study presents a comparative analysis of classical reliability measures, including Cronbach's alpha, test-retest, and parallel forms reliability, alongside modern psychometric methods such as the Rasch model and Mokken scaling, to evaluate the reliability of C-tests in language proficiency assessment. Utilizing data from 150 participants…
Descriptors: Psychometrics, Test Reliability, Language Proficiency, Language Tests