Showing 1 to 15 of 40 results
Peer reviewed
Kelsey Nason; Christine DeMars – Journal of Educational Measurement, 2025
This study examined the widely used threshold of 0.2 for Yen's Q3, an index for violations of local independence. Specifically, a simulation was conducted to investigate whether Q3 values were related to the magnitude of bias in estimates of reliability, item parameters, and examinee ability. Results showed that Q3 values below the typical cut-off…
Descriptors: Item Response Theory, Statistical Bias, Test Reliability, Test Items
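Yen's Q3 is the correlation between item residuals after an IRT model is fit; the abstract questions the conventional 0.2 cut-off. A minimal sketch of the computation, assuming simulated 2PL data with hypothetical item parameters (not the paper's simulation design, and using the true parameters where practice would use estimates):

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 1000, 10

# Hypothetical 2PL item parameters and abilities (assumptions, not from the paper)
a = rng.uniform(0.8, 2.0, n_items)       # discriminations
b = rng.normal(0.0, 1.0, n_items)        # difficulties
theta = rng.normal(0.0, 1.0, n_persons)  # abilities

# Model-implied response probabilities and simulated dichotomous responses
p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
x = (rng.random((n_persons, n_items)) < p).astype(float)

# Residuals; in practice p would come from *estimated* parameters
resid = x - p

def q3(i, j):
    """Yen's Q3 for an item pair: correlation of their residuals."""
    return np.corrcoef(resid[:, i], resid[:, j])[0, 1]

# Under local independence, Q3 should hover near zero (slightly negative in finite samples)
print(round(q3(0, 1), 3))
```

Flagging pairs with |Q3| above a threshold such as 0.2 is the usage the study interrogates; the data here are generated to satisfy local independence, so no pair should be flagged.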
Peer reviewed
Hwanggyu Lim; Danqi Zhu; Edison M. Choe; Kyung T. Han – Journal of Educational Measurement, 2024
This study presents a generalized version of the residual differential item functioning (RDIF) detection framework in item response theory, named GRDIF, to analyze differential item functioning (DIF) in multiple groups. The GRDIF framework retains the advantages of the original RDIF framework, such as computational efficiency and ease of…
Descriptors: Item Response Theory, Test Bias, Test Reliability, Test Construction
Peer reviewed
Kuan-Yu Jin; Wai-Lok Siu – Journal of Educational Measurement, 2025
Educational tests often include a cluster of items linked by a common stimulus (a "testlet"). In such a design, the dependencies induced among items are called "testlet effects." In particular, the directional testlet effect (DTE) refers to a recursive influence whereby responses to earlier items can positively or negatively affect…
Descriptors: Models, Test Items, Educational Assessment, Scores
Peer reviewed
Mingfeng Xue; Ping Chen – Journal of Educational Measurement, 2025
Response styles pose serious threats to psychological measurement. This research compares IRTree models and anchoring vignettes in addressing response styles and estimating the target traits. It also explores the potential of combining them at the item level and the total-score level (ratios of extreme and middle responses to vignettes). Four models…
Descriptors: Item Response Theory, Models, Comparative Analysis, Vignettes
Peer reviewed
Thomas W. Frazier; Andrew J. O. Whitehouse; Susan R. Leekam; Sarah J. Carrington; Gail A. Alvares; David W. Evans; Antonio Y. Hardan; Mirko Uljarevic – Journal of Autism and Developmental Disorders, 2024
Purpose: The aim of the present study was to compare scale and conditional reliability derived from item response theory analyses among the most commonly used, as well as several newly developed, observation, interview, and parent-report autism instruments. Methods: When available, data sets were combined to facilitate large sample evaluation.…
Descriptors: Test Reliability, Item Response Theory, Autism Spectrum Disorders, Clinical Diagnosis
Peer reviewed
Hung-Yu Huang – Educational and Psychological Measurement, 2025
The use of discrete categorical formats to assess psychological traits has a long-standing tradition that is deeply embedded in item response theory models. The increasing prevalence and endorsement of computer- or web-based testing has led to greater focus on continuous response formats, which offer numerous advantages in both respondent…
Descriptors: Response Style (Tests), Psychological Characteristics, Item Response Theory, Test Reliability
Peer reviewed
PDF on ERIC
Neda Kianinezhad; Mohsen Kianinezhad – Language Education & Assessment, 2025
This study presents a comparative analysis of classical reliability measures, including Cronbach's alpha, test-retest, and parallel forms reliability, alongside modern psychometric methods such as the Rasch model and Mokken scaling, to evaluate the reliability of C-tests in language proficiency assessment. Utilizing data from 150 participants…
Descriptors: Psychometrics, Test Reliability, Language Proficiency, Language Tests
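Among the classical reliability measures this entry compares, Cronbach's alpha has the most direct closed form: alpha = (k/(k-1))·(1 − Σ item variances / variance of the total score). A minimal sketch under assumed tau-equivalent simulated data (the sample size mirrors the study's 150 participants; the 20-item test and variance components are illustrative assumptions, not the study's C-test data):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 150, 20  # 150 respondents as in the study; 20 items is an assumption

# Tau-equivalent items: shared true score plus independent item noise
true_score = rng.normal(0.0, 1.0, n)
items = true_score[:, None] + rng.normal(0.0, 1.0, (n, k))

# Cronbach's alpha from item and total-score variances
sum_item_var = items.var(axis=0, ddof=1).sum()
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - sum_item_var / total_var)
print(round(alpha, 3))
```

With these variance components, alpha is expected near k/(k+1) ≈ 0.95; Rasch-based and Mokken approaches, which the study contrasts with this, condition reliability on the latent trait rather than reporting one sample-level coefficient.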
Joshua B. Gilbert; James G. Soland; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025
Value-Added Models (VAMs) are both common and controversial in education policy and accountability research. While the sensitivity of VAMs to model specification and covariate selection is well documented, the extent to which test scoring methods (e.g., mean scores vs. IRT-based scores) may affect VA estimates is less studied. We examine the…
Descriptors: Value Added Models, Tests, Testing, Scoring
Peer reviewed
Novina Sabila Zahra; Hillman Wirawan – Measurement: Interdisciplinary Research and Perspectives, 2025
Technological development has driven digital transformation across organizations, influencing work processes, communication, and innovation. Digital leadership plays a crucial role in directing and managing this transformation. This research aims to develop a new measurement tool for assessing digital leadership using the Rasch Model for…
Descriptors: Leadership, Measures (Individuals), Test Validity, Item Response Theory
Peer reviewed
Lientje Maas; Matthew J. Madison; Matthieu J. S. Brinkhuis – Grantee Submission, 2024
Diagnostic classification models (DCMs) are psychometric models that yield probabilistic classifications of respondents according to a set of discrete latent variables. The current study examines the recently introduced one-parameter log-linear cognitive diagnosis model (1-PLCDM), which has increased interpretability compared with general DCMs due…
Descriptors: Clinical Diagnosis, Classification, Models, Psychometrics
Peer reviewed
Madeline A. Schellman; Matthew J. Madison – Grantee Submission, 2024
Diagnostic classification models (DCMs) have grown in popularity as stakeholders increasingly desire actionable information related to students' skill competencies. Longitudinal DCMs offer a psychometric framework for providing estimates of students' proficiency status transitions over time. For both cross-sectional and longitudinal DCMs, it is…
Descriptors: Diagnostic Tests, Classification, Models, Psychometrics
Peer reviewed
Hojung Kim; Changkyung Song; Jiyoung Kim; Hyeyun Jeong; Jisoo Park – Language Testing in Asia, 2024
This study presents a modified version of the Korean Elicited Imitation (EI) test, designed to resemble natural spoken language, and validates its reliability as a measure of proficiency. The study assesses the correlation between average test scores and Test of Proficiency in Korean (TOPIK) levels, examining score distributions among beginner,…
Descriptors: Korean, Test Validity, Test Reliability, Imitation
Jiayi Deng – ProQuest LLC, 2024
Test score comparability in international large-scale assessments (LSA) is of utmost importance in measuring the effectiveness of education systems and understanding the impact of education on economic growth. To effectively compare test scores on an international scale, score linking is widely used to convert raw scores from different linguistic…
Descriptors: Item Response Theory, Scoring Rubrics, Scoring, Error of Measurement
Peer reviewed
Enrico Gandolfi; Richard E. Ferdig – Educational Technology Research and Development, 2025
Augmented Reality (AR) is increasingly being adopted in education to foster engagement and interest in a variety of subjects and content areas. However, there is a scarcity of instruments to measure the instructional impact of this innovation. This article addresses this gap in two unique ways. First, it presents validation results of the…
Descriptors: Simulated Environment, Measures (Individuals), Rating Scales, Item Response Theory
Peer reviewed
Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2025
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e.,…
Descriptors: Scoring, Guessing (Tests), Reaction Time, Item Response Theory
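The effort-moderated (EM) scoring this entry examines excludes responses flagged as rapid guesses (by a response-time threshold) rather than scoring them as incorrect. A minimal unidimensional sketch, assuming a simulated 10% rapid-guessing mixture and a single 3-second threshold (operational applications typically set thresholds per item, and the study's multidimensional RG conditions are not modeled here):

```python
import numpy as np

rng = np.random.default_rng(2)
n_persons, n_items = 500, 20

# Assumed mixture: ~10% of responses are rapid guesses (fast, chance-level accuracy)
rapid = rng.random((n_persons, n_items)) < 0.10
rt = np.where(rapid,
              rng.uniform(0.3, 2.0, (n_persons, n_items)),    # rapid-guess response times
              rng.lognormal(2.5, 0.4, (n_persons, n_items)))  # effortful response times
correct = np.where(rapid,
                   rng.random((n_persons, n_items)) < 0.25,   # chance-level accuracy
                   rng.random((n_persons, n_items)) < 0.65).astype(float)

threshold = 3.0  # seconds; a hypothetical rapid-guessing threshold
effortful = rt >= threshold

# EM scoring: treat flagged responses as not administered instead of wrong
em_prop = np.nanmean(np.where(effortful, correct, np.nan), axis=1)
print(round(em_prop.mean(), 3))
```

Because rapid guesses score near chance, the EM proportion-correct runs higher than the naive proportion that counts flagged responses as attempts; the study's question is how this behaves when rapid guessing is itself multidimensional.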