ERIC - Search Results

Publication Date

In 2026	0
Since 2025	4
Since 2022 (last 5 years)	14
Since 2017 (last 10 years)	39
Since 2007 (last 20 years)	78

Descriptor

Comparative Analysis	136
Test Items	136
Test Reliability	89
Difficulty Level	38
Foreign Countries	38
Test Validity	36
Reliability	35
Item Analysis	33
Scores	32
Test Construction	32
Item Response Theory	30
Correlation	28
Statistical Analysis	27
Test Format	24
Multiple Choice Tests	21
Psychometrics	21
Scoring	21
Language Tests	18
Higher Education	17
Mathematics Tests	17
Interrater Reliability	16
Achievement Tests	13
English (Second Language)	13
Computer Assisted Testing	12
Reading Tests	12

Publication Type

Reports - Research	98
Journal Articles	83
Speeches/Meeting Papers	28
Reports - Evaluative	20
Tests/Questionnaires	8
Reports - Descriptive	7
Dissertations/Theses -…	3
Information Analyses	3
Books	2
Collected Works - General	2
Collected Works - Serials	2
Non-Print Media	1
Numerical/Quantitative Data	1
Opinion Papers	1
Reference Materials - General	1
Reports - General	1

Education Level

Higher Education	27
Postsecondary Education	21
Secondary Education	9
Elementary Education	8
High Schools	5
Early Childhood Education	4
Elementary Secondary Education	3
Intermediate Grades	2
Middle Schools	2
Primary Education	2
Grade 2	1
Grade 4	1
Grade 6	1
Kindergarten	1

Audience

Researchers	3
Administrators	1
Parents	1
Policymakers	1
Teachers	1

Location

Germany	4
Iran	4
Canada	3
Japan	3
United States	3
Colorado	2
District of Columbia	2
Georgia	2
India	2
Nevada	2
New York	2
Turkey	2
Turkey (Ankara)	2
United Kingdom (England)	2
Vietnam	2
Washington	2
Australia	1
China	1
Florida	1
France	1
Idaho	1
Illinois	1
Indonesia	1
Israel	1
Italy (Rome)	1

Showing 1 to 15 of 136 results

Comparing and Combining IRTree Models and Anchoring Vignettes in Addressing Response Styles

Peer reviewed

Mingfeng Xue; Ping Chen – Journal of Educational Measurement, 2025

Response styles pose great threats to psychological measurements. This research compares IRTree models and anchoring vignettes in addressing response styles and estimating the target traits. It also explores the potential of combining them at the item level and total-score level (ratios of extreme and middle responses to vignettes). Four models…

Descriptors: Item Response Theory, Models, Comparative Analysis, Vignettes

A Comparison of Yen's Q3 Coefficient and Rasch Testlet Modeling for Identifying Local Item Dependence: Evidence from Two Vocabulary Matching Tests

Peer reviewed

Hung Tan Ha; Duyen Thi Bich Nguyen; Tim Stoeckel – Language Assessment Quarterly, 2025

This article compares two methods for detecting local item dependence (LID): residual correlation examination and Rasch testlet modeling (RTM), in a commonly used 3:6 matching format and an extended matching test (EMT) format. The two formats are hypothesized to facilitate different levels of item dependency due to differences in the number of…

Descriptors: Comparative Analysis, Language Tests, Test Items, Item Analysis

Generating Social and Emotional Skill Items: Humans vs. ChatGPT. ACT Research. Issue Brief

Kate E. Walton; Cristina Anguiano-Carrasco – ACT, Inc., 2024

Large language models (LLMs), such as ChatGPT, are becoming increasingly prominent. Their use is becoming more and more popular to assist with simple tasks, such as summarizing documents, translating languages, rephrasing sentences, or answering questions. Reports like McKinsey's (Chui & Yee, 2023) estimate that by implementing LLMs,…

Descriptors: Artificial Intelligence, Man Machine Systems, Natural Language Processing, Test Construction

A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement

Peer reviewed

Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024

Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…

Descriptors: Semantics, Educational Assessment, Evaluators, Reliability

Examining the Effect of Item Difficulty and Rater Leniency on Iranian Test Takers' Performance on WDCT and DSAT: A Comparative Study

Peer reviewed

Reza Shahi; Hamdollah Ravand; Golam Reza Rohani – International Journal of Language Testing, 2025

The current paper intends to exploit the Many Facet Rasch Model to investigate and compare the impact of situations (items) and raters on test takers' performance on the Written Discourse Completion Test (WDCT) and Discourse Self-Assessment Tests (DSAT). In this study, the participants were 110 English as a Foreign Language (EFL) students at…

Descriptors: Comparative Analysis, English (Second Language), Second Language Learning, Second Language Instruction

A New Scoring Method for Item Response Theory Analysis of C-Tests

Peer reviewed

Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025

This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as a sentence score, and then each sentence was entered into the analysis as a polytomous…

Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction

Estimating the Impact of Local Item Dependency in a Test of Second Language Reading Comprehension

Peer reviewed

Tim Stoeckel; Liang Ye Tan; Hung Tan Ha; Nam Thi Phuong Ho; Tomoko Ishii; Young Ae Kim; Chunmei Huang; Stuart McLean – Vocabulary Learning and Instruction, 2024

Local item dependency (LID) occurs when test-takers' responses to one test item are affected by their responses to another. It can be problematic if it causes inflated reliability estimates or distorted person and item measures. The cued-recall reading comprehension test in Hu and Nation's (2000) well-known and influential coverage--comprehension…

Descriptors: Reading Comprehension, English (Second Language), Second Language Instruction, Second Language Learning

The Concurrent Validity of Comparative Judgement Outcomes Compared with Marks

Gill, Tim – Research Matters, 2022

In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…

Descriptors: Comparative Analysis, Decision Making, Scripts, Standards

Treatments of Differential Item Functioning: A Comparison of Four Methods

Peer reviewed

Liu, Xiaowen; Rogers, H. Jane – Educational and Psychological Measurement, 2022

Test fairness is critical to the validity of group comparisons involving gender, ethnicities, culture, or treatment conditions. Detection of differential item functioning (DIF) is one component of efforts to ensure test fairness. The current study compared four treatments for items that have been identified as showing DIF: deleting, ignoring,…

Descriptors: Item Analysis, Comparative Analysis, Culture Fair Tests, Test Validity

Reliability and Validity of Methods to Assess Undergraduate Healthcare Student Performance in Pharmacology: Comparison of Open Book versus Time-Limited Closed Book Examinations

Peer reviewed

David Bell; Vikki O'Neill; Vivienne Crawford – Practitioner Research in Higher Education, 2023

We compared the influence of open-book extended-duration versus closed-book time-limited format on the reliability and validity of written assessments of pharmacology learning outcomes within our medical and dental courses. Our dental cohort undertake a mid-year test (30 × free-response short-answer questions, SAQ) and an end-of-year paper (4 × SAQ,…

Descriptors: Undergraduate Students, Pharmacology, Pharmaceutical Education, Test Format

Measuring Language Ability of Students with Compensatory Multidimensional CAT: A Post-Hoc Simulation Study

Peer reviewed

Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022

Computerized adaptive tests (CATs) apply an adaptive process in which the items are tailored to individuals' ability scores. Multidimensional CAT (MCAT) designs differ in the item selection, ability estimation, and termination methods they use. This study aims at investigating the performance of the MCAT designs used to…

Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency

Does Comparative Judgement of Scripts Provide an Effective Means of Maintaining Standards in Mathematics? Research Report

Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020

In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describe a method…

Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level

Changes in the Speed-Ability Relation through Different Treatments of Rapid Guessing

Peer reviewed

Deribo, Tobias; Goldhammer, Frank; Kroehne, Ulf – Educational and Psychological Measurement, 2023

As researchers in the social sciences, we are often interested in studying constructs that are not directly observable through assessments and questionnaires. But even in a well-designed and well-implemented study, rapid-guessing behavior may occur. Under rapid-guessing behavior, a task is only briefly skimmed rather than read and engaged with in depth. Hence, a…

Descriptors: Reaction Time, Guessing (Tests), Behavior Patterns, Bias

Developing the Diagnostic Test of Misconceptions of Fractions

Peer reviewed

Aleyna Altan; Zehra Taspinar Sener – Online Submission, 2023

This research aimed to develop a valid and reliable test to be used to detect sixth grade students' misconceptions and errors regarding the subject of fractions. A misconception diagnostic test has been developed that includes the concept of fractions, different representations of fractions, ordering and comparing fractions, equivalence of…

Descriptors: Diagnostic Tests, Mathematics Tests, Fractions, Misconceptions

Item-Score Reliability in Empirical-Data Sets and Its Relationship with Other Item Indices

Peer reviewed

Zijlmans, Eva A. O.; Tijmstra, Jesper; van der Ark, L. Andries; Sijtsma, Klaas – Educational and Psychological Measurement, 2018

Reliability is usually estimated for a total score, but it can also be estimated for item scores. Item-score reliability can be useful to assess the repeatability of an individual item score in a group. Three methods to estimate item-score reliability are discussed, known as method MS, method λ₆, and method CA. The item-score…

Descriptors: Test Items, Test Reliability, Correlation, Comparative Analysis


Source

Educational and Psychological…	11
ETS Research Report Series	8
Journal of Educational…	6
Online Submission	5
Advances in Health Sciences…	3
ProQuest LLC	3
Applied Measurement in…	2
Educational Research and…	2
International Journal of…	2
Language Assessment Quarterly	2
Language Testing	2
ACT, Inc.	1
Anatolian Journal of Education	1
Applied Psychological…	1
Asia Pacific Education Review	1
Assessment & Evaluation in…	1
Assessment in Education:…	1
Biochemistry and Molecular…	1
British Journal of Language…	1
Cambridge Assessment	1
Center for Research on…	1
Cogent Education	1
College Board	1
College Student Journal	1
Communique	1

Author

Benson, Jeri	3
Guo, Hongwen	2
Hung Tan Ha	2
Kim, Sooyeon	2
Lunz, Mary E.	2
Reckase, Mark D.	2
Tim Stoeckel	2
Abel, Michael B.	1
Ackerman, Terry A.	1
Afflerbach, Peter	1
Ahmed, Tamim	1
Akbari, Alireza	1
Aktas, Elif	1
Aleyna Altan	1
Aliyu, Hassan	1
Allan S. Cohen	1
Almehrizi, Rashid S.	1
Alpayar, Cagla	1
Alqarni, Abdulelah Mohammed	1
Alvaro, Rosaria	1
Anwyll, Steve	1
Arth, Thomas O.	1
Attali, Yigal	1
Baghaei, Purya	1

Assessments and Surveys

SAT (College Admission Test)	4
Program for International…	2
ACT Assessment	1
Armed Services Vocational…	1
Comprehensive Tests of Basic…	1
Defining Issues Test	1
Early Childhood Environment…	1
Embedded Figures Test	1
International Association for…	1
International English…	1
Iowa Tests of Basic Skills	1
National Assessment of…	1
Peabody Individual…	1
Peabody Picture Vocabulary…	1
Progress in International…	1
School and College Ability…	1
Strong Interest Inventory	1
Test of English as a Foreign…	1
Trends in International…	1
Work Keys (ACT)	1