Publication Date
In 2025: 2
Since 2024: 2
Since 2021 (last 5 years): 4
Since 2016 (last 10 years): 15
Since 2006 (last 20 years): 38
Descriptor
Comparative Analysis: 43
Item Response Theory: 43
Test Reliability: 43
Test Items: 21
Test Validity: 19
Scores: 16
Computer Assisted Testing: 10
Correlation: 10
Psychometrics: 10
Foreign Countries: 9
Test Format: 9
Author
Lee, Yi-Hsuan: 2
Yao, Lihua: 2
Alqarni, Abdulelah Mohammed: 1
Baghi, Heibatollah: 1
Baron, Simon: 1
Beaujean, A. Alexander: 1
Benton, Tom: 1
Berman, Ye'Elah: 1
Bernard, David: 1
Blaker, Lisa: 1
Brennan, Robert L.: 1
Education Level
Higher Education: 7
Postsecondary Education: 6
Elementary Education: 3
High Schools: 3
Secondary Education: 2
Early Childhood Education: 1
Grade 10: 1
Grade 11: 1
Grade 12: 1
Kindergarten: 1
Location
Hong Kong: 2
Taiwan: 2
United States: 2
Australia: 1
California: 1
China: 1
France: 1
Indonesia: 1
Iran: 1
Japan: 1
New York: 1
Assessments and Surveys
Early Childhood Longitudinal…: 2
ACT Assessment: 1
Defining Issues Test: 1
Iowa Tests of Basic Skills: 1
SAT (College Admission Test): 1
Mingfeng Xue; Ping Chen – Journal of Educational Measurement, 2025
Response styles pose serious threats to psychological measurement. This research compares IRTree models and anchoring vignettes in addressing response styles and estimating the target traits. It also explores the potential of combining them at the item level and at the total-score level (ratios of extreme and middle responses to vignettes). Four models…
Descriptors: Item Response Theory, Models, Comparative Analysis, Vignettes
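A minimal sketch of the IRTree idea compared above, assuming one common three-node tree for a 5-point scale (the trees actually fitted in the study may differ); all names are illustrative:

```python
# Decompose a 5-point Likert response into IRTree pseudo-items:
#   node 1: midpoint (3) vs. not           -> midpoint response style
#   node 2: agree (4,5) vs. disagree (1,2) -> target trait, if node 1 = 0
#   node 3: extreme (1,5) vs. moderate (2,4) -> extreme response style, if node 1 = 0
def irtree_pseudo_items(x):
    """Map a Likert response x in {1..5} to three binary nodes (None = not reached)."""
    midpoint = 1 if x == 3 else 0
    direction = None if x == 3 else (1 if x >= 4 else 0)
    extremity = None if x == 3 else (1 if x in (1, 5) else 0)
    return midpoint, direction, extremity
```

Each binary node is then modeled with its own IRT model, so response-style traits and the target trait are estimated separately.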
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated into a sentence score, and each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
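A minimal sketch of the sentence-level aggregation described above; variable names are illustrative:

```python
# Sum the 0/1 gap scores within each sentence so that every sentence becomes
# one polytomous item with categories 0..(number of gaps in that sentence).
def sentence_scores(gap_scores, sentence_of_gap):
    """gap_scores: 0/1 per gap; sentence_of_gap: sentence index for each gap."""
    totals = {}
    for score, sent in zip(gap_scores, sentence_of_gap):
        totals[sent] = totals.get(sent, 0) + score
    return totals
```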
Liu, Xiaowen; Rogers, H. Jane – Educational and Psychological Measurement, 2022
Test fairness is critical to the validity of group comparisons involving gender, ethnicity, culture, or treatment conditions. Detection of differential item functioning (DIF) is one component of efforts to ensure test fairness. The current study compared four treatments for items that have been identified as showing DIF: deleting, ignoring,…
Descriptors: Item Analysis, Comparative Analysis, Culture Fair Tests, Test Validity
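The snippet ends before naming a detection method, so as one standard example only, here is a Mantel-Haenszel DIF sketch (examinees are stratified by total score; names are illustrative and non-empty strata are assumed):

```python
import math

def mantel_haenszel_delta(strata):
    """strata: (A, B, C, D) counts per score level, where A/B = reference group
    correct/incorrect and C/D = focal group correct/incorrect."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    alpha = num / den                 # common odds ratio across strata
    return -2.35 * math.log(alpha)    # ETS delta scale; larger |delta| = more DIF
```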
Benton, Tom – Research Matters, 2021
Computer adaptive testing is intended to make assessment more reliable by tailoring the difficulty of the questions a student has to answer to their level of ability. Most commonly, this benefit is used to justify shortening tests whilst retaining the reliability of a longer, non-adaptive test. Improvements due to adaptive…
Descriptors: Risk, Item Response Theory, Computer Assisted Testing, Difficulty Level
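A minimal sketch of the tailoring the abstract describes, assuming a 2PL item pool and the common maximum-information selection rule:

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def next_item(theta_hat, pool, administered):
    """Pick the unused item that is most informative at the current estimate."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))
```

Because each question is chosen where it measures best, a shorter adaptive test can match the reliability of a longer fixed one, which is the trade-off the paper examines.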
Madison, Matthew J. – Educational Measurement: Issues and Practice, 2019
Recent advances have enabled diagnostic classification models (DCMs) to accommodate longitudinal data. These longitudinal DCMs were developed to study how examinees change, or transition, between different attribute mastery statuses over time. This study examines using longitudinal DCMs as an approach to assessing growth and serves three purposes:…
Descriptors: Longitudinal Studies, Item Response Theory, Psychometrics, Criterion Referenced Tests
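A minimal sketch of the mastery transitions these models describe; actual longitudinal DCMs estimate transition probabilities within the model rather than counting observed statuses:

```python
# Tabulate, for one attribute, how many examinees move between non-mastery (0)
# and mastery (1) from time 1 to time 2.
def transition_matrix(status_t1, status_t2):
    counts = [[0, 0], [0, 0]]
    for s1, s2 in zip(status_t1, status_t2):
        counts[s1][s2] += 1
    return counts  # counts[0][1] = examinees who gained mastery
```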
Alqarni, Abdulelah Mohammed – Journal on Educational Psychology, 2019
This study compares the psychometric properties of reliability in Classical Test Theory (CTT), item information in Item Response Theory (IRT), and validation from the perspective of modern validity theory, in order to draw attention to potential issues that may arise when testing organizations use both test theories in the same testing…
Descriptors: Test Theory, Item Response Theory, Test Construction, Scoring
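The contrast at the heart of the comparison can be stated compactly: CTT summarizes precision in one test-level coefficient, while IRT lets precision vary over the trait. A standard pair of expressions (Cronbach's alpha; 2PL item information):

```latex
% CTT: a single reliability coefficient for a k-item test with total score X
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_X}\right)
% IRT (2PL): item information is conditional on the latent trait \theta
I_i(\theta) = a_i^2 \, P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr)
```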
Yoshioka, Sérgio R. I.; Ishitani, Lucila – Informatics in Education, 2018
Computerized Adaptive Testing (CAT) is now widely used. However, inserting new items into the question bank of a CAT requires great effort, which makes wide application of CAT in classroom teaching impractical. One solution would be to use the tacit knowledge of teachers or experts for a pre-classification and calibrate during the…
Descriptors: Student Motivation, Adaptive Testing, Computer Assisted Testing, Item Response Theory
Storme, Martin; Myszkowski, Nils; Baron, Simon; Bernard, David – Journal of Intelligence, 2019
Assessing job applicants' general mental ability online poses psychometric challenges due to the necessity of having brief but accurate tests. Recent research (Myszkowski & Storme, 2018) suggests that recovering distractor information through Nested Logit Models (NLM; Suh & Bolt, 2010) increases the reliability of ability estimates in…
Descriptors: Intelligence Tests, Item Response Theory, Comparative Analysis, Test Reliability
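A compact statement of the nested logit structure cited above (Suh & Bolt, 2010): a 2PL governs whether the response is correct, and a nominal model governs which distractor is chosen given an error:

```latex
P(X_i = 1 \mid \theta) = \frac{\exp\{a_i(\theta - b_i)\}}{1 + \exp\{a_i(\theta - b_i)\}},
\qquad
P(D_i = k \mid X_i = 0, \theta) = \frac{\exp(\lambda_{ik}\theta + \zeta_{ik})}{\sum_{m}\exp(\lambda_{im}\theta + \zeta_{im})}
```

The distractor branch is what "recovers distractor information" and sharpens the ability estimate.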
Haberman, Shelby J.; Liu, Yang; Lee, Yi-Hsuan – ETS Research Report Series, 2019
Distractor analyses are routinely conducted in educational assessments with multiple-choice items. In this research report, we focus on three item response models for distractors: (a) the traditional nominal response (NR) model, (b) a combination of a two-parameter logistic model for item scores and an NR model for selections of incorrect…
Descriptors: Multiple Choice Tests, Scores, Test Reliability, High Stakes Tests
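For reference, model (a), the traditional nominal response model, assigns each response category its own slope and intercept (identification constraints omitted):

```latex
P(Y_i = k \mid \theta) = \frac{\exp(a_{ik}\theta + c_{ik})}{\sum_{h=1}^{m} \exp(a_{ih}\theta + c_{ih})}, \qquad k = 1, \dots, m
```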
Longabach, Tanya; Peyton, Vicki – Language Testing, 2018
K-12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to these subsections are commonly known as subscores. Testing programs face increasing customer demands for the reporting of subscores in addition to the…
Descriptors: Comparative Analysis, Test Reliability, Second Language Learning, Language Proficiency
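One standard building block in subscore evaluation, though not necessarily among the exact methods this study compares, is Kelley's regressed estimate, which shrinks an observed subscore toward the group mean in proportion to its reliability:

```python
def kelley_estimate(s_obs, s_mean, reliability):
    """Kelley's formula: regressed estimate of the true subscore."""
    return s_mean + reliability * (s_obs - s_mean)
```

Low-reliability subscores shrink heavily toward the mean, which is why reporting them as-is can mislead score users.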
Guo, Hongwen; Zu, Jiyun; Kyllonen, Patrick; Schmitt, Neal – ETS Research Report Series, 2016
In this report, systematic applications of statistical and psychometric methods are used to develop and evaluate scoring rules in terms of test reliability. Data collected from a situational judgment test are used to facilitate the comparison. For a well-developed item with appropriate keys (i.e., the correct answers), agreement among various…
Descriptors: Scoring, Test Reliability, Statistical Analysis, Psychometrics
Hays, Danica G.; Wood, Chris – Measurement and Evaluation in Counseling and Development, 2017
We present considerations for validity when a population outside of a normed sample is assessed and those data are interpreted. Using a career group counseling example exploring life satisfaction changes as evidenced by the Quality of Life Inventory (Frisch, 1994), we showcase qualitative and quantitative approaches to explore how normative data…
Descriptors: Data Interpretation, Scores, Quality of Life, Life Satisfaction
Uto, Masaki; Ueno, Maomi – IEEE Transactions on Learning Technologies, 2016
As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, a persistent problem in peer assessment is that reliability depends on rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…
Descriptors: Item Response Theory, Peer Evaluation, Bayesian Statistics, Simulation
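One widely used member of the rater-parameter family the abstract refers to is the many-facet Rasch model, in which a rater severity term enters the adjacent-category log-odds; the models proposed in the paper may differ in detail:

```latex
\log\frac{P_{nirk}}{P_{nir(k-1)}} = \theta_n - \beta_i - \rho_r - \tau_k
```

Here theta_n is the examinee's ability, beta_i the task difficulty, rho_r the rater severity, and tau_k the step parameter for category k.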
Culligan, Brent – Language Testing, 2015
This study compared three common vocabulary test formats, the Yes/No test, the Vocabulary Knowledge Scale (VKS), and the Vocabulary Levels Test (VLT), as measures of vocabulary difficulty. Vocabulary difficulty was defined as the item difficulty estimated through Item Response Theory (IRT) analysis. Three tests were given to 165 Japanese students,…
Descriptors: Language Tests, Test Format, Comparative Analysis, Vocabulary
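A minimal sketch of the difficulty definition used above: under a Rasch model, an item's difficulty b is the ability at which the success probability is .5. The starting value below simply inverts the classical proportion correct (illustrative only, not full IRT estimation):

```python
import math

def rasch_p(theta, b):
    """Rasch model: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def difficulty_start(p_correct):
    """Crude initial difficulty: negative logit of the proportion correct."""
    return -math.log(p_correct / (1.0 - p_correct))
```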
Retnawati, Heri – Turkish Online Journal of Educational Technology - TOJET, 2015
This study aimed to compare the accuracy of scores on the Test of English Proficiency (TOEP) administered as a paper-and-pencil test (PPT) versus a computer-based test (CBT). Using participants' PPT responses documented from 2008 to 2010 and CBT TOEP data documented in 2013-2014 on sets 1A, 2A, and 3A for the Listening and…
Descriptors: Scores, Accuracy, Computer Assisted Testing, English (Second Language)