ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	2
Since 2017 (last 10 years)	5
Since 2007 (last 20 years)	15

Descriptor

Comparative Analysis	22
Probability	22
Test Items	22
Item Response Theory	11
Difficulty Level	9
Simulation	9
Test Bias	6
Ability	5
Computer Assisted Testing	5
Equations (Mathematics)	5
Computation	4
Models	4
Scores	4
Statistical Analysis	4
Adaptive Testing	3
Classification	3
Evaluation Methods	3
Item Analysis	3
Item Banks	3
Mathematical Models	3
Monte Carlo Methods	3
Psychometrics	3
Sample Size	3
Scoring	3
Test Construction	3

Source

Applied Measurement in…	3
ETS Research Report Series	2
Educational and Psychological…	2
International Journal of…	2
Practical Assessment,…	2
Applied Psychological…	1
Educational Sciences: Theory…	1
Eurasian Journal of…	1
Hacettepe University Journal…	1
Journal of Educational…	1
Language Testing in Asia	1
Quality Assurance in…	1

Publication Type

Journal Articles	18
Reports - Research	14
Reports - Evaluative	7
Speeches/Meeting Papers	2
Reports - Descriptive	1

Education Level

Higher Education	2
Elementary Education	1
Elementary Secondary Education	1
Grade 11	1
Grade 12	1
Grade 4	1
High Schools	1
Intermediate Grades	1
Postsecondary Education	1
Secondary Education	1

Location

Canada	1
Turkey	1

Assessments and Surveys

Graduate Record Examinations	1
National Assessment of…	1
Program for International…	1
Trends in International…	1

Showing 1 to 15 of 22 results

Maintaining Score Scales over Time: A Comparison of Five Scoring Methods

Peer reviewed

Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023

This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…

Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation

Closed Formula of Test Length Required for Adaptive Testing with Medium Probability of Solution

Peer reviewed

Kárász, Judit T.; Széll, Krisztián; Takács, Szabolcs – Quality Assurance in Education: An International Perspective, 2023

Purpose: Based on the general formula, which depends on the length and difficulty of the test, the number of respondents and the number of ability levels, this study aims to provide a closed formula for the adaptive tests with medium difficulty (probability of solution is p = 1/2) to determine the accuracy of the parameters for each item and in…

Descriptors: Test Length, Probability, Comparative Analysis, Difficulty Level

Calibrated Parsing Items Evaluation: A Step towards Objectifying the Translation Assessment

Peer reviewed

Akbari, Alireza; Shahnazari, Mohammadtaghi – Language Testing in Asia, 2019

The present research paper introduces a translation evaluation method called Calibrated Parsing Items Evaluation (CPIE hereafter). This evaluation method maximizes translators' performance through identifying the parsing items with an optimal p-docimology and d-index (item discrimination). This method checks all the possible parses (annotations)…

Descriptors: Test Items, Translation, Computer Software, Evaluators

An Algorithm to Improve Test Answer Copying Detection Using the Omega Statistic

Peer reviewed

Maeda, Hotaka; Zhang, Bo – International Journal of Testing, 2017

The omega (ω) statistic is reputed to be one of the best indices for detecting answer copying on multiple choice tests, but its performance relies on the accurate estimation of copier ability, which is challenging because responses from the copiers may have been contaminated. We propose an algorithm that aims to identify and delete the suspected…

Descriptors: Cheating, Test Items, Mathematics, Statistics

An Exploratory Analysis of Differential Item Functioning and Its Possible Sources in a Higher Education Admissions Context

Peer reviewed

Oliveri, Maria Elena; Lawless, Rene; Robin, Frederic; Bridgeman, Brent – Applied Measurement in Education, 2018

We analyzed a pool of items from an admissions test for differential item functioning (DIF) for groups based on age, socioeconomic status, citizenship, or English language status using Mantel-Haenszel and item response theory. DIF items were systematically examined to identify their possible sources by item type, content, and wording. DIF was…

Descriptors: Test Bias, Comparative Analysis, Item Banks, Item Response Theory

Investigating Causal DIF via Propensity Score Methods

Peer reviewed
PDF on ERIC

Liu, Yan; Zumbo, Bruno D.; Gustafson, Paul; Huang, Yi; Kroc, Edward; Wu, Amery D. – Practical Assessment, Research & Evaluation, 2016

A variety of differential item functioning (DIF) methods have been proposed and used for ensuring that a test is fair to all test takers in a target population in the situations of, for example, a test being translated to other languages. However, once a method flags an item as DIF, it is difficult to conclude that the grouping variable (e.g.,…

Descriptors: Test Items, Test Bias, Probability, Scores

A Comparison of Bookmark and Angoff Standard Setting Methods

Peer reviewed
PDF on ERIC

Çetin, Sevda; Gelbal, Selahattin – Educational Sciences: Theory and Practice, 2013

In this research, the cut score of a foundation university was re-calculated with bookmark method and with Angoff method, each of which is a standard setting method; and the cut scores found were compared with the current proficiency score. Thus, the final cut score was found to be 27.87 with the cooperative work of 17 experts through the Angoff…

Descriptors: Standard Setting (Scoring), Comparative Analysis, Cutting Scores, Correlation

Ability Level Estimation of Students on Probability Unit via Computerized Adaptive Testing

Peer reviewed
PDF on ERIC

Özyurt, Hacer; Özyurt, Özcan – Eurasian Journal of Educational Research, 2015

Problem Statement: Learning-teaching activities bring along the need to determine whether they achieve their goals. Thus, multiple choice tests addressing the same set of questions to all are frequently used. However, this traditional assessment and evaluation form contrasts with modern education, where individual learning characteristics are…

Descriptors: Probability, Adaptive Testing, Computer Assisted Testing, Item Response Theory

An Analytic Comparison of Effect Sizes for Differential Item Functioning

Peer reviewed

Demars, Christine E. – Applied Measurement in Education, 2011

Three types of effect sizes for DIF are described in this exposition: log of the odds-ratio (differences in log-odds), differences in probability-correct, and proportion of variance accounted for. Using these indices involves conceptualizing the degree of DIF in different ways. This integrative review discusses how these measures are impacted in…

Descriptors: Effect Size, Test Bias, Probability, Difficulty Level

Formulating the Rasch Differential Item Functioning Model under the Marginal Maximum Likelihood Estimation Context and Its Comparison with Mantel-Haenszel Procedure in Short Test and Small Sample Conditions

Peer reviewed

Paek, Insu; Wilson, Mark – Educational and Psychological Measurement, 2011

This study elaborates the Rasch differential item functioning (DIF) model formulation under the marginal maximum likelihood estimation context. Also, the Rasch DIF model performance was examined and compared with the Mantel-Haenszel (MH) procedure in small sample and short test length conditions through simulations. The theoretically known…

Descriptors: Test Bias, Test Length, Statistical Inference, Geometric Concepts

Termination Criteria for Computerized Classification Testing

Peer reviewed

Thompson, Nathan A. – Practical Assessment, Research & Evaluation, 2011

Computerized classification testing (CCT) is an approach to designing tests with intelligent algorithms, similar to adaptive testing, but specifically designed for the purpose of classifying examinees into categories such as "pass" and "fail." Like adaptive testing for point estimation of ability, the key component is the…

Descriptors: Adaptive Testing, Computer Assisted Testing, Classification, Probability

Computerized Classification Testing under the One-Parameter Logistic Response Model with Ability-Based Guessing

Peer reviewed

Wang, Wen-Chung; Huang, Sheng-Yun – Educational and Psychological Measurement, 2011

The one-parameter logistic model with ability-based guessing (1PL-AG) has been recently developed to account for effect of ability on guessing behavior in multiple-choice items. In this study, the authors developed algorithms for computerized classification testing under the 1PL-AG and conducted a series of simulations to evaluate their…

Descriptors: Computer Assisted Testing, Classification, Item Analysis, Probability

Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures

Peer reviewed

Atar, Burcu; Kamata, Akihito – Hacettepe University Journal of Education, 2011

The Type I error rates and the power of IRT likelihood ratio test and cumulative logit ordinal logistic regression procedures in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…

Descriptors: Test Bias, Sample Size, Monte Carlo Methods, Item Response Theory

Differential Item Functioning Analysis Using Rasch Item Information Functions

Peer reviewed

Wyse, Adam E.; Mapuranga, Raymond – International Journal of Testing, 2009

Differential item functioning (DIF) analysis is a statistical technique used for ensuring the equity and fairness of educational assessments. This study formulates a new DIF analysis method using the information similarity index (ISI). ISI compares item information functions when data fits the Rasch model. Through simulations and an international…

Descriptors: Test Bias, Evaluation Methods, Test Items, Educational Assessment

Linking for the General Diagnostic Model. Research Report. ETS RR-08-08

Peer reviewed
PDF on ERIC

Xu, Xueli; von Davier, Matthias – ETS Research Report Series, 2008

Three strategies for linking two consecutive assessments are investigated and compared by analyzing reading data for the National Assessment of Educational Progress (NAEP) using the general diagnostic model. These strategies are compared in terms of marginal and joint expectations of skills, joint probabilities of skill patterns, and item…

Descriptors: National Competency Tests, Probability, Reading Achievement, Test Items

Author

Akbari, Alireza	1
Atar, Burcu	1
Beretvas, S. Natasha	1
Bridgeman, Brent	1
Camilli, Gregory	1
Chang, Hua-Hua	1
Demars, Christine E.	1
Fluke, Rickey	1
Gelbal, Selahattin	1
Gustafson, Paul	1
Huang, Sheng-Yun	1
Huang, Yi	1
Kamata, Akihito	1
Kim, Stella Yun	1
Kroc, Edward	1
Kárász, Judit T.	1
Lawless, Rene	1
Lee, Won-Chan	1
Liu, Yan	1
Lord, Frederic M.	1
Lunz, Mary E.	1
Maeda, Hotaka	1
Mapuranga, Raymond	1
O'Neill, Thomas R.	1
Oliveri, Maria Elena	1