ERIC - Search Results

Showing 1 to 15 of 25 results

Comparison of Kernel Equating Methods under NEAT and NEC Designs

Peer reviewed

Ozsoy, Seyma Nur; Kilmen, Sevilay – International Journal of Assessment Tools in Education, 2023

In this study, Kernel test equating methods were compared under NEAT and NEC designs. In NEAT design, Kernel post-stratification and chain equating methods taking into account optimal and large bandwidths were compared. In the NEC design, gender and/or computer/tablet use was considered as a covariate, and Kernel test equating methods were…

Descriptors: Equated Scores, Testing, Test Items, Statistical Analysis

It's Not Just Angoff: Misperceptions of Hard and Easy Items in Bookmark-Type Ratings

Peer reviewed

Wyse, Adam E.; Babcock, Ben – Educational Measurement: Issues and Practice, 2020

A common belief is that the Bookmark method is a cognitively simpler standard-setting method than the modified Angoff method. However, a limited amount of research has investigated panelists' ability to perform the Bookmark method well, and whether some of the challenges panelists face with the Angoff method may also be present in the Bookmark…

Descriptors: Standard Setting (Scoring), Evaluation Methods, Testing Problems, Test Items

A Log-Linear Modeling Approach for Differential Item Functioning Detection in Polytomously Scored Items

Peer reviewed

Yesiltas, Gonca; Paek, Insu – Educational and Psychological Measurement, 2020

A log-linear model (LLM) is a well-known statistical method to examine the relationship among categorical variables. This study investigated the performance of LLM in detecting differential item functioning (DIF) for polytomously scored items via simulations where various sample sizes, ability mean differences (impact), and DIF types were…

Descriptors: Simulation, Sample Size, Item Analysis, Scores

Differential Item Functioning Effect Size from the Multigroup Confirmatory Factor Analysis for a Meta-Analysis: A Simulation Study

Peer reviewed

Park, Sung Eun; Ahn, Soyeon; Zopluoglu, Cengiz – Educational and Psychological Measurement, 2021

This study presents a new approach to synthesizing differential item functioning (DIF) effect size: First, using correlation matrices from each study, we perform a multigroup confirmatory factor analysis (MGCFA) that examines measurement invariance of a test item between two subgroups (i.e., focal and reference groups). Then we synthesize, across…

Descriptors: Item Analysis, Effect Size, Difficulty Level, Monte Carlo Methods

A Modified "a"-Stratified Method for Computerized Adaptive Testing. Research Report. ETS RR-19-10

Peer reviewed

Gu, Lixiong; Ling, Guangming; Qu, Yanxuan – ETS Research Report Series, 2019

Research has found that the "a"-stratified item selection strategy (STR) for computerized adaptive tests (CATs) may lead to insufficient use of high "a" items at later stages of the tests and thus to reduced measurement precision. A refined approach, unequal item selection across strata (USTR), effectively improves test precision over the…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Use, Test Items

Differential Item Functioning Detection with the Mantel-Haenszel Procedure: The Effects of Matching Types and Other Factors

Peer reviewed

Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha – International Journal of Testing, 2015

The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…

Descriptors: Test Items, Educational Testing, Evaluation Methods, Ability Grouping

A Comparison of Linking Methods for Estimating National Trends in International Comparative Large-Scale Assessments in the Presence of Cross-national DIF

Peer reviewed

Sachse, Karoline A.; Roppelt, Alexander; Haag, Nicole – Journal of Educational Measurement, 2016

Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study using national item parameters, which required trends to be computed separately for each country, to compare…

Descriptors: Comparative Analysis, Measurement, Test Bias, Simulation

The DIF-Free-Then-DIF Strategy for the Assessment of Differential Item Functioning

Peer reviewed

Wang, Wen-Chung; Shih, Ching-Lin; Sun, Guo-Wei – Educational and Psychological Measurement, 2012

The DIF-free-then-DIF (DFTD) strategy consists of two steps: (a) select a set of items that are the most likely to be DIF-free and (b) assess the other items for DIF (differential item functioning) using the designated items as anchors. The rank-based method together with the computer software IRTLRDIF can select a set of DIF-free polytomous items…

Descriptors: Test Bias, Test Items, Item Response Theory, Evaluation Methods

Improving Explanatory Inferences from Assessments

Diakow, Ronli Phyllis – ProQuest LLC, 2013

This dissertation comprises three papers that propose, discuss, and illustrate models to make improved inferences about research questions regarding student achievement in education. Addressing the types of questions common in educational research today requires three different "extensions" to traditional educational assessment: (1)…

Descriptors: Inferences, Educational Assessment, Academic Achievement, Educational Research

Impact of Missing Data on the Detection of Differential Item Functioning: The Case of Mantel-Haenszel and Logistic Regression Analysis

Peer reviewed

Robitzsch, Alexander; Rupp, Andre A. – Educational and Psychological Measurement, 2009

This article describes the results of a simulation study to investigate the impact of missing data on the detection of differential item functioning (DIF). Specifically, it investigates how four methods for dealing with missing data (listwise deletion, zero imputation, two-way imputation, response function imputation) interact with two methods of…

Descriptors: Test Bias, Simulation, Interaction, Effect Size

Creating IRT-Based Parallel Test Forms Using the Genetic Algorithm Method

Peer reviewed

Sun, Koun-Tem; Chen, Yu-Jen; Tsai, Shu-Yen; Cheng, Chien-Fen – Applied Measurement in Education, 2008

In educational measurement, the construction of parallel test forms is often a combinatorial optimization problem that involves the time-consuming selection of items to construct tests having approximately the same test information functions (TIFs) and constraints. This article proposes a novel method, genetic algorithm (GA), to construct parallel…

Descriptors: Test Format, Measurement Techniques, Equations (Mathematics), Item Response Theory

Avoiding and Correcting Bias in Score-Based Latent Variable Regression with Discrete Manifest Items

Peer reviewed

Lu, Irene R. R.; Thomas, D. Roland – Structural Equation Modeling: A Multidisciplinary Journal, 2008

This article considers models involving a single structural equation with latent explanatory and/or latent dependent variables where discrete items are used to measure the latent variables. Our primary focus is the use of scores as proxies for the latent variables and carrying out ordinary least squares (OLS) regression on such scores to estimate…

Descriptors: Least Squares Statistics, Computation, Item Response Theory, Structural Equation Models

E-Assessment within the Bologna Paradigm: Evidence from Portugal

Peer reviewed

Ferrao, Maria – Assessment & Evaluation in Higher Education, 2010

The Bologna Declaration brought reforms into higher education that imply changes in teaching methods, didactic materials and textbooks, infrastructures and laboratories, etc. Statistics and mathematics are disciplines that traditionally have the worst success rates, particularly in non-mathematics core curricula courses. This research project,…

Descriptors: Foreign Countries, Computer Assisted Testing, Educational Technology, Educational Assessment

Rejoinder: Evaluating Standard Setting Methods Using Error Models Proposed by Schulz

Peer reviewed

Reckase, Mark D. – Educational Measurement: Issues and Practice, 2006

Schulz (2006) provides a different perspective on standard setting than that provided in Reckase (2006). He also suggests a modification to the bookmark procedure and some models for errors in panelists' judgments as alternatives to those provided by Reckase. This article provides a response to some of the points made by Schulz and reports some…

Descriptors: Evaluation Methods, Standard Setting, Reader Response, Regression (Statistics)

Evaluating the Magnitude of Differential Item Functioning in Polytomous Items

Peer reviewed

Zwick, Rebecca; Thayer, Dorothy T. – Journal of Educational and Behavioral Statistics, 1996

Two possible standard error formulas for the polytomous differential item functioning index proposed by N. J. Dorans and A. P. Schmitt (1991) were derived. These standard errors, and associated hypothesis-testing procedures, were evaluated through simulated data. The standard error that performed better is based on N. Mantel's (1963)…

Descriptors: Error of Measurement, Evaluation Methods, Hypothesis Testing, Item Bias
