Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 2 |
Since 2016 (last 10 years) | 6 |
Since 2006 (last 20 years) | 16 |
Descriptor
Error of Measurement | 25 |
Evaluation Methods | 25 |
Test Items | 25 |
Item Response Theory | 12 |
Simulation | 10 |
Test Bias | 9 |
Measurement Techniques | 5 |
Psychometrics | 5 |
Regression (Statistics) | 5 |
Sample Size | 5 |
Scores | 5 |
More ▼ |
Source
Author
Thayer, Dorothy T. | 2 |
Wang, Wen-Chung | 2 |
Zwick, Rebecca | 2 |
Abedi, Jamal | 1 |
Ahn, Soyeon | 1 |
Ankenmann, Robert D. | 1 |
Aylesworth, Richard | 1 |
Babcock, Ben | 1 |
Chen, Yu-Jen | 1 |
Cheng, Chien-Fen | 1 |
Cook, Linda L. | 1 |
More ▼ |
Publication Type
Journal Articles | 20 |
Reports - Research | 14 |
Reports - Evaluative | 6 |
Speeches/Meeting Papers | 4 |
Reports - Descriptive | 3 |
Dissertations/Theses -… | 1 |
Numerical/Quantitative Data | 1 |
Opinion Papers | 1 |
Tests/Questionnaires | 1 |
Education Level
Secondary Education | 2 |
Elementary Education | 1 |
Elementary Secondary Education | 1 |
Higher Education | 1 |
Junior High Schools | 1 |
Middle Schools | 1 |
Postsecondary Education | 1 |
Audience
Researchers | 1 |
Laws, Policies, & Programs
Assessments and Surveys
National Assessment of… | 1 |
Program for International… | 1 |
Trends in International… | 1 |
What Works Clearinghouse Rating
Ozsoy, Seyma Nur; Kilmen, Sevilay – International Journal of Assessment Tools in Education, 2023
In this study, Kernel test equating methods were compared under NEAT and NEC designs. In NEAT design, Kernel post-stratification and chain equating methods taking into account optimal and large bandwidths were compared. In the NEC design, gender and/or computer/tablet use was considered as a covariate, and Kernel test equating methods were…
Descriptors: Equated Scores, Testing, Test Items, Statistical Analysis
Wyse, Adam E.; Babcock, Ben – Educational Measurement: Issues and Practice, 2020
A common belief is that the Bookmark method is a cognitively simpler standard-setting method than the modified Angoff method. However, a limited amount of research has investigated panelist's ability to perform well the Bookmark method, and whether some of the challenges panelists face with the Angoff method may also be present in the Bookmark…
Descriptors: Standard Setting (Scoring), Evaluation Methods, Testing Problems, Test Items
Yesiltas, Gonca; Paek, Insu – Educational and Psychological Measurement, 2020
A log-linear model (LLM) is a well-known statistical method to examine the relationship among categorical variables. This study investigated the performance of LLM in detecting differential item functioning (DIF) for polytomously scored items via simulations where various sample sizes, ability mean differences (impact), and DIF types were…
Descriptors: Simulation, Sample Size, Item Analysis, Scores
Park, Sung Eun; Ahn, Soyeon; Zopluoglu, Cengiz – Educational and Psychological Measurement, 2021
This study presents a new approach to synthesizing differential item functioning (DIF) effect size: First, using correlation matrices from each study, we perform a multigroup confirmatory factor analysis (MGCFA) that examines measurement invariance of a test item between two subgroups (i.e., focal and reference groups). Then we synthesize, across…
Descriptors: Item Analysis, Effect Size, Difficulty Level, Monte Carlo Methods
Gu, Lixiong; Ling, Guangming; Qu, Yanxuan – ETS Research Report Series, 2019
Research has found that the "a"-stratified item selection strategy (STR) for computerized adaptive tests (CATs) may lead to insufficient use of high a items at later stages of the tests and thus to reduced measurement precision. A refined approach, unequal item selection across strata (USTR), effectively improves test precision over the…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Use, Test Items
Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha – International Journal of Testing, 2015
The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…
Descriptors: Test Items, Educational Testing, Evaluation Methods, Ability Grouping
Sachse, Karoline A.; Roppelt, Alexander; Haag, Nicole – Journal of Educational Measurement, 2016
Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study using national item parameters, which required trends to be computed separately for each country, to compare…
Descriptors: Comparative Analysis, Measurement, Test Bias, Simulation
Wang, Wen-Chung; Shih, Ching-Lin; Sun, Guo-Wei – Educational and Psychological Measurement, 2012
The DIF-free-then-DIF (DFTD) strategy consists of two steps: (a) select a set of items that are the most likely to be DIF-free and (b) assess the other items for DIF (differential item functioning) using the designated items as anchors. The rank-based method together with the computer software IRTLRDIF can select a set of DIF-free polytomous items…
Descriptors: Test Bias, Test Items, Item Response Theory, Evaluation Methods
Diakow, Ronli Phyllis – ProQuest LLC, 2013
This dissertation comprises three papers that propose, discuss, and illustrate models to make improved inferences about research questions regarding student achievement in education. Addressing the types of questions common in educational research today requires three different "extensions" to traditional educational assessment: (1)…
Descriptors: Inferences, Educational Assessment, Academic Achievement, Educational Research
Robitzsch, Alexander; Rupp, Andre A. – Educational and Psychological Measurement, 2009
This article describes the results of a simulation study to investigate the impact of missing data on the detection of differential item functioning (DIF). Specifically, it investigates how four methods for dealing with missing data (listwise deletion, zero imputation, two-way imputation, response function imputation) interact with two methods of…
Descriptors: Test Bias, Simulation, Interaction, Effect Size
Sun, Koun-Tem; Chen, Yu-Jen; Tsai, Shu-Yen; Cheng, Chien-Fen – Applied Measurement in Education, 2008
In educational measurement, the construction of parallel test forms is often a combinatorial optimization problem that involves the time-consuming selection of items to construct tests having approximately the same test information functions (TIFs) and constraints. This article proposes a novel method, genetic algorithm (GA), to construct parallel…
Descriptors: Test Format, Measurement Techniques, Equations (Mathematics), Item Response Theory
Lu, Irene R. R.; Thomas, D. Roland – Structural Equation Modeling: A Multidisciplinary Journal, 2008
This article considers models involving a single structural equation with latent explanatory and/or latent dependent variables where discrete items are used to measure the latent variables. Our primary focus is the use of scores as proxies for the latent variables and carrying out ordinary least squares (OLS) regression on such scores to estimate…
Descriptors: Least Squares Statistics, Computation, Item Response Theory, Structural Equation Models
Ferrao, Maria – Assessment & Evaluation in Higher Education, 2010
The Bologna Declaration brought reforms into higher education that imply changes in teaching methods, didactic materials and textbooks, infrastructures and laboratories, etc. Statistics and mathematics are disciplines that traditionally have the worst success rates, particularly in non-mathematics core curricula courses. This research project,…
Descriptors: Foreign Countries, Computer Assisted Testing, Educational Technology, Educational Assessment
Reckase, Mark D. – Educational Measurement: Issues and Practice, 2006
Schulz (2006) provides a different perspective on standard setting than that provided in Reckase (2006). He also suggests a modification to the bookmark procedure and some alternative models for errors in panelists' judgments than those provided by Reckase. This article provides a response to some of the points made by Schulz and reports some…
Descriptors: Evaluation Methods, Standard Setting, Reader Response, Regression (Statistics)

Zwick, Rebecca; Thayer, Dorothy T. – Journal of Educational and Behavioral Statistics, 1996
Two possible standard error formulas for the polytomous differential item functioning index proposed by N. J. Dorans and A. P. Schmitt (1991) were derived. These standard errors, and associated hypothesis-testing procedures, were evaluated through simulated data. The standard error that performed better is based on N. Mantel's (1963)…
Descriptors: Error of Measurement, Evaluation Methods, Hypothesis Testing, Item Bias
Previous Page | Next Page ยป
Pages: 1 | 2