Showing 1 to 15 of 65 results
Peer reviewed
Amirhossein Rasooli; Christopher DeLuca – Applied Measurement in Education, 2024
Inspired by recent 21st-century social and educational movements toward equity, diversity, and inclusion for disadvantaged groups, educational researchers have sought to conceptualize fairness in classroom assessment contexts. These efforts have produced promising theoretical foundations and empirical investigations examining fairness…
Descriptors: Test Bias, Student Evaluation, Social Justice, Equal Education
Peer reviewed
Chalmers, R. Philip; Zheng, Guoguo – Applied Measurement in Education, 2023
This article presents generalizations of the SIBTEST and crossing-SIBTEST statistics for differential item functioning (DIF) investigations involving more than two groups. After reviewing the original two-group setup for these statistics, a set of multigroup generalizations that support contrast matrices for joint tests of DIF is presented. To…
Descriptors: Test Bias, Test Items, Item Response Theory, Error of Measurement
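For readers unfamiliar with the statistic this article generalizes, here is a minimal sketch of a two-group SIBTEST-style effect estimate. It omits the regression correction the full procedure applies, and the function name and simplifications are illustrative, not from the article; the multigroup versions the authors propose replace this single reference-focal contrast with contrast matrices over several groups.

```python
import numpy as np

def sibtest_beta_uni(item, subtest, group):
    """Rough two-group SIBTEST-style effect: the focal-weighted mean
    difference on the studied item, conditioning on the matching
    (valid) subtest score. No regression correction applied.

    item    : 0/1 responses to the studied item
    subtest : integer scores on the matching subtest
    group   : 0 = reference, 1 = focal
    """
    item, subtest, group = map(np.asarray, (item, subtest, group))
    beta, n_focal = 0.0, (group == 1).sum()
    for k in np.unique(subtest):
        ref = item[(subtest == k) & (group == 0)]
        foc = item[(subtest == k) & (group == 1)]
        if len(ref) and len(foc):              # need both groups at level k
            beta += (len(foc) / n_focal) * (ref.mean() - foc.mean())
    return beta                                 # > 0 favors the reference group
```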
Peer reviewed
Yi-Hsin Chen – Applied Measurement in Education, 2024
This study aims to apply the differential item functioning (DIF) technique with the deterministic inputs, noisy "and" gate (DINA) model to validate the mathematics construct and diagnostic attribute profiles across American and Singaporean students. Even with the same ability level, every single item is expected to show uniform DIF…
Descriptors: Foreign Countries, Achievement Tests, Elementary Secondary Education, International Assessment
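As background, the DINA item response function underlying this study reduces to two parameters per item, slip and guess; a minimal sketch (variable names are illustrative). Uniform DIF in this framework amounts, roughly, to group-specific slip or guess parameters for an item.

```python
import numpy as np

def dina_prob(alpha, q, slip, guess):
    """P(correct) for one item under the DINA model.

    alpha : examinee attribute-mastery vector (0/1 per attribute)
    q     : the item's Q-matrix row (1 = attribute required)
    slip, guess : the item's s_j and g_j parameters
    """
    alpha, q = np.asarray(alpha), np.asarray(q)
    eta = np.all(alpha[q == 1] == 1)    # True iff all required attributes mastered
    return (1 - slip) if eta else guess

# A masters both required attributes; B lacks one:
print(dina_prob([1, 1, 0], [1, 1, 0], slip=0.1, guess=0.2))  # 0.9
print(dina_prob([1, 0, 0], [1, 1, 0], slip=0.1, guess=0.2))  # 0.2
```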
Peer reviewed
Xiao, Leifeng; Hau, Kit-Tai – Applied Measurement in Education, 2023
We compared coefficient alpha with five alternatives (omega total, omega RT, omega h, GLB, and coefficient H) in two simulation studies. Results showed that for unidimensional scales, (a) all indices except omega h performed similarly well under most conditions; (b) alpha remained a good choice; (c) GLB and coefficient H overestimated reliability with small…
Descriptors: Test Theory, Test Reliability, Factor Analysis, Test Length
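Two of the compared indices are simple enough to state directly; a minimal sketch of coefficient alpha from raw scores and omega total from a one-factor solution (omega RT, omega h, GLB, and coefficient H require fuller factor-model machinery than shown here):

```python
import numpy as np

def cronbach_alpha(X):
    """Coefficient alpha from an (n_persons, n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total)."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def omega_total(loadings, uniquenesses):
    """Omega total from a one-factor solution's loadings and uniquenesses:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of uniquenesses)."""
    lam, theta = np.asarray(loadings), np.asarray(uniquenesses)
    return lam.sum() ** 2 / (lam.sum() ** 2 + theta.sum())
```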
Peer reviewed
Rios, Joseph A. – Applied Measurement in Education, 2022
Testing programs are confronted with the decision of whether to report individual scores for examinees that have engaged in rapid guessing (RG). As noted by the "Standards for Educational and Psychological Testing," this decision should be based on a documented criterion that determines score exclusion. To this end, a number of heuristic…
Descriptors: Testing, Guessing (Tests), Academic Ability, Scores
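A minimal sketch of the kind of documented exclusion criterion the article discusses: flag responses faster than a per-item time threshold as rapid guesses, then exclude any score whose RG proportion exceeds a cutoff. The threshold rule and the 10% cutoff here are illustrative assumptions, not the article's recommendations.

```python
import numpy as np

def rg_exclusion(resp_times, thresholds, max_rg_rate=0.10):
    """Flag rapid-guessing (RG) responses and decide score exclusion.

    resp_times : (n_persons, n_items) response times in seconds
    thresholds : per-item RG thresholds (e.g., from a simple 3-second rule)
    max_rg_rate: exclude a score when the examinee's RG proportion
                 exceeds this cutoff (10% is assumed for illustration)
    """
    rg = np.asarray(resp_times) < np.asarray(thresholds)   # RG flags
    rg_rate = rg.mean(axis=1)                              # per-examinee proportion
    return rg_rate, rg_rate > max_rg_rate                  # True = exclude score
```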
Peer reviewed
Lions, Séverin; Monsalve, Carlos; Dartnell, Pablo; Blanco, María Paz; Ortega, Gabriel; Lemarié, Julie – Applied Measurement in Education, 2022
Multiple-choice tests are widely used in education, often for high-stakes assessment purposes. Consequently, these tests should be constructed following the highest standards. Many efforts have been undertaken to advance item-writing guidelines intended to improve tests. One important issue is the unwanted effects of the options' position on test…
Descriptors: Multiple Choice Tests, High Stakes Tests, Test Construction, Guidelines
Peer reviewed
Daniel Katz; Anne Corinne Huggins-Manley; Walter Leite – Applied Measurement in Education, 2022
According to the "Standards for Educational and Psychological Testing" (2014), one aspect of test fairness concerns examinees having comparable opportunities to learn prior to taking tests. Meanwhile, many researchers are developing platforms enhanced by artificial intelligence (AI) that can personalize curriculum to individual student…
Descriptors: High Stakes Tests, Test Bias, Testing Problems, Prior Learning
Peer reviewed
Mehrazmay, Roghayeh; Ghonsooly, Behzad; de la Torre, Jimmy – Applied Measurement in Education, 2021
The present study aims to examine gender differential item functioning (DIF) in the reading comprehension section of a high stakes test using cognitive diagnosis models. Based on the multiple-group generalized deterministic, noisy "and" gate (MG G-DINA) model, the Wald test and likelihood ratio test are used to detect DIF. The flagged…
Descriptors: Test Bias, College Entrance Examinations, Gender Differences, Reading Tests
Peer reviewed
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
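A crude illustration of screening for item parameter drift between two administrations. Real applications would compare proper Rasch difficulty estimates (e.g., CMLE or JMLE); here, centered log-odds of item p-values stand in as a rough proxy, and the 0.5-logit flagging cutoff is a common rule of thumb rather than a value from the article.

```python
import numpy as np

def drift_check(resp_t1, resp_t2, cutoff=0.5):
    """Flag items whose (proxy) difficulty shifts more than `cutoff`
    logits between two administrations of the same items.

    resp_t1, resp_t2 : (n_persons, n_items) 0/1 response matrices
    """
    def difficulties(X):
        p = np.asarray(X).mean(axis=0).clip(1e-6, 1 - 1e-6)
        b = np.log((1 - p) / p)        # higher = harder
        return b - b.mean()            # center to fix the scale origin
    shift = difficulties(resp_t2) - difficulties(resp_t1)
    return shift, np.abs(shift) > cutoff
```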
Peer reviewed
El Masri, Yasmine H.; Andrich, David – Applied Measurement in Education, 2020
In large-scale educational assessments, it is generally required that tests are composed of items that function invariantly across the groups to be compared. Despite efforts to ensure invariance in the item construction phase, for a range of reasons (including the security of items) it is often necessary to account for differential item…
Descriptors: Models, Goodness of Fit, Test Validity, Achievement Tests
Peer reviewed
Sachse, Karoline A.; Haag, Nicole – Applied Measurement in Education, 2017
Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…
Descriptors: Error of Measurement, Test Bias, International Assessment, Computation
Peer reviewed
Finch, W. Holmes – Applied Measurement in Education, 2016
Differential item functioning (DIF) assessment is a crucial component in test construction, serving as the primary way in which instrument developers ensure that measures perform in the same way for multiple groups within the population. When such is not the case, scores may not accurately reflect the trait of interest for all individuals in the…
Descriptors: Test Bias, Monte Carlo Methods, Comparative Analysis, Population Groups
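Among the classical DIF procedures that simulation studies like this one evaluate, the Mantel-Haenszel index is the easiest to sketch: the common odds ratio across levels of the matching score, with the usual ETS delta transformation.

```python
import numpy as np

def mantel_haenszel_dif(item, total, group):
    """Mantel-Haenszel DIF index for one dichotomous item.

    item  : 0/1 responses to the studied item
    total : matching criterion (e.g., total test score)
    group : 0 = reference, 1 = focal

    Returns the MH common odds ratio and the ETS delta value
    (|delta| >= 1.5 is the conventional cutoff for non-negligible DIF).
    """
    item, total, group = map(np.asarray, (item, total, group))
    num = den = 0.0
    for k in np.unique(total):
        at_k = total == k
        A = ((group == 0) & at_k & (item == 1)).sum()  # reference correct
        B = ((group == 0) & at_k & (item == 0)).sum()  # reference incorrect
        C = ((group == 1) & at_k & (item == 1)).sum()  # focal correct
        D = ((group == 1) & at_k & (item == 0)).sum()  # focal incorrect
        N = A + B + C + D
        if N:
            num += A * D / N
            den += B * C / N
    odds = num / den
    return odds, -2.35 * np.log(odds)
```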
Peer reviewed
Oliveri, Maria Elena; Ercikan, Kadriye; Lyons-Thomas, Juliette; Holtzman, Steven – Applied Measurement in Education, 2016
Differential item functioning (DIF) analyses have been used as the primary method in large-scale assessments to examine fairness for subgroups. Currently, DIF analyses are conducted with manifest methods, which group examinees by observed characteristics (gender and race/ethnicity). Homogeneity of item responses is assumed, denoting that…
Descriptors: Test Bias, Language Minorities, Effect Size, Foreign Countries
Peer reviewed
Wise, Steven L.; Gao, Lingyun – Applied Measurement in Education, 2017
There has been an increased interest in the impact of unmotivated test taking on test performance and score validity. This has led to the development of new ways of measuring test-taking effort based on item response time. In particular, Response Time Effort (RTE) has been shown to provide an assessment of effort down to the level of individual…
Descriptors: Test Bias, Computer Assisted Testing, Item Response Theory, Achievement Tests
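Response Time Effort itself is simple to compute once per-item time thresholds have been chosen: it is the proportion of an examinee's responses classified as solution behavior, i.e., the complement of the rapid-guessing rate sketched earlier. The threshold choice is left open here, as it varies by application.

```python
import numpy as np

def response_time_effort(resp_times, thresholds):
    """RTE: proportion of an examinee's responses with response time at
    or above the item's threshold (solution behavior). 1.0 = fully
    effortful test taking; lower values indicate more rapid guessing."""
    sb = np.asarray(resp_times) >= np.asarray(thresholds)
    return sb.mean(axis=1)   # one RTE value per examinee
```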
Peer reviewed
Lee, HyeSun – Applied Measurement in Education, 2018
The current simulation study examined the effects of Item Parameter Drift (IPD) occurring in a short scale on parameter estimates in multilevel models where scores from a scale were employed as a time-varying predictor to account for outcome scores. Five factors, including three decisions about IPD, were considered for simulation conditions. It…
Descriptors: Test Items, Hierarchical Linear Modeling, Predictor Variables, Scores