Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 9 |
Since 2016 (last 10 years) | 120 |
Since 2006 (last 20 years) | 174 |
Descriptor
Source
Author
Publication Type
Education Level
Audience
Location
United States | 27 |
Australia | 17 |
Finland | 16 |
Germany | 15 |
Turkey | 14 |
Singapore | 13 |
Canada | 10 |
Hong Kong | 10 |
Japan | 9 |
South Korea | 9 |
China | 8 |
More ▼ |
Laws, Policies, & Programs
Individuals with Disabilities… | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Liu, Ivy; Suesse, Thomas; Harvey, Samuel; Gu, Peter Yongqi; Fernández, Daniel; Randal, John – Educational and Psychological Measurement, 2023
The Mantel-Haenszel estimator is one of the most popular techniques for measuring differential item functioning (DIF). A generalization of this estimator is applied to the context of DIF to compare items by taking the covariance of odds ratio estimators between dependent items into account. Unlike the Item Response Theory, the method does not rely…
Descriptors: Test Bias, Computation, Statistical Analysis, Achievement Tests
Xiao, Leifeng; Hau, Kit-Tai – Educational and Psychological Measurement, 2023
We examined the performance of coefficient alpha and its potential competitors (ordinal alpha, omega total, Revelle's omega total [omega RT], omega hierarchical [omega h], greatest lower bound [GLB], and coefficient "H") with continuous and discrete data having different types of non-normality. Results showed the estimation bias was…
Descriptors: Statistical Bias, Statistical Analysis, Likert Scales, Statistical Distributions
Robitzsch, Alexander; Lüdtke, Oliver – Journal of Educational and Behavioral Statistics, 2022
One of the primary goals of international large-scale assessments in education is the comparison of country means in student achievement. This article introduces a framework for discussing differential item functioning (DIF) for such mean comparisons. We compare three different linking methods: concurrent scaling based on full invariance,…
Descriptors: Test Bias, International Assessment, Scaling, Comparative Analysis
Heine, Jörg-Henrik; Robitzsch, Alexander – Large-scale Assessments in Education, 2022
Research Question: This paper examines the overarching question of to what extent different analytic choices may influence the inference about country-specific cross-sectional and trend estimates in international large-scale assessments. We take data from the assessment of PISA mathematics proficiency from the four rounds from 2003 to 2012 as a…
Descriptors: Foreign Countries, International Assessment, Achievement Tests, Secondary School Students
Zheng, Xiaying; Yang, Ji Seung – Measurement: Interdisciplinary Research and Perspectives, 2021
The purpose of this paper is to briefly introduce two most common applications of multiple group item response theory (IRT) models, namely detecting differential item functioning (DIF) analysis and nonequivalent group score linking with a simultaneous calibration. We illustrate how to conduct those analyses using the "Stata" item…
Descriptors: Item Response Theory, Test Bias, Computer Software, Statistical Analysis
Liu, Yuan; Hau, Kit-Tai – Educational and Psychological Measurement, 2020
In large-scale low-stake assessment such as the Programme for International Student Assessment (PISA), students may skip items (missingness) which are within their ability to complete. The detection and taking care of these noneffortful responses, as a measure of test-taking motivation, is an important issue in modern psychometric models.…
Descriptors: Response Style (Tests), Motivation, Test Items, Statistical Analysis
Huang, Hung-Yu – Educational and Psychological Measurement, 2020
In educational assessments and achievement tests, test developers and administrators commonly assume that test-takers attempt all test items with full effort and leave no blank responses with unplanned missing values. However, aberrant response behavior--such as performance decline, dropping out beyond a certain point, and skipping certain items…
Descriptors: Item Response Theory, Response Style (Tests), Test Items, Statistical Analysis
Karadavut, Tugba; Cohen, Allan S.; Kim, Seock-Ho – Measurement: Interdisciplinary Research and Perspectives, 2020
Mixture Rasch (MixRasch) models conventionally assume normal distributions for latent ability. Previous research has shown that the assumption of normality is often unmet in educational and psychological measurement. When normality is assumed, asymmetry in the actual latent ability distribution has been shown to result in extraction of spurious…
Descriptors: Item Response Theory, Ability, Statistical Distributions, Sample Size
Rios, Joseph A. – Educational and Psychological Measurement, 2021
Low test-taking effort as a validity threat is common when examinees perceive an assessment context to have minimal personal value. Prior research has shown that in such contexts, subgroups may differ in their effort, which raises two concerns when making subgroup mean comparisons. First, it is unclear how differential effort could influence…
Descriptors: Response Style (Tests), Statistical Analysis, Measurement, Comparative Analysis
Pedro San Martin Soares – Journal of Psychoeducational Assessment, 2024
Brazil's education system lags behind international standards, with two-fifths of students scoring below the minimum level of proficiency in mathematics, science, and reading. Thus, this study combined machine learning with traditional statistics to identify the most important predictors and to interpret their effects on proficiency in the PISA…
Descriptors: Foreign Countries, Achievement Tests, Secondary School Students, International Assessment
Hien Tran Thuy; Son Le Phuoc – Mathematics Teaching Research Journal, 2023
The goal of this research was to provide a general assessment framework of students' statistical reasoning in medicine, then build three scales to assess students' statistical reasoning ability in solving practical medical problems, including Description, Interpretation and Prediction. On that basis, the study designed a set of tools to assess…
Descriptors: Evaluation, Statistics, Statistical Analysis, Medical Education
Casabianca, Jodi M.; Lewis, Charles – Journal of Educational and Behavioral Statistics, 2018
The null hypothesis test used in differential item functioning (DIF) detection tests for a subgroup difference in item-level performance--if the null hypothesis of "no DIF" is rejected, the item is flagged for DIF. Conversely, an item is kept in the test form if there is insufficient evidence of DIF. We present frequentist and empirical…
Descriptors: Test Bias, Hypothesis Testing, Bayesian Statistics, Statistical Analysis
Wulandari, Nidya Ferry; Jailani – Journal on Mathematics Education, 2018
The aims of this research were to describe mathematics skill of 8th fifteen-year old students in Yogyakarta in solving problem of PISA. The sampling was combination of stratified and cluster random sampling. The sample consisting of 400 students was selected from fifteen schools. The data collection was by tests. The research finding revealed that…
Descriptors: Foreign Countries, Mathematics Skills, Problem Solving, Achievement Tests
Braun, Henry; von Davier, Matthias – Large-scale Assessments in Education, 2017
Background: Economists are making increasing use of measures of student achievement obtained through large-scale survey assessments such as NAEP, TIMSS, and PISA. The construction of these measures, employing plausible value (PV) methodology, is quite different from that of the more familiar test scores associated with assessments such as the SAT…
Descriptors: Scores, Test Use, Measurement, Psychometrics
Hopfenbeck, Therese N.; Lenkeit, Jenny; El Masri, Yasmine; Cantrell, Kate; Ryan, Jeanne; Baird, Jo-Anne – Scandinavian Journal of Educational Research, 2018
International large-scale assessments are on the rise, with the Programme for International Student Assessment (PISA) seen by many as having strategic prominence in education policy debates. The present article reviews PISA-related English-language peer-reviewed articles from the programme's first cycle in 2000 to its most current in 2015. Five…
Descriptors: Foreign Countries, Achievement Tests, International Assessment, Secondary School Students