Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 10 |
Since 2016 (last 10 years) | 885 |
Since 2006 (last 20 years) | 2370 |
Descriptor
Comparative Analysis | 2723 |
Statistical Analysis | 2723 |
Scores | 1235 |
Foreign Countries | 843 |
Elementary School Students | 830 |
Gender Differences | 725 |
Public Schools | 613 |
Racial Differences | 610 |
Ethnic Groups | 576 |
Reading Tests | 508 |
Pretests Posttests | 501 |
Author
Smolkowski, Keith | 10 |
Bianchini, John C. | 9 |
Loret, Peter G. | 9 |
Vaughn, Sharon | 8 |
Clarke, Ben | 7 |
Fien, Hank | 7 |
von Davier, Alina A. | 7 |
Cho, Sun-Joo | 6 |
Doabler, Christian T. | 6 |
Petscher, Yaacov | 6 |
Kim, Sooyeon | 5 |
Location
Turkey | 102 |
Iran | 77 |
Texas | 71 |
California | 60 |
Florida | 47 |
Taiwan | 45 |
Georgia | 41 |
Germany | 41 |
Pennsylvania | 37 |
Netherlands | 35 |
North Carolina | 34 |
What Works Clearinghouse Rating
Meets WWC Standards without Reservations | 7 |
Meets WWC Standards with or without Reservations | 18 |
Does not meet standards | 19 |
Robitzsch, Alexander; Lüdtke, Oliver – Journal of Educational and Behavioral Statistics, 2022
One of the primary goals of international large-scale assessments in education is the comparison of country means in student achievement. This article introduces a framework for discussing differential item functioning (DIF) for such mean comparisons. We compare three different linking methods: concurrent scaling based on full invariance,…
Descriptors: Test Bias, International Assessment, Scaling, Comparative Analysis
Erbeli, Florina; He, Kai; Cheek, Connor; Rice, Marianne; Qian, Xiaoning – Scientific Studies of Reading, 2023
Purpose: Researchers have developed a constellation model of decoding-related reading disabilities (RD) to improve RD risk determination. The model's hallmark is its inclusion of various RD indicators to determine RD risk. Classification methods such as logistic regression (LR) might be one way to determine RD risk within the constellation…
Descriptors: At Risk Students, Reading Difficulties, Classification, Comparative Analysis
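As a rough illustration of the classification idea described above (not the authors' actual model), the sketch below fits a logistic regression to simulated decoding-related indicators and flags RD risk at an assumed 0.5 probability cutoff; the variable names, data, and cutoff are all hypothetical.

```python
# A minimal sketch: logistic regression as an RD-risk classifier on
# hypothetical decoding-related indicators. Data and cutoff are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
# Hypothetical standardized indicators: word reading, decoding fluency, RAN speed
X = rng.normal(size=(n, 3))
# Simulated RD labels loosely tied to the indicators (illustration only)
logit = -1.0 - 1.2 * X[:, 0] - 0.8 * X[:, 1] + 0.5 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = LogisticRegression().fit(X, y)
risk = model.predict_proba(X)[:, 1]   # estimated probability of RD risk
flagged = risk >= 0.5                 # simple classification threshold
print(f"Flagged {flagged.sum()} of {n} students as at risk")
```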
Soysal, Sumeyra; Yilmaz Kogar, Esin – International Journal of Assessment Tools in Education, 2021
This study investigated whether item position effects lead to DIF when different test booklets are used. To do this, Lord's chi-square and Raju's unsigned area methods were applied with the 3PL model, both with and without item purification. When the performance of the methods was compared, it was revealed that…
Descriptors: Item Response Theory, Test Bias, Test Items, Comparative Analysis
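For readers unfamiliar with Raju's unsigned area, the sketch below approximates it numerically as the area between reference- and focal-group 3PL item characteristic curves; the item parameters are invented for illustration, and this is not the study's implementation.

```python
# A minimal numerical sketch of Raju's unsigned area for one item under the
# 3PL model: the area between the reference- and focal-group ICCs.
import numpy as np

def icc_3pl(theta, a, b, c):
    """3PL item characteristic curve P(theta), with the 1.7 scaling constant."""
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

theta = np.linspace(-6, 6, 2001)
p_ref = icc_3pl(theta, a=1.2, b=0.0, c=0.2)   # reference-group parameters
p_foc = icc_3pl(theta, a=1.2, b=0.5, c=0.2)   # focal-group parameters

unsigned_area = np.trapz(np.abs(p_ref - p_foc), theta)
print(f"Approximate unsigned area: {unsigned_area:.3f}")
```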
Wang, Lu; Steedle, Jeffrey – ACT, Inc., 2020
In recent ACT mode comparability studies, students testing on laptop or desktop computers earned slightly higher scores on average than students who tested on paper, especially on the ACT® reading and English tests (Li et al., 2017). Equating procedures adjust for such "mode effects" to make ACT scores comparable regardless of testing…
Descriptors: Test Format, Reading Tests, Language Tests, English
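The sketch below illustrates one simple form of such an adjustment, a mean-sigma linear transformation that places scores from one mode onto the scale of the other; the score distributions are simulated, and this is not ACT's operational equating procedure.

```python
# A minimal sketch of a linear (mean-sigma) equating adjustment for a mode
# effect: computer-mode scores are rescaled to the paper-mode scale.
import numpy as np

rng = np.random.default_rng(1)
paper = rng.normal(loc=20.0, scale=5.0, size=1000)     # paper-mode scores
computer = rng.normal(loc=20.6, scale=5.2, size=1000)  # computer-mode scores

# Match means and standard deviations: x_computer -> paper-scale equivalent
slope = paper.std(ddof=1) / computer.std(ddof=1)
intercept = paper.mean() - slope * computer.mean()
equated = slope * computer + intercept

print(f"Computer mean before: {computer.mean():.2f}, after equating: {equated.mean():.2f}")
```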
Diaz, Emily; Brooks, Gordon; Johanson, George – International Journal of Assessment Tools in Education, 2021
This Monte Carlo study assessed Type I error in differential item functioning analyses using Lord's chi-square (LC), Likelihood Ratio Test (LRT), and Mantel-Haenszel (MH) procedure. Two research interests were investigated: item response theory (IRT) model specification in LC and the LRT and continuity correction in the MH procedure. This study…
Descriptors: Test Bias, Item Response Theory, Statistical Analysis, Comparative Analysis
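A minimal sketch of the Mantel-Haenszel statistic with the 0.5 continuity correction mentioned above, computed from score-stratified 2x2 tables; the counts are made up, and the code only shows the formula, not the study's Monte Carlo design.

```python
# Mantel-Haenszel DIF chi-square with continuity correction, from 2x2 tables
# stratified by total score. Counts are illustrative.
import numpy as np

# Each row: (A, B, C, D) = ref-correct, ref-incorrect, focal-correct, focal-incorrect
strata = np.array([
    [30, 20, 25, 25],
    [45, 15, 40, 20],
    [60, 10, 50, 15],
], dtype=float)

A, B, C, D = strata.T
T = A + B + C + D
n_ref, n_foc = A + B, C + D
m_correct, m_incorrect = A + C, B + D

expected_A = n_ref * m_correct / T
var_A = n_ref * n_foc * m_correct * m_incorrect / (T**2 * (T - 1))

mh_chi2 = (abs(A.sum() - expected_A.sum()) - 0.5) ** 2 / var_A.sum()
print(f"MH chi-square (continuity corrected): {mh_chi2:.3f}")
```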
Rios, Joseph A. – Educational and Psychological Measurement, 2021
Low test-taking effort as a validity threat is common when examinees perceive an assessment context to have minimal personal value. Prior research has shown that in such contexts, subgroups may differ in their effort, which raises two concerns when making subgroup mean comparisons. First, it is unclear how differential effort could influence…
Descriptors: Response Style (Tests), Statistical Analysis, Measurement, Comparative Analysis
Ford, Jeremy W.; Conoyer, Sarah J.; Lembke, Erica S.; Smith, R. Alex; Hosp, John L. – Assessment for Effective Intervention, 2018
In the present study, two types of curriculum-based measurement (CBM) tools in science, Vocabulary Matching (VM) and Statement Verification for Science (SV-S), a modified Sentence Verification Technique, were compared. Specifically, this study aimed to determine whether the format of information presented (i.e., SV-S vs. VM) produces differences…
Descriptors: Curriculum Based Assessment, Evaluation Methods, Measurement Techniques, Comparative Analysis
Cormier, Damien C.; Bulut, Okan; Singh, Deepak; Kennedy, Kathleen E.; Wang, Kun; Heudes, Alethea; Lekwa, Adam J. – Journal of Psychoeducational Assessment, 2018
The selection and interpretation of individually administered norm-referenced cognitive tests that are administered to culturally and linguistically diverse (CLD) students continue to be an important consideration within the psychoeducational assessment process. Understanding test directions during the assessment of cognitive abilities is…
Descriptors: Intelligence Tests, Cognitive Ability, High Stakes Tests, Children
Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020
In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describes a method…
Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level
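One statistical way to keep a boundary "no easier to achieve" is to carry its percentile forward, as sketched below with simulated score distributions; this illustrates the general idea only and is not the specific method described in Benton et al. (2020).

```python
# A minimal sketch: set the year-2 grade boundary at the score whose
# cumulative percentile matches the year-1 boundary. Distributions simulated.
import numpy as np

rng = np.random.default_rng(4)
year1 = rng.normal(60, 12, 5000).clip(0, 100)
year2 = rng.normal(63, 12, 5000).clip(0, 100)   # slightly easier paper

boundary_year1 = 70.0
pct = (year1 < boundary_year1).mean() * 100      # percentile of the boundary
boundary_year2 = np.percentile(year2, pct)       # equivalent year-2 boundary
print(f"Year-2 boundary at the same percentile: {boundary_year2:.1f}")
```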
Lenhard, Wolfgang; Lenhard, Alexandra – Educational and Psychological Measurement, 2021
The interpretation of psychometric test results is usually based on norm scores. We compared semiparametric continuous norming (SPCN) with conventional norming methods by simulating results for test scales with different item numbers and difficulties via an item response theory approach. Subsequently, we modeled the norm scores based on random…
Descriptors: Test Norms, Scores, Regression (Statistics), Test Items
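To make the contrast concrete, the sketch below computes conventional norm scores within one-year age bands and a simple regression-based continuous alternative on simulated data; it is a schematic stand-in, not the authors' semiparametric continuous norming (SPCN) procedure.

```python
# Schematic contrast: per-age-band T-scores vs. a regression-based continuous
# alternative. All data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(2)
age = rng.uniform(6, 12, 2000)                       # ages in years
raw = 10 + 3 * age + rng.normal(0, 5, age.size)      # simulated raw scores

# Conventional norming: T-scores within one-year age bands
bands = np.floor(age)
t_conventional = np.empty_like(raw)
for band in np.unique(bands):
    mask = bands == band
    z = (raw[mask] - raw[mask].mean()) / raw[mask].std(ddof=1)
    t_conventional[mask] = 50 + 10 * z

# Continuous alternative: model mean and spread smoothly across age
coef_mean = np.polyfit(age, raw, deg=2)
pred_mean = np.polyval(coef_mean, age)
resid = raw - pred_mean
coef_sd = np.polyfit(age, np.abs(resid), deg=2)
pred_sd = np.polyval(coef_sd, age) * np.sqrt(np.pi / 2)   # |resid| -> SD under normality
t_continuous = 50 + 10 * (raw - pred_mean) / pred_sd

print(t_conventional[:3].round(1), t_continuous[:3].round(1))
```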
Moshinsky, Avital; Ziegler, David; Gafni, Naomi – International Journal of Testing, 2017
Many medical schools have adopted multiple mini-interviews (MMI) as an advanced selection tool. MMIs are expensive and are used to test only a few dozen candidates per day, making it infeasible to develop a different test version for each test administration. Therefore, some items are reused both within and across years. This study investigated the…
Descriptors: Interviews, Medical Schools, Test Validity, Test Reliability
Lee, Sora; Bolt, Daniel M. – Journal of Educational Measurement, 2018
Both the statistical and interpretational shortcomings of the three-parameter logistic (3PL) model in accommodating guessing effects on multiple-choice items are well documented. We consider the use of a residual heteroscedasticity (RH) model as an alternative, and compare its performance to the 3PL with real test data sets and through simulation…
Descriptors: Statistical Analysis, Models, Guessing (Tests), Multiple Choice Tests
Kelleher, Leila K.; Beach, Tyson A. C.; Frost, David M.; Johnson, Andrew M.; Dickey, James P. – Measurement in Physical Education and Exercise Science, 2018
The scoring scheme for the functional movement screen implicitly assumes that the factor structure is consistent, stable, and congruent across different populations. To determine if this is the case, we compared principal components analyses of three samples: a healthy, general population (n = 100), a group of varsity athletes (n = 101), and a…
Descriptors: Factor Structure, Test Reliability, Screening Tests, Motion
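The sketch below shows the kind of comparison involved: principal components are extracted from two simulated samples and their first-component loadings are compared with Tucker's congruence coefficient; the data and the single-component choice are assumptions for illustration only.

```python
# A minimal sketch: compare first-component loadings across two samples with
# Tucker's congruence coefficient. Data are simulated.
import numpy as np
from sklearn.decomposition import PCA

def congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    return np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y))

rng = np.random.default_rng(3)
sample_a = rng.normal(size=(100, 7))   # e.g., 7 movement-screen item scores
sample_b = rng.normal(size=(101, 7))

load_a = PCA(n_components=1).fit(sample_a).components_[0]
load_b = PCA(n_components=1).fit(sample_b).components_[0]
# Component sign is arbitrary, so report the absolute congruence
print(f"Congruence of first components: {abs(congruence(load_a, load_b)):.2f}")
```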
Cavalli, Eddy; Colé, Pascale; Leloup, Gilles; Poracchia-George, Florence; Sprenger-Charolles, Liliane; El Ahmadi, Abdessadek – Journal of Learning Disabilities, 2018
Developmental dyslexia is a lifelong impairment affecting 5% to 10% of the population. In French-speaking countries, although a number of standardized tests for dyslexia in children are available, tools suitable to screen for dyslexia in adults are lacking. In this study, we administered the "Alouette" reading test to a normative sample…
Descriptors: Foreign Countries, Screening Tests, Disability Identification, Dyslexia
Debeer, Dries; Ali, Usama S.; van Rijn, Peter W. – Journal of Educational Measurement, 2017
Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) are commonly used as statistical…
Descriptors: Test Format, Test Construction, Statistical Analysis, Comparative Analysis
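As a pointer to what the TIF is, the sketch below sums 2PL item information over a small set of invented item parameters; matching such curves across forms is one way to target parallelism, though the study above works in a broader IRT context.

```python
# A minimal sketch of the test information function (TIF) under the 2PL model:
# I(theta) = sum over items of a_i^2 * P_i(theta) * (1 - P_i(theta)).
import numpy as np

items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7), (1.0, 1.2)]   # (a, b) per item

def test_information(theta, items):
    theta = np.asarray(theta, dtype=float)
    info = np.zeros_like(theta)
    for a, b in items:
        p = 1 / (1 + np.exp(-a * (theta - b)))
        info += a**2 * p * (1 - p)
    return info

theta = np.linspace(-3, 3, 7)
print(test_information(theta, items).round(3))
```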