Publication Date
In 2025 | 9 |
Since 2024 | 22 |
Since 2021 (last 5 years) | 112 |
Since 2016 (last 10 years) | 3914 |
Since 2006 (last 20 years) | 8111 |
Descriptor
Statistical Analysis | 10287 |
Foreign Countries | 3633 |
Scores | 2901 |
Comparative Analysis | 2723 |
Correlation | 2073 |
Elementary School Students | 1644 |
Pretests Posttests | 1455 |
Academic Achievement | 1408 |
Questionnaires | 1404 |
Achievement Tests | 1386 |
Gender Differences | 1377 |
Author
Sinharay, Sandip | 29 |
Smolkowski, Keith | 21 |
Dorans, Neil J. | 19 |
Tindal, Gerald | 19 |
Alonzo, Julie | 17 |
Fien, Hank | 16 |
Livingston, Samuel A. | 16 |
Raykov, Tenko | 16 |
Clarke, Ben | 14 |
Petscher, Yaacov | 14 |
Baker, Scott K. | 12 |
Audience
Researchers | 109 |
Practitioners | 49 |
Teachers | 34 |
Policymakers | 15 |
Administrators | 12 |
Students | 9 |
Counselors | 3 |
Parents | 1 |
Location
Turkey | 483 |
Iran | 279 |
Texas | 200 |
California | 182 |
Germany | 168 |
Australia | 163 |
Canada | 140 |
Taiwan | 135 |
China | 134 |
Florida | 133 |
Netherlands | 119 |
What Works Clearinghouse Rating
Meets WWC Standards without Reservations | 17 |
Meets WWC Standards with or without Reservations | 35 |
Does not meet standards | 39 |
Haeju Lee; Kyung Yong Kim – Journal of Educational Measurement, 2025
When no prior information about differential item functioning (DIF) exists for items in a test, either the rank-based or iterative purification procedure might be preferred. The rank-based purification selects anchor items based on a preliminary DIF test. For a preliminary DIF test, likelihood ratio test (LRT) based approaches (e.g.,…
Descriptors: Test Items, Equated Scores, Test Bias, Accuracy
Tom Benton – Practical Assessment, Research & Evaluation, 2025
This paper proposes an extension of linear equating that may be useful in one of two fairly common assessment scenarios. One is where different students have taken different combinations of test forms. This might occur, for example, where students have some free choice over the exam papers they take within a particular qualification. In this…
Descriptors: Equated Scores, Test Format, Test Items, Computation
Gabrielle Francis; Nathaniel von der Embse; David Putwain; Eunsook Kim – Journal of Psychoeducational Assessment, 2025
Standardized testing is an integral part of the English and American education systems. However, the use of high-stakes testing has unintended consequences, one of which is test anxiety. Over the last 50 years, increased attention has been directed to developing tools to identify students experiencing test anxiety. However, many test anxiety…
Descriptors: Test Anxiety, Secondary School Students, Foreign Countries, Affective Measures
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE) using both real and simulated data. Chained kernel equating, poststratification kernel equating, and circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
Jiajing Huang – ProQuest LLC, 2022
The nonequivalent-groups anchor-test (NEAT) data-collection design is commonly used in large-scale assessments. Under this design, different test groups take different test forms. Each test form has its own unique items and all test forms share a set of common items. If item response theory (IRT) models are applied to analyze the test data, the…
Descriptors: Item Response Theory, Test Format, Test Items, Test Construction
He, Qingping; Meadows, Michelle; Black, Beth – Research Papers in Education, 2022
A potential negative consequence of high-stakes testing is inappropriate test behaviour involving individuals and/or institutions. Inappropriate test behaviour and test collusion can result in aberrant response patterns and anomalous test scores and invalidate the intended interpretation and use of test results. A variety of statistical techniques…
Descriptors: Statistical Analysis, High Stakes Tests, Scores, Response Style (Tests)
Merchant, Stefan; Rich, Jessica; Klinger, Don A. – Canadian Journal of Educational Administration and Policy, 2022
Both school and district administrators use the results of standardized, large-scale tests to inform decisions about the need for, or success of, educational programs and interventions. However, test results at the school level are subject to random fluctuations due to changes in cohort, test items, and other factors outside of the school's…
Descriptors: Standardized Tests, Foreign Countries, Generalizability Theory, Scores
Su, Kun; Henson, Robert A. – Journal of Educational and Behavioral Statistics, 2023
This article provides a process to carefully evaluate the suitability of a content domain for which diagnostic classification models (DCMs) could be applicable, followed by optimized steps for constructing a test blueprint for applying DCMs and a real-life example illustrating this process. The content domains were carefully evaluated using a set of…
Descriptors: Classification, Models, Science Tests, Physics
El Alaoui, Mohamed – IEEE Transactions on Learning Technologies, 2023
Classical evaluation methods, assessments, exams, and so forth accentuate the perception of one against all, professor versus learners. Including students in the assessment process allows transforming the professor from an opponent into a critical friend, with the role of helping students to recognize both their strengths and weaknesses. However,…
Descriptors: Peer Evaluation, Educational Improvement, Test Validity, Test Reliability
Liu, Ivy; Suesse, Thomas; Harvey, Samuel; Gu, Peter Yongqi; Fernández, Daniel; Randal, John – Educational and Psychological Measurement, 2023
The Mantel-Haenszel estimator is one of the most popular techniques for measuring differential item functioning (DIF). A generalization of this estimator is applied to the context of DIF to compare items by taking the covariance of odds ratio estimators between dependent items into account. Unlike item response theory, the method does not rely…
Descriptors: Test Bias, Computation, Statistical Analysis, Achievement Tests
Weese, James D.; Turner, Ronna C.; Ames, Allison; Crawford, Brandon; Liang, Xinya – Educational and Psychological Measurement, 2022
A simulation study was conducted to investigate the heuristics of the SIBTEST procedure and how it compares with ETS classification guidelines used with the Mantel-Haenszel procedure. Prior heuristics have been used for nearly 25 years, but they are based on a simulation study that was restricted due to computer limitations and that modeled item…
Descriptors: Test Bias, Heuristics, Classification, Statistical Analysis
Onur Demirkaya; Sharon Frey; Sid Sharairi; JongPil Kim – International Electronic Journal of Elementary Education, 2025
This study compares latent profiles derived from student subgroups of varying levels of mathematical skills defined by achievement and ability assessment scores. Achievement and ability cut scores for identifying students at both ends of the mathematics spectrum were applied and the resulting latent profiles within each condition were compared.…
Descriptors: Profiles, Statistical Analysis, Academic Achievement, Mathematics Achievement
Xiao, Leifeng; Hau, Kit-Tai – Educational and Psychological Measurement, 2023
We examined the performance of coefficient alpha and its potential competitors (ordinal alpha, omega total, Revelle's omega total [omega RT], omega hierarchical [omega h], greatest lower bound [GLB], and coefficient "H") with continuous and discrete data having different types of non-normality. Results showed the estimation bias was…
Descriptors: Statistical Bias, Statistical Analysis, Likert Scales, Statistical Distributions
Wheeler, Jordan M.; Engelhard, George; Wang, Jue – Measurement: Interdisciplinary Research and Perspectives, 2022
Objectively scoring constructed-response items on educational assessments has long been a challenge due to the use of human raters. Even well-trained raters using a rubric can inaccurately assess essays. Unfolding models measure raters' scoring accuracy by capturing the discrepancy between criterion and operational ratings by placing essays on an…
Descriptors: Accuracy, Scoring, Statistical Analysis, Models
Razavipour, Kioumars; Raji, Behnaz – Language Testing in Asia, 2022
The credibility of conclusions arrived at in quantitative research depends, to a large extent, on the quality of data collection instruments used to quantify language and non-language constructs. Despite this, research into data collection instruments used in Applied Linguistics and particularly in the thesis genre remains limited. This study…
Descriptors: Applied Linguistics, Test Reliability, Language Tests, Credibility