Publication Date
| In 2026 | 0 |
| Since 2025 | 17 |
| Since 2022 (last 5 years) | 88 |
| Since 2017 (last 10 years) | 2668 |
| Since 2007 (last 20 years) | 8026 |
Descriptor
| Statistical Analysis | 10296 |
| Foreign Countries | 3636 |
| Scores | 2901 |
| Comparative Analysis | 2723 |
| Correlation | 2073 |
| Elementary School Students | 1645 |
| Pretests Posttests | 1455 |
| Academic Achievement | 1408 |
| Questionnaires | 1404 |
| Achievement Tests | 1386 |
| Gender Differences | 1377 |
Author
| Sinharay, Sandip | 29 |
| Smolkowski, Keith | 21 |
| Dorans, Neil J. | 19 |
| Tindal, Gerald | 19 |
| Alonzo, Julie | 17 |
| Fien, Hank | 16 |
| Livingston, Samuel A. | 16 |
| Raykov, Tenko | 16 |
| Clarke, Ben | 14 |
| Petscher, Yaacov | 14 |
| Baker, Scott K. | 12 |
Audience
| Researchers | 110 |
| Practitioners | 49 |
| Teachers | 34 |
| Policymakers | 15 |
| Administrators | 12 |
| Students | 9 |
| Counselors | 3 |
| Parents | 1 |
Location
| Turkey | 484 |
| Iran | 280 |
| Texas | 201 |
| California | 182 |
| Germany | 168 |
| Australia | 163 |
| Canada | 141 |
| China | 135 |
| Taiwan | 135 |
| Florida | 133 |
| Netherlands | 119 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 17 |
| Meets WWC Standards with or without Reservations | 35 |
| Does not meet standards | 39 |
Christian Berggren; Bengt Gerdin; Solmaz Filiz Karabag – Journal of Academic Ethics, 2025
The exposure of scientific scandals and the increase of dubious research practices have generated a stream of studies on Questionable Research Practices (QRPs), such as failure to acknowledge co-authors, selective presentation of findings, or removal of data not supporting desired outcomes. In contrast to high-profile fraud cases, QRPs can be…
Descriptors: Test Construction, Test Bias, Test Format, Response Style (Tests)
Haeju Lee; Kyung Yong Kim – Journal of Educational Measurement, 2025
When no prior information of differential item functioning (DIF) exists for items in a test, either the rank-based or iterative purification procedure might be preferred. The rank-based purification selects anchor items based on a preliminary DIF test. For a preliminary DIF test, likelihood ratio test (LRT) based approaches (e.g.,…
Descriptors: Test Items, Equated Scores, Test Bias, Accuracy
Tom Benton – Practical Assessment, Research & Evaluation, 2025
This paper proposes an extension of linear equating that may be useful in one of two fairly common assessment scenarios. One is where different students have taken different combinations of test forms. This might occur, for example, where students have some free choice over the exam papers they take within a particular qualification. In this…
Descriptors: Equated Scores, Test Format, Test Items, Computation
Sun-Joo Cho; Amanda Goodwin; Jorge Salas; Sophia Mueller – Grantee Submission, 2025
This study incorporates a random forest (RF) approach to probe complex interactions and nonlinearity among predictors into an item response model with the goal of using a hybrid approach to outperform either an RF or explanatory item response model (EIRM) only in explaining item responses. In the specified model, called EIRM-RF, predicted values…
Descriptors: Item Response Theory, Artificial Intelligence, Statistical Analysis, Predictor Variables
Gabrielle Francis; Nathaniel von der Embse; David Putwain; Eunsook Kim – Journal of Psychoeducational Assessment, 2025
Standardized testing is an integral part of the English and American education systems. However, the use of high-stakes testing has unintended consequences, one of which is test anxiety. Over the last 50 years, increased attention has been directed to developing tools to identify students experiencing test anxiety. However, many test anxiety…
Descriptors: Test Anxiety, Secondary School Students, Foreign Countries, Affective Measures
Usani Joseph Ofem; Valentine Joseph Owan; Cletus Ibout; Sylvia Victor Ovat – Pedagogical Research, 2025
This study employed repeated measures ANOVA to assess the reliability of an instrument designed to measure utilization, awareness, and perception of AI in research among 150 undergraduate students. Validated instruments with robust psychometric properties were used for the study. Data collection occurred in three phases spaced two weeks apart,…
Descriptors: Statistical Analysis, Test Reliability, Undergraduate Students, Attitude Measures
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE) using both real and simulated data. Chained kernel equating, poststratification kernel equating, and circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
Jiajing Huang – ProQuest LLC, 2022
The nonequivalent-groups anchor-test (NEAT) data-collection design is commonly used in large-scale assessments. Under this design, different test groups take different test forms. Each test form has its own unique items and all test forms share a set of common items. If item response theory (IRT) models are applied to analyze the test data, the…
Descriptors: Item Response Theory, Test Format, Test Items, Test Construction
Huiming Ding; Matt Homer – Advances in Health Sciences Education, 2025
Summative assessments are often underused for feedback, despite them being rich with data of students' applied knowledge and clinical and professional skills. To better inform teaching and student support, this study aims to gain insights from summative assessments through profiling students' performance patterns and identify those students…
Descriptors: Summative Evaluation, Profiles, Statistical Analysis, Outcomes of Education
He, Qingping; Meadows, Michelle; Black, Beth – Research Papers in Education, 2022
A potential negative consequence of high-stakes testing is inappropriate test behaviour involving individuals and/or institutions. Inappropriate test behaviour and test collusion can result in aberrant response patterns and anomalous test scores and invalidate the intended interpretation and use of test results. A variety of statistical techniques…
Descriptors: Statistical Analysis, High Stakes Tests, Scores, Response Style (Tests)
Merchant, Stefan; Rich, Jessica; Klinger, Don A. – Canadian Journal of Educational Administration and Policy, 2022
Both school and district administrators use the results of standardized, large-scale tests to inform decisions about the need for, or success of, educational programs and interventions. However, test results at the school level are subject to random fluctuations due to changes in cohort, test items, and other factors outside of the school's…
Descriptors: Standardized Tests, Foreign Countries, Generalizability Theory, Scores
Su, Kun; Henson, Robert A. – Journal of Educational and Behavioral Statistics, 2023
This article provides a process to carefully evaluate the suitability of a content domain for which diagnostic classification models (DCMs) could be applicable and then optimized steps for constructing a test blueprint for applying DCMs and a real-life example illustrating this process. The content domains were carefully evaluated using a set of…
Descriptors: Classification, Models, Science Tests, Physics
Wendy Castillo; Rachel Renbarger; Sasha Mejia-Bradford; Christen Priddie; Juan Cruz; Brein Mosely; Katherine Aragon – Annenberg Institute for School Reform at Brown University, 2025
Education policy research aimed at eliminating racism necessitates methodological innovation that fosters both equity-centered approaches and robust empirical analysis of the systemic nature of racism. Most quantitative research in educational psychology omits the racist environment that students in K-12 education exist in (DeCuir-Gunby &…
Descriptors: Racism, Elementary Secondary Education, Racial Discrimination, Surveys
El Alaoui, Mohamed – IEEE Transactions on Learning Technologies, 2023
Classical evaluation methods, assessments, exams, and so forth accentuate the perception of one against all, professor versus learners. Including students in the assessment process transforms the professor from an opponent into a critical friend, whose role is to help students recognize both their strengths and weaknesses. However,…
Descriptors: Peer Evaluation, Educational Improvement, Test Validity, Test Reliability
Liu, Ivy; Suesse, Thomas; Harvey, Samuel; Gu, Peter Yongqi; Fernández, Daniel; Randal, John – Educational and Psychological Measurement, 2023
The Mantel-Haenszel estimator is one of the most popular techniques for measuring differential item functioning (DIF). A generalization of this estimator is applied to the context of DIF to compare items by taking the covariance of odds ratio estimators between dependent items into account. Unlike the Item Response Theory, the method does not rely…
Descriptors: Test Bias, Computation, Statistical Analysis, Achievement Tests