Publication Date
| Date range | Results |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 8 |
| Since 2022 (last 5 years) | 38 |
| Since 2017 (last 10 years) | 102 |
| Since 2007 (last 20 years) | 910 |
Author
| Author | Results |
| --- | --- |
| Thurlow, Martha | 22 |
| Popham, W. James | 17 |
| Baker, Eva L. | 14 |
| Shipman, Virginia C. | 13 |
| Sinharay, Sandip | 13 |
| Ebel, Robert L. | 12 |
| Haney, Walt | 11 |
| Herman, Joan L. | 10 |
| Mislevy, Robert J. | 10 |
| Hartley, Nancy K. | 8 |
| Koretz, Daniel | 8 |
Audience
| Audience | Results |
| --- | --- |
| Practitioners | 291 |
| Teachers | 138 |
| Researchers | 79 |
| Administrators | 78 |
| Policymakers | 67 |
| Students | 20 |
| Parents | 19 |
| Counselors | 9 |
| Community | 6 |
| Media Staff | 1 |
| Support Staff | 1 |
Location
| Location | Results |
| --- | --- |
| California | 102 |
| Canada | 82 |
| Florida | 54 |
| Australia | 52 |
| United Kingdom | 51 |
| United Kingdom (England) | 50 |
| United States | 49 |
| New York | 47 |
| Texas | 42 |
| United Kingdom (Great Britain) | 28 |
| New Jersey | 27 |
What Works Clearinghouse Rating
| Rating | Results |
| --- | --- |
| Meets WWC Standards without Reservations | 1 |
| Meets WWC Standards with or without Reservations | 2 |
| Does not meet standards | 1 |
Phelps, Richard P. – Online Submission, 2019
If it is not possible for one to critique other research and succeed--or even remain securely employed--in a research profession, how is the profession ever to rid itself of flawed, biased, or fraudulent research? Answer: it will not. Any community that disallows accusations of bad behavior condones bad behavior. Any community that disallows…
Descriptors: Educational Research, Deception, Ethics, Information Dissemination
Mari Quanbeck; Andrew R. Hinkle; Sheryl S. Lazarus; Virginia A. Ressa; Martha M. Thurlow – National Center on Educational Outcomes, 2023
This report contains the proceedings of a forum held on June 28, 2023 in New Orleans, Louisiana, to discuss issues surrounding meaningful accessibility of assessments. The forum was a post-session to the Council of Chief State School Officers (CCSSO) National Conference on Student Assessment (NCSA) and was a collaboration of the "Assessment,…
Descriptors: Accessibility (for Disabled), Educational Testing, Technology Integration, Barriers
Hong, Seong Eun; Monroe, Scott; Falk, Carl F. – Journal of Educational Measurement, 2020
In educational and psychological measurement, a person-fit statistic (PFS) is designed to identify aberrant response patterns. For parametric PFSs, valid inference depends on several assumptions, one of which is that the item response theory (IRT) model is correctly specified. Previous studies have used empirical data sets to explore the effects…
Descriptors: Educational Testing, Psychological Testing, Goodness of Fit, Error of Measurement
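As an illustration of a parametric PFS, the sketch below computes the widely used standardized log-likelihood statistic l_z under a two-parameter logistic (2PL) IRT model; the item parameters and response patterns are hypothetical, made up for the example.

```python
import numpy as np

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL IRT model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def lz_statistic(u, theta, a, b):
    """Standardized log-likelihood person-fit statistic l_z.

    Large negative values flag response patterns that are unlikely
    under the assumed IRT model (i.e., aberrant responding).
    """
    p = p_2pl(theta, a, b)
    l0 = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
    e_l0 = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    v_l0 = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - e_l0) / np.sqrt(v_l0)

# Hypothetical item parameters and two response patterns at theta = 0
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])   # discriminations
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # difficulties
print(lz_statistic(np.array([1, 1, 1, 0, 0]), 0.0, a, b))  # model-consistent
print(lz_statistic(np.array([0, 0, 0, 1, 1]), 0.0, a, b))  # aberrant
```

As the abstract notes, valid inference from such statistics rests on the IRT model being correctly specified; if the 2PL is wrong for these items, the l_z values are not trustworthy.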
Metsämuuronen, Jari – International Journal of Educational Methodology, 2020
Kelley's Discrimination Index (DI) is a simple and robust classical non-parametric shortcut for estimating item discrimination power (IDP) in practical educational settings. Unlike the item-total correlation, DI can reach the extreme values of +1 and -1, and it is stable against outliers. Because of its computational ease, DI is…
Descriptors: Test Items, Computation, Item Analysis, Nonparametric Statistics
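For concreteness, a minimal sketch of Kelley's index using the conventional upper and lower 27% groups; the scores below are simulated for illustration.

```python
import numpy as np

def kelley_di(item, total, frac=0.27):
    """Kelley's Discrimination Index: proportion correct in the upper
    27% of examinees (ranked by total score) minus that in the lower 27%."""
    k = max(1, int(round(frac * len(total))))
    order = np.argsort(total)
    return item[order[-k:]].mean() - item[order[:k]].mean()

rng = np.random.default_rng(1)
total = rng.integers(0, 41, size=200)                 # hypothetical total scores
item = (rng.random(200) < total / 40).astype(float)   # item tracks ability
print(kelley_di(item, total))                         # clearly positive: item discriminates
```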
Nisbet, Isabel; Shaw, Stuart D. – Assessment in Education: Principles, Policy & Practice, 2019
Fairness in assessment is seen as increasingly important, but there is a need for greater clarity in the use of the term 'fair'. Also, fairness is perceived through a range of 'lenses' reflecting different traditions of thought. The lens used determines how fairness is seen and described. This article distinguishes different uses of 'fair' which have…
Descriptors: Test Bias, Measurement, Theories, Educational Assessment
Sinharay, Sandip – Grantee Submission, 2019
Benefiting from item preknowledge (e.g., McLeod, Lewis, & Thissen, 2003) is a major type of fraudulent behavior during educational assessments. This paper suggests a new statistic that can be used for detecting examinees who may have benefited from item preknowledge using their response times. The statistic quantifies the difference in…
Descriptors: Test Items, Cheating, Reaction Time, Identification
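The snippet does not give the statistic's exact form, but the idea of contrasting response times on compromised versus uncompromised items can be sketched as a standardized difference of mean log response times; this is a simplified stand-in, not Sinharay's actual statistic, and the times below are hypothetical.

```python
import numpy as np

def rt_contrast(log_times, compromised):
    """Standardized difference in mean log response times between
    uncompromised and compromised items for a single examinee.
    Large positive values suggest unusually fast answers on the
    compromised items, consistent with item preknowledge."""
    c, u = log_times[compromised], log_times[~compromised]
    se = np.sqrt(c.var(ddof=1) / c.size + u.var(ddof=1) / u.size)
    return (u.mean() - c.mean()) / se

log_times = np.log(np.array([12.0, 35.0, 4.0, 28.0, 3.5, 40.0, 5.0, 31.0]))
compromised = np.array([False, False, True, False, True, False, True, False])
print(rt_contrast(log_times, compromised))  # large positive: suspiciously fast
```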
Davis-Berg, Elizabeth C.; Minbiole, Julie – School Science Review, 2020
Completion rates were compared for long-form questions where a large blank answer space is provided and for long-form questions where the answer space has bullet-point prompts corresponding to the parts of the question. It was found that students were more likely to complete a question when bullet points were provided in the answer space.…
Descriptors: Test Format, Test Construction, Academic Achievement, Educational Testing
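The snippet does not state which analysis the authors used; as one illustration, a simple two-proportion z-test could compare completion rates under the two answer-space formats. The counts below are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical: 88/100 completed with bullet-point prompts, 74/100 with blank space
print(two_proportion_z(88, 100, 74, 100))  # ≈ 2.52, significant at the 0.05 level
```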
Xiao, Jiaying; Bulut, Okan – Educational and Psychological Measurement, 2020
Large amounts of missing data could distort item parameter estimation and lead to biased ability estimates in educational assessments. Therefore, missing responses should be handled properly before estimating any parameters. In this study, two Monte Carlo simulation studies were conducted to compare the performance of four methods in handling…
Descriptors: Data, Computation, Ability, Maximum Likelihood Statistics
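A toy Monte Carlo in the same spirit (far simpler than the study's IRT simulations) shows how one common choice, scoring omitted responses as incorrect, biases even a plain proportion-correct estimate when responses are missing completely at random; all quantities below are simulated.

```python
import numpy as np

rng = np.random.default_rng(7)
n_persons, n_items, p_true = 1000, 40, 0.6

responses = rng.random((n_persons, n_items)) < p_true  # 1 = correct
missing = rng.random((n_persons, n_items)) < 0.15      # 15% MCAR missingness

as_incorrect = np.where(missing, 0, responses).mean()  # missing scored as 0
available = responses[~missing].mean()                 # missing simply ignored

print(f"true: {p_true:.3f}  scored-as-incorrect: {as_incorrect:.3f}  "
      f"available-case: {available:.3f}")
```

Scoring omissions as incorrect is biased downward, while the available-case estimate recovers the true value under MCAR; with nonrandom missingness or full IRT estimation, as in the study, the comparison becomes much less trivial.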
van Groen, Maaike M.; Eggen, Theo J. H. M. – Journal of Applied Testing Technology, 2020
When developing a digital test, one of the first decisions that need to be made is which type of Computer-Based Test (CBT) to develop. Six different CBT types are considered here: linear tests, automatically generated tests, computerized adaptive tests, adaptive learning environments, educational simulations, and educational games. The selection…
Descriptors: Computer Assisted Testing, Formative Evaluation, Summative Evaluation, Adaptive Testing
Raykov, Tenko; Marcoulides, George A.; Huber, Chuck – Measurement: Interdisciplinary Research and Perspectives, 2020
It is demonstrated that the popular three-parameter logistic model can lead to markedly inaccurate individual ability level estimates for mixture populations. A theoretically and empirically important setting is initially considered where (a) in one of two subpopulations (latent classes) the two-parameter logistic model holds for each item in a…
Descriptors: Item Response Theory, Models, Measurement Techniques, Item Analysis
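For reference, the two item response functions at issue, in standard notation (θ is ability; a_j, b_j, and c_j are the discrimination, difficulty, and pseudo-guessing parameters of item j):

```latex
P_j^{\mathrm{2PL}}(\theta) = \frac{1}{1 + e^{-a_j(\theta - b_j)}},
\qquad
P_j^{\mathrm{3PL}}(\theta) = c_j + (1 - c_j)\,\frac{1}{1 + e^{-a_j(\theta - b_j)}}.
```

The 2PL is the special case c_j = 0, so fitting a single 3PL to a mixture in which one latent class actually follows the 2PL can yield the distorted ability estimates the authors describe.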
Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2019
The Mantel-Haenszel delta difference (MH D-DIF) and the standardized proportion difference (STD P-DIF) are two observed-score methods that have been used to assess differential item functioning (DIF) at Educational Testing Service since the early 1990s. Latent-variable approaches to assessing measurement invariance at the item level have been…
Descriptors: Test Bias, Educational Testing, Statistical Analysis, Item Response Theory
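Both statistics have standard closed forms (Holland & Thayer, 1988; Dorans & Kulick, 1986). With examinees matched on total-score level k, and R/W counting right/wrong answers in the reference (r) and focal (f) groups:

```latex
\hat{\alpha}_{\mathrm{MH}}
  = \frac{\sum_k R_{rk} W_{fk} / N_k}{\sum_k R_{fk} W_{rk} / N_k},
\qquad
\text{MH D-DIF} = -2.35 \,\ln \hat{\alpha}_{\mathrm{MH}},
\qquad
\text{STD P-DIF}
  = \frac{\sum_k K_{fk}\,(P_{fk} - P_{rk})}{\sum_k K_{fk}},
```

where N_k is the total number of examinees at level k, K_{fk} the number of focal-group examinees at level k, and P_{gk} the proportion correct in group g at level k; the factor -2.35 places the common odds ratio on the ETS delta scale.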
Sinharay, Sandip; van Rijn, Peter W. – Journal of Educational and Behavioral Statistics, 2020
Response time models (RTMs) are of increasing interest in educational and psychological testing. This article focuses on the lognormal model for response times, which is one of the most popular RTMs. Several existing statistics for testing normality and the fit of factor analysis models are repurposed for testing the fit of the lognormal model. A…
Descriptors: Educational Testing, Psychological Testing, Goodness of Fit, Factor Analysis
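The model under test is van der Linden's (2006) lognormal response-time model, under which the log response time of person i on item j is normally distributed:

```latex
\ln t_{ij} \sim N\!\left(\beta_j - \tau_i,\; \alpha_j^{-2}\right),
```

where τ_i is the person's speed, β_j the item's time intensity, and α_j its time discrimination (the inverse standard deviation of the log times). Testing normality of suitably standardized log response times therefore amounts to testing the model's fit, which is why the normality and factor-analysis fit statistics can be repurposed here.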
Liu, Yang; Wang, Xiaojing – Journal of Educational and Behavioral Statistics, 2020
Parametric methods, such as autoregressive models or latent growth modeling, are usually too inflexible to model dependence and nonlinear effects among changes in latent traits when time gaps are irregular and recorded time points vary across individuals. Often in practice, the growth trend of latent traits is subject to certain…
Descriptors: Bayesian Statistics, Nonparametric Statistics, Regression (Statistics), Item Response Theory
Sinharay, Sandip; van Rijn, Peter – Grantee Submission, 2020
Response-time models are of increasing interest in educational and psychological testing. This paper focuses on the lognormal model for response times (van der Linden, 2006), which is one of the most popular response-time models. Several existing statistics for testing normality and the fit of factor-analysis models are repurposed for testing the…
Descriptors: Educational Testing, Psychological Testing, Goodness of Fit, Factor Analysis
Care, Esther; Kim, Helyn – Center for Universal Education at The Brookings Institution, 2020
This framework marks the first in a series of five reports detailing the work of the Optimizing Assessment for All (OAA) project at Brookings to strengthen education systems' capacity to integrate 21st century skills (21CS) into teaching and learning, using assessment as a lever for changing classroom practices. In a world of rapid advancement and…
Descriptors: 21st Century Skills, Foreign Countries, Educational Testing, Assessment Literacy
