Publication Date
In 2025: 0
Since 2024: 2
Since 2021 (last 5 years): 16
Since 2016 (last 10 years): 40
Since 2006 (last 20 years): 86
Descriptor
Scores: 117
Test Bias: 117
Test Items: 117
Item Response Theory: 31
Comparative Analysis: 28
Test Reliability: 27
Statistical Analysis: 25
Test Validity: 25
Item Analysis: 22
Difficulty Level: 21
Foreign Countries: 21
Author
Dorans, Neil J.: 3
Liu, Ou Lydia: 3
Ackerman, Terry: 2
Camilli, Gregory: 2
Gorney, Kylie: 2
Lee, Won-Chan: 2
Lee, Yi-Hsuan: 2
Magis, David: 2
Prowker, Adam: 2
Robin, Frederic: 2
Sinharay, Sandip: 2
Education Level
Secondary Education: 20
Higher Education: 19
Postsecondary Education: 13
Elementary Education: 12
High Schools: 12
Grade 8: 8
Middle Schools: 8
Elementary Secondary Education: 7
Grade 4: 6
Grade 7: 6
Junior High Schools: 6
Audience
Researchers: 3
Community: 1
Parents: 1
Location
Canada: 4
Iran: 4
California: 2
Florida: 2
Indonesia: 2
New Mexico: 2
Pennsylvania: 2
Spain: 2
Turkey: 2
Alabama: 1
Belgium: 1
What Works Clearinghouse Rating
Meets WWC Standards without Reservations: 1
Meets WWC Standards with or without Reservations: 1
Andrew D. Ho – Journal of Educational and Behavioral Statistics, 2024
I review opportunities and threats that widely accessible Artificial Intelligence (AI)-powered services present for educational statistics and measurement. Algorithmic and computational advances continue to improve approaches to item generation, scale maintenance, test security, test scoring, and score reporting. Predictable misuses of AI for…
Descriptors: Artificial Intelligence, Measurement, Educational Assessment, Technology Uses in Education
Paula Elosua – Language Assessment Quarterly, 2024
In sociolinguistic contexts where standardized languages coexist with regional dialects, the study of differential item functioning is a valuable tool for examining certain linguistic uses or varieties as threats to score validity. From an ecological perspective, this paper describes three stages in the study of differential item functioning…
Descriptors: Reading Tests, Reading Comprehension, Scores, Test Validity
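Elosua's three-stage procedure is not spelled out in the truncated abstract above; for orientation, here is a minimal sketch of the Mantel-Haenszel screen that DIF studies of this kind conventionally start from (the toy data and the 1.5-delta flag are illustrative assumptions, not details from the paper):

```python
import numpy as np

def mantel_haenszel_delta(item, group, total):
    """item: 0/1 scores; group: 0 = reference, 1 = focal; total: matching score."""
    num = den = 0.0
    for s in np.unique(total):
        m = total == s
        a = np.sum((group[m] == 0) & (item[m] == 1))  # reference, correct
        b = np.sum((group[m] == 0) & (item[m] == 0))  # reference, incorrect
        c = np.sum((group[m] == 1) & (item[m] == 1))  # focal, correct
        d = np.sum((group[m] == 1) & (item[m] == 0))  # focal, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    return -2.35 * np.log(num / den)  # ETS delta scale; |delta| >= 1.5 suggests moderate DIF

# Toy data in which responses depend on total score only, so delta should sit near 0.
rng = np.random.default_rng(0)
total = rng.integers(0, 21, 2000)
group = rng.integers(0, 2, 2000)
item = (rng.random(2000) < 1 / (1 + np.exp(-(total - 10) / 4))).astype(int)
print(mantel_haenszel_delta(item, group, total))
```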
Gorney, Kylie; Wollack, James A.; Sinharay, Sandip; Eckerly, Carol – Journal of Educational and Behavioral Statistics, 2023
Any time examinees have had access to items and/or answers prior to taking a test, the fairness of the test and validity of test score interpretations are threatened. Therefore, there is a high demand for procedures to detect both compromised items (CI) and examinees with preknowledge (EWP). In this article, we develop a procedure that uses item…
Descriptors: Scores, Test Validity, Test Items, Prior Learning
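The truncated abstract stops before the authors' statistic; the sketch below only conveys the core intuition of contrasting performance on suspected-compromised items against secure items (the function name, matrix layout, and z cutoff are assumptions for illustration, not the paper's procedure):

```python
import numpy as np

def preknowledge_screen(resp, compromised, z_crit=3.0):
    """resp: examinees-by-items 0/1 matrix; compromised: boolean mask over items."""
    gain = resp[:, compromised].mean(axis=1) - resp[:, ~compromised].mean(axis=1)
    z = (gain - gain.mean()) / gain.std(ddof=1)  # standardize the gain across examinees
    return np.flatnonzero(z > z_crit)            # indices of flagged examinees
```

Examinees with preknowledge should show an unusually large gain on the compromised subset, which is the signal any such detector exploits.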
Gorney, Kylie – ProQuest LLC, 2023
Aberrant behavior refers to any type of unusual behavior that would not be expected under normal circumstances. In educational and psychological testing, such behaviors have the potential to severely bias the aberrant examinee's test score while also jeopardizing the test scores of countless others. It is therefore crucial that aberrant examinees…
Descriptors: Behavior Problems, Educational Testing, Psychological Testing, Test Bias
Gu, Zhengguo; Emons, Wilco H. M.; Sijtsma, Klaas – Journal of Educational and Behavioral Statistics, 2021
Clinical, medical, and health psychologists use difference scores obtained from pretest-posttest designs employing the same test to assess intraindividual change possibly caused by an intervention addressing, for example, anxiety, depression, eating disorders, or addiction. Reliability of difference scores is important for interpreting observed…
Descriptors: Test Reliability, Scores, Pretests Posttests, Computation
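The abstract breaks off mid-sentence, but the classical result it builds on is standard: with pretest score X, posttest score Y, and difference D = Y - X, the reliability of D is

```latex
\rho_{DD'} =
  \frac{\sigma_X^{2}\,\rho_{XX'} + \sigma_Y^{2}\,\rho_{YY'} - 2\,\sigma_X \sigma_Y \rho_{XY}}
       {\sigma_X^{2} + \sigma_Y^{2} - 2\,\sigma_X \sigma_Y \rho_{XY}}
```

The cross-term is subtracted from numerator and denominator alike, but the numerator starts smaller because the reliabilities are below 1, so a high pretest-posttest correlation pushes reliability down; this is the familiar reason difference scores can be unreliable even when both component tests are.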
Brennan, Robert L.; Kim, Stella Y.; Lee, Won-Chan – Educational and Psychological Measurement, 2022
This article extends multivariate generalizability theory (MGT) to tests with different random-effects designs for each level of a fixed facet. There are numerous situations in which the design of a test and the resulting data structure are not definable by a single design. One example is mixed-format tests that are composed of multiple-choice and…
Descriptors: Multivariate Analysis, Generalizability Theory, Multiple Choice Tests, Test Construction
Joo, Sean; Ali, Usama; Robin, Frederic; Shin, Hyo Jeong – Large-scale Assessments in Education, 2022
We investigated the potential impact of differential item functioning (DIF) on group-level mean and standard deviation estimates using empirical and simulated data in the context of large-scale assessment. For the empirical investigation, PISA 2018 cognitive domains (Reading, Mathematics, and Science) data were analyzed using Jackknife sampling to…
Descriptors: Test Items, Item Response Theory, Scores, Student Evaluation
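For readers unfamiliar with the resampling step, the leave-one-out idea behind jackknife standard errors is easy to state (PISA actually uses a replicate-weight design that is more involved; this toy version is only for orientation):

```python
import numpy as np

def jackknife_se(x):
    """Leave-one-out jackknife standard error of the sample mean."""
    n = len(x)
    loo = np.array([np.delete(x, i).mean() for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))

scores = np.random.default_rng(1).normal(500, 100, size=200)
print(jackknife_se(scores))  # matches scores.std(ddof=1) / sqrt(200) for the mean
```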
Rodriguez, Rebekah M.; Silvia, Paul J.; Kaufman, James C.; Reiter-Palmon, Roni; Puryear, Jeb S. – Creativity Research Journal, 2023
The original 90-item Creative Behavior Inventory (CBI) was a landmark self-report scale in creativity research, and the 28-item brief form developed nearly 20 years ago continues to be a popular measure of everyday creativity. Relatively little is known, however, about the psychometric properties of this widely used scale. In the current research,…
Descriptors: Creativity Tests, Creativity, Creative Thinking, Psychometrics
Tim Jacobbe; Bob delMas; Brad Hartlaub; Jeff Haberstroh; Catherine Case; Steven Foti; Douglas Whitaker – Numeracy, 2023
The development of assessments as part of the funded LOCUS project is described. The assessments measure students' conceptual understanding of statistics as outlined in the GAISE PreK-12 Framework. Results are reported from a large-scale administration to 3,430 students in grades 6 through 12 in the United States. Items were designed to assess…
Descriptors: Statistics Education, Common Core State Standards, Student Evaluation, Elementary School Students
Ip, Edward H.; Strachan, Tyler; Fu, Yanyan; Lay, Alexandra; Willse, John T.; Chen, Shyh-Huei; Rutkowski, Leslie; Ackerman, Terry – Journal of Educational Measurement, 2019
Test items must often be broad in scope to be ecologically valid. It is therefore almost inevitable that secondary dimensions are introduced into a test during test development. A cognitive test may require one or more abilities besides the primary ability to correctly respond to an item, in which case a unidimensional test score overestimates the…
Descriptors: Test Items, Test Bias, Test Construction, Scores
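To see why a unidimensional score overestimates the primary ability, consider a compensatory two-dimensional logistic item (an assumed illustration, not the authors' model): extra standing on the secondary dimension raises the success probability at a fixed primary ability, and a unidimensional scoring model credits that increase to the primary trait.

```python
import numpy as np

def p_correct(theta1, theta2, a1=1.2, a2=0.5, d=0.0):
    """Compensatory two-dimensional logistic item response function."""
    return 1 / (1 + np.exp(-(a1 * theta1 + a2 * theta2 + d)))

print(p_correct(0.0, 0.0))  # 0.50 at the mean of both dimensions
print(p_correct(0.0, 1.5))  # ~0.68 with the same primary ability, high secondary ability
```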
Kuang, Huan; Sahin, Fusun – Large-scale Assessments in Education, 2023
Background: Examinees may not make enough effort when responding to test items if the assessment has no consequence for them. These disengaged responses can be problematic in low-stakes, large-scale assessments because they can bias item parameter estimates. However, the amount of bias, and whether this bias is similar across administrations, is…
Descriptors: Test Items, Comparative Analysis, Mathematics Tests, Reaction Time
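A typical first step in this literature (assumed here; the truncated abstract does not give the authors' criterion) is to flag rapid-guessing responses with a response-time threshold and compare item statistics with and without them:

```python
import numpy as np

rng = np.random.default_rng(2)
rt = rng.lognormal(mean=3.0, sigma=0.5, size=(1000, 30))    # seconds per item
resp = rng.integers(0, 2, size=(1000, 30)).astype(float)    # 0/1 item scores

rapid = rt < 0.10 * np.median(rt, axis=0)   # flag times under 10% of each item's median
engaged = np.where(rapid, np.nan, resp)     # drop flagged responses
print(np.nanmean(engaged, axis=0) - resp.mean(axis=0))  # per-item shift in p-values
```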
Alexander James Kwako – ProQuest LLC, 2023
Automated assessment using Natural Language Processing (NLP) has the potential to make English speaking assessments more reliable, authentic, and accessible. Yet without careful examination, NLP may exacerbate social prejudices based on gender or native language (L1). Current NLP-based assessments are prone to such biases, yet research and…
Descriptors: Gender Bias, Natural Language Processing, Native Language, Computational Linguistics
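One elementary screen in this spirit (purely illustrative, not the dissertation's analysis) is the standardized mean difference of automated scores between two examinee groups:

```python
import numpy as np

def smd(scores_a, scores_b):
    """Standardized mean difference of automated scores between two groups."""
    pooled_sd = np.sqrt((scores_a.var(ddof=1) + scores_b.var(ddof=1)) / 2)
    return (scores_a.mean() - scores_b.mean()) / pooled_sd
```

A nonzero SMD alone does not establish bias, since the groups may differ in true proficiency; DIF-style analyses condition on ability for exactly that reason.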
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
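For context, the Rasch model assigns each item a single difficulty parameter, which is what makes calibration with small samples plausible; a toy drift check between two calibrations might look as follows (the 0.3-logit cutoff is a common rule of thumb, not the study's criterion):

```python
import numpy as np

def rasch_p(theta, b):
    """Rasch probability of a correct response at ability theta, item difficulty b."""
    return 1 / (1 + np.exp(-(theta - b)))

b_first = np.array([-1.2, -0.3, 0.4, 1.1])   # first calibration
b_second = np.array([-1.1, -0.3, 0.9, 1.0])  # recalibration on a new sample
print(np.abs(b_second - b_first) > 0.3)      # [False False True False]: one candidate drifter
```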
Ames, Allison J. – Educational and Psychological Measurement, 2022
Individual response style behaviors, unrelated to the latent trait of interest, may influence responses to ordinal survey items. Response style can introduce bias in the total score with respect to the trait of interest, threatening valid interpretation of scores. Despite claims of response style stability across scales, there has been little…
Descriptors: Response Style (Tests), Individual Differences, Scores, Test Items
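One crude illustration of a response style is extreme responding, indexed by the proportion of endpoint categories a respondent selects (the article's latent-variable treatment is more sophisticated than this raw count, and the scale range here is an assumption):

```python
import numpy as np

def extreme_rs(X, low=1, high=5):
    """Proportion of endpoint responses per respondent on a 1-5 ordinal scale."""
    return np.mean((X == low) | (X == high), axis=1)

X = np.random.default_rng(3).integers(1, 6, size=(4, 10))  # 4 respondents, 10 items
print(extreme_rs(X))
```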
Lambert, Matthew C.; Martin, Jodie; Epstein, Michael H.; Cullinan, Douglas; Katsiyannis, Antonis – Psychology in the Schools, 2021
The present study investigated the psychometric properties of the "Scales for Assessing Emotional Disturbance, Third Edition: Rating Scale" (SAED-3 RS), which is designed for use in identifying students with emotional disturbance for special education services. The purposes of this study were to evaluate (a) the measurement invariance…
Descriptors: Disability Identification, Rating Scales, Test Items, Item Response Theory