Cheng, Yiling; Chen, I-Chien; Schneider, Barbara; Reckase, Mark; Krajcik, Joseph – Applied Measurement in Education, 2024
The current study expands on previous research on gender differences and similarities in science test scores. Using three different approaches -- differential item functioning, differential distractor functioning, and decision tree analysis -- we examine a high school science assessment administered to 3,849 10th-12th graders, of whom 2,021 are…
Descriptors: Gender Differences, Science Achievement, Responses, Testing
Abu-Ghazalah, Rashid M.; Dubins, David N.; Poon, Gregory M. K. – Applied Measurement in Education, 2023
Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly…
Descriptors: Guessing (Tests), Multiple Choice Tests, Probability, Models
Wells, Craig S.; Sireci, Stephen G. – Applied Measurement in Education, 2020
Student growth percentiles (SGPs) are currently used by several states and school districts to provide information about individual students as well as to evaluate teachers, schools, and school districts. For SGPs to be defensible for these purposes, they should be reliable. In this study, we examine the amount of systematic and random error in…
Descriptors: Growth Models, Reliability, Scores, Error Patterns
Schmidgall, Jonathan – Applied Measurement in Education, 2017
This study utilizes an argument-based approach to validation to examine the implications of reliability in order to further differentiate the concepts of score and decision consistency. In a methodological example, the framework of generalizability theory was used to estimate appropriate indices of score consistency and evaluations of the…
Descriptors: Scores, Reliability, Validity, Generalizability Theory
Keller, Lisa A.; Keller, Robert R. – Applied Measurement in Education, 2015
Equating test forms is an essential activity in standardized testing, with increased importance with the accountability systems in existence through the mandate of Adequate Yearly Progress. It is through equating that scores from different test forms become comparable, which allows for the tracking of changes in the performance of students from…
Descriptors: Item Response Theory, Rating Scales, Standardized Tests, Scoring Rubrics
Bonner, Sarah M.; D'Agostino, Jerome V. – Applied Measurement in Education, 2012
We investigated examinees' cognitive processes while they solved selected items from the Multistate Bar Exam (MBE), a high-stakes professional certification examination. We focused on ascertaining those mental processes most frequently used by examinees, and the most common types of errors in their thinking. We compared the relationships between…
Descriptors: Cognitive Processes, Test Items, Problem Solving, Thinking Skills
Noble, Tracy; Rosebery, Ann; Suarez, Catherine; Warren, Beth; O'Connor, Mary Catherine – Applied Measurement in Education, 2014
English language learners (ELLs) and their teachers, schools, and communities face increasingly high-stakes consequences due to test score gaps between ELLs and non-ELLs. It is essential that the field of educational assessment continue to investigate the meaning of these test score gaps. This article discusses the findings of an exploratory study…
Descriptors: English Language Learners, Evidence, Educational Assessment, Achievement Gap
Puhan, Gautam – Applied Measurement in Education, 2009
The purpose of this study is to determine the extent of scale drift on a test that employs cut scores. It was essential to examine scale drift for this testing program because new forms in this testing program are often put on scale through a series of intermediate equatings (known as equating chains). This process may cause equating error to…
Descriptors: Testing Programs, Testing, Measurement Techniques, Item Response Theory
Wells, Craig S.; Bolt, Daniel M. – Applied Measurement in Education, 2008
Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…
Descriptors: Test Length, Test Items, Monte Carlo Methods, Nonparametric Statistics
D'Agostino, Jerome V.; Welsh, Megan E.; Cimetta, Adriana D.; Falco, Lia D.; Smith, Shannon; VanWinkle, Waverely Hester; Powers, Sonya J. – Applied Measurement in Education, 2008
Central to the standards-based assessment validation process is an examination of the alignment between state standards and test items. Several alignment analysis systems have emerged recently, but most rely on either traditional rating or matching techniques. Few, if any, analyses have been reported on the degree of consistency between the two…
Descriptors: Test Items, Student Evaluation, State Standards, Evaluation Methods
Wollack, James A. – Applied Measurement in Education, 2006
Many of the currently available statistical indexes to detect answer copying lack sufficient power at small α levels or when the amount of copying is relatively small. Furthermore, there is no one index that is uniformly best. Depending on the type or amount of copying, certain indexes are better than others. The purpose of this article was…
Descriptors: Statistical Analysis, Item Analysis, Test Length, Sample Size