Perez, Alexandra Lane; Evans, Carla – Applied Measurement in Education, 2023
New Hampshire's Performance Assessment of Competency Education (PACE) innovative assessment system uses student scores from classroom performance assessments as well as other classroom tests for school accountability purposes. One concern is that not having annual state testing may incentivize schools and teachers away from teaching the breadth of…
Descriptors: Grade 8, Competency Based Education, Evaluation Methods, Educational Innovation
Jennifer Randall; Joseph Rios – Applied Measurement in Education, 2023
Building on the extant literature on recruitment and retention within the field of STEM and undergraduate education, we sought to explore the recruitment and retention experiences of racially and ethnically minoritized students enrolled in graduate level assessment, measurement, and/or evaluation programs in the United States. Using a mixed…
Descriptors: Graduate Students, Measurement, Educational Research, Educational Researchers
Ben Backes; James Cowan – Applied Measurement in Education, 2024
We investigate two research questions using a recent statewide transition from paper to computer-based testing: first, the extent to which test mode effects found in prior studies can be eliminated; and second, the degree to which online and paper assessments offer different information about underlying student ability. We first find very small…
Descriptors: Computer Assisted Testing, Test Format, Differences, Academic Achievement
Traditional vs Intersectional DIF Analysis: Considerations and a Comparison Using State Testing Data
Tony Albano; Brian F. French; Thao Thu Vo – Applied Measurement in Education, 2024
Recent research has demonstrated an intersectional approach to the study of differential item functioning (DIF). This approach expands DIF to account for the interactions between what have traditionally been treated as separate grouping variables. In this paper, we compare traditional and intersectional DIF analyses using data from a state testing…
Descriptors: Test Items, Item Analysis, Data Use, Standardized Tests
Jonson, Jessica L. – Applied Measurement in Education, 2022
This article describes a grant project that generated a technical guide for PK-12 educators who are utilizing social and emotional learning (SEL) assessments for educational improvement purposes. The guide was developed over a two-year period with funding from the Spencer Foundation. The result was the collective contribution of a widely…
Descriptors: Measurement Techniques, Tests, Preschool Teachers, Kindergarten
Krupa, Erin Elizabeth; Carney, Michele; Bostic, Jonathan – Applied Measurement in Education, 2019
This article provides a brief introduction to the set of four articles in the special issue. To provide a foundation for the issue, key terms are defined, a brief historical overview of validity is provided, and a description of several different validation approaches used in the issue are explained. Finally, the contribution of the articles to…
Descriptors: Test Items, Program Validation, Test Validity, Mathematics Education
Bjermo, Jonas; Miller, Frank – Applied Measurement in Education, 2021
In recent years, the interest in measuring growth in student ability in various subjects between different grades in school has increased. Therefore, good precision in the estimated growth is of importance. This paper aims to compare estimation methods and test designs when it comes to precision and bias of the estimated growth of mean ability…
Descriptors: Scaling, Ability, Computation, Test Items
Kim, Stella Y.; Lee, Won-Chan – Applied Measurement in Education, 2019
This study explores classification consistency and accuracy for mixed-format tests using real and simulated data. In particular, the current study compares six methods of estimating classification consistency and accuracy for seven mixed-format tests. The relative performance of the estimation methods is evaluated using simulated data. Study…
Descriptors: Classification, Reliability, Accuracy, Test Format
Keller, Lisa A.; Keller, Robert; Cook, Robert J.; Colvin, Kimberly F. – Applied Measurement in Education, 2016
The equating of tests is an essential process in high-stakes, large-scale testing conducted over multiple forms or administrations. By adjusting for differences in difficulty and placing scores from different administrations of a test on a common scale, equating allows scores from these different forms and administrations to be directly compared…
Descriptors: Item Response Theory, Equated Scores, Test Format, Testing Programs
Cohen, Dale J.; Zhang, Jin; Wothke, Werner – Applied Measurement in Education, 2019
Construct-irrelevant cognitive complexity of some items in the statewide grade-level assessments may impose performance barriers for students with disabilities who are ineligible for alternate assessments based on alternate achievement standards. This has spurred research into whether items can be modified to reduce complexity without affecting…
Descriptors: Test Items, Accessibility (for Disabled), Students with Disabilities, Low Achievement
Lottridge, Susan; Wood, Scott; Shaw, Dan – Applied Measurement in Education, 2018
This study sought to provide a framework for evaluating machine score-ability of items using a new score-ability rating scale, and to determine the extent to which ratings were predictive of observed automated scoring performance. The study listed and described a set of factors that are thought to influence machine score-ability; these factors…
Descriptors: Program Effectiveness, Computer Assisted Testing, Test Scoring Machines, Scoring
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
Lee, Guemin; Lee, Won-Chan – Applied Measurement in Education, 2016
The main purposes of this study were to develop bi-factor multidimensional item response theory (BF-MIRT) observed-score equating procedures for mixed-format tests and to investigate relative appropriateness of the proposed procedures. Using data from a large-scale testing program, three types of pseudo data sets were formulated: matched samples,…
Descriptors: Test Format, Multidimensional Scaling, Item Response Theory, Equated Scores
Powers, Donald E.; Escoffery, David S.; Duchnowski, Matthew P. – Applied Measurement in Education, 2015
By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the "gold standard." Our objective was to refine this model and apply it to…
Descriptors: Essays, Test Scoring Machines, Program Validation, Criterion Referenced Tests
Buckendahl, Chad W.; Plake, Barbara S.; Davis, Susan L. – Applied Measurement in Education, 2009
The National Assessment of Educational Progress (NAEP) program is a series of periodic assessments administered nationally to samples of students and designed to measure different content areas. This article describes a multi-year study that focused on the breadth of the development, administration, maintenance, and renewal of the assessments in…
Descriptors: National Competency Tests, Audits (Verification), Testing Programs, Program Evaluation