Publication Date
In 2025 | 0 |
Since 2024 | 3 |
Since 2021 (last 5 years) | 16 |
Since 2016 (last 10 years) | 38 |
Since 2006 (last 20 years) | 70 |
Descriptor
Scores | 133 |
Test Items | 36 |
Item Response Theory | 25 |
Test Construction | 24 |
Mathematics Tests | 20 |
Comparative Analysis | 18 |
Validity | 18 |
Test Results | 17 |
Scoring | 16 |
Models | 15 |
Reliability | 15 |
Source
Applied Measurement in Education | 133 |
Author
Hambleton, Ronald K. | 4 |
Sireci, Stephen G. | 4 |
Wise, Steven L. | 4 |
Carney, Michele | 3 |
Huff, Kristen | 3 |
Johnson, Robert L. | 3 |
Lane, Suzanne | 3 |
Linn, Robert L. | 3 |
Meijer, Rob R. | 3 |
Sackett, Paul R. | 3 |
Bridgeman, Brent | 2 |
Publication Type
Journal Articles | 133 |
Reports - Research | 86 |
Reports - Evaluative | 42 |
Speeches/Meeting Papers | 6 |
Information Analyses | 4 |
Reports - Descriptive | 4 |
Tests/Questionnaires | 2 |
Book/Product Reviews | 1 |
Reports - General | 1 |
Education Level
Secondary Education | 14 |
High Schools | 13 |
Higher Education | 13 |
Elementary Education | 9 |
Postsecondary Education | 8 |
Grade 8 | 7 |
Elementary Secondary Education | 6 |
Middle Schools | 6 |
Grade 3 | 5 |
Grade 4 | 5 |
Junior High Schools | 5 |
Location
Canada | 3 |
Arizona | 2 |
Georgia | 2 |
Vermont | 2 |
Virginia | 2 |
California | 1 |
California (Los Angeles) | 1 |
Europe | 1 |
Indiana | 1 |
Iran | 1 |
Kansas | 1 |
Sarah Alahmadi; Christine E. DeMars – Applied Measurement in Education, 2024
Large-scale educational assessments are sometimes considered low-stakes, increasing the possibility of confounding true performance level with low motivation. These concerns are amplified in remote testing conditions. To remove the effects of low effort levels in responses observed in remote low-stakes testing, several motivation filtering methods…
Descriptors: Multiple Choice Tests, Item Response Theory, College Students, Scores
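For context: the abstract does not say which motivation filter the authors applied. A widely used option in this literature is response-time effort (given here only as a reference point, not as a description of this particular study), which flags rapid guesses by comparing each item response time to a threshold:
\[ SB_{ij} = \begin{cases} 1 & \text{if } RT_{ij} \ge T_i \\ 0 & \text{otherwise,} \end{cases} \qquad RTE_j = \frac{1}{k}\sum_{i=1}^{k} SB_{ij}, \]
where RT_{ij} is examinee j's response time on item i, T_i is an item-level time threshold, and k is the number of items; examinees whose RTE_j falls below a cutoff (often around .90) are removed before scoring.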
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed-response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
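For context: the sampling-model distinction in this entry can be written out. If n_{jk} counts responses scored in category j at Time A and category k at Time B, the product multinomial fixes the Time A row totals n_{j\cdot} and models each row separately (notation assumed here for illustration):
\[ P(\{n_{jk}\}) = \prod_{j} \frac{n_{j\cdot}!}{\prod_{k} n_{jk}!} \prod_{k} \pi_{k\mid j}^{\,n_{jk}}, \]
whereas a single multinomial would treat all cells as draws from one distribution with only the grand total fixed.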
Rutkowski, David; Rutkowski, Leslie; Valdivia, Dubravka Svetina; Canbolat, Yusuf; Underhill, Stephanie – Applied Measurement in Education, 2023
Several states in the US have removed time limits on their state assessments. In Indiana, where this study takes place, the state assessment is both untimed during the testing window and allows unlimited breaks during the testing session. Using grade 3 and 8 math and English state assessment data, in this paper we focus on time used for testing…
Descriptors: Testing, Time, Intervals, Academic Achievement
Rios, Joseph A. – Applied Measurement in Education, 2022
Testing programs are confronted with the decision of whether to report individual scores for examinees that have engaged in rapid guessing (RG). As noted by the "Standards for Educational and Psychological Testing," this decision should be based on a documented criterion that determines score exclusion. To this end, a number of heuristic…
Descriptors: Testing, Guessing (Tests), Academic Ability, Scores
Carney, Michele; Paulding, Katie; Champion, Joe – Applied Measurement in Education, 2022
Teachers need ways to efficiently assess students' cognitive understanding. One promising approach involves easily adapted and administered item types that yield quantitative scores that can be interpreted in terms of whether or not students likely possess key understandings. This study illustrates an approach to analyzing response process…
Descriptors: Middle School Students, Logical Thinking, Mathematical Logic, Problem Solving
DeMars, Christine E. – Applied Measurement in Education, 2021
Estimation of parameters for the many-facets Rasch model requires that conditional on the values of the facets, such as person ability, item difficulty, and rater severity, the observed responses within each facet are independent. This requirement has often been discussed for the Rasch models and 2PL and 3PL models, but it becomes more complex…
Descriptors: Item Response Theory, Test Items, Ability, Scores
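For context: a common statement of the many-facets Rasch model discussed here, for person n, item i, rater j, and rating category k (notation assumed, not quoted from the article), is
\[ \ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \lambda_j - \tau_k, \]
where \theta_n is person ability, \delta_i item difficulty, \lambda_j rater severity, and \tau_k the step threshold; the local-independence requirement in the abstract means responses are independent once these facet parameters are conditioned on.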
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article presents the performance of item response theory (IRT) models when double ratings are used as item scores over single ratings when rater effects are present. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
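For context: the generalized partial credit model named in this entry is usually written in Muraki's parameterization (given here as background, not quoted from the article) as
\[ P_{ik}(\theta) = \frac{\exp\!\left(\sum_{v=1}^{k} a_i(\theta - b_{iv})\right)}{\sum_{c=0}^{m_i}\exp\!\left(\sum_{v=1}^{c} a_i(\theta - b_{iv})\right)}, \qquad k = 0, 1, \ldots, m_i, \]
with the empty sum for k = 0 defined as zero, a_i the item discrimination, and b_{iv} the step difficulties.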
Clark, Amy K.; Nash, Brooke; Karvonen, Meagan – Applied Measurement in Education, 2022
Assessments scored with diagnostic models are increasingly popular because they provide fine-grained information about student achievement. Because of differences in how diagnostic assessments are scored and how results are used, the information teachers must know to interpret and use results may differ from concepts traditionally included in…
Descriptors: Elementary School Teachers, Secondary School Teachers, Assessment Literacy, Diagnostic Tests
Almehrizi, Rashid S. – Applied Measurement in Education, 2021
KR-21 reliability and its extension (coefficient α) give the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 reliability gives the reliability estimate for summed scores for dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article…
Descriptors: Test Reliability, Scores, Scoring, Computation
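For reference, the two coefficients named in this entry (the standard formulas, not the article's proposed extension) are
\[ \mathrm{KR\text{-}21} = \frac{k}{k-1}\left(1 - \frac{\bar{X}(k-\bar{X})}{k\,s_X^{2}}\right), \qquad \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^{2}}{s_X^{2}}\right), \]
where k is the number of items, \bar{X} the mean total score, s_X^{2} the total-score variance, and s_i^{2} the variance of item i; KR-21 applies to dichotomous items and uses only the mean and variance of the summed scores.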
Rios, Joseph – Applied Measurement in Education, 2021
Four decades of research have shown that students' low test-taking effort is a serious threat to the validity of score-based inferences from low-stakes, group-based educational assessments. This meta-analysis sought to identify effective interventions for improving students' test-taking effort in such contexts. Included studies: (1) used a…
Descriptors: Test Wiseness, Student Motivation, Meta Analysis, Intervention
Mo, Ya; Carney, Michele; Cavey, Laurie; Totorica, Tatia – Applied Measurement in Education, 2021
There is a need for assessment items that assess complex constructs but can also be efficiently scored for evaluation of teacher education programs. In an effort to measure the construct of teacher attentiveness in an efficient and scalable manner, we are using exemplar responses elicited by constructed-response item prompts to develop…
Descriptors: Protocol Analysis, Test Items, Responses, Mathematics Teachers
Sackett, Paul R.; Sharpe, Melissa S.; Kuncel, Nathan – Applied Measurement in Education, 2021
The literature is replete with references to a disproportionate reliance on admission test scores (e.g., the ACT or SAT) in the college admissions process. School-reported reliance on test scores and grades has been used to study this question, generally indicating relatively equal reliance on the two, with a slightly higher endorsement of grades.…
Descriptors: College Admission, Admission Criteria, College Entrance Examinations, College Applicants
Yiling Cheng; I-Chien Chen; Barbara Schneider; Mark Reckase; Joseph Krajcik – Applied Measurement in Education, 2024
The current study expands on previous research on gender differences and similarities in science test scores. Using three different approaches -- differential item functioning, differential distractor functioning, and decision tree analysis -- we examine a high school science assessment administered to 3,849 10th-12th graders, of whom 2,021 are…
Descriptors: Gender Differences, Science Achievement, Responses, Testing
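For context: the abstract does not name the specific DIF procedure used. One standard choice (shown only as an illustration) is the Mantel-Haenszel common odds ratio and its ETS delta transformation,
\[ \hat{\alpha}_{MH} = \frac{\sum_{s} A_s D_s / N_s}{\sum_{s} B_s C_s / N_s}, \qquad \Delta_{MH} = -2.35\,\ln \hat{\alpha}_{MH}, \]
where, within each matched score stratum s, A_s and B_s are the reference-group correct and incorrect counts, C_s and D_s are the corresponding focal-group counts, and N_s is the stratum total; values of \Delta_{MH} near zero indicate negligible DIF.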
Carney, Michele; Crawford, Angela; Siebert, Carl; Osguthorpe, Rich; Thiede, Keith – Applied Measurement in Education, 2019
The "Standards for Educational and Psychological Testing" recommend an argument-based approach to validation that involves a clear statement of the intended interpretation and use of test scores, the identification of the underlying assumptions and inferences in that statement--termed the interpretation/use argument, and gathering of…
Descriptors: Inquiry, Test Interpretation, Validity, Scores
Dahlke, Jeffrey A.; Sackett, Paul R.; Kuncel, Nathan R. – Applied Measurement in Education, 2023
We examine longitudinal data from 120,384 students who took a version of the PSAT/SAT in the 9th, 10th, 11th, and 12th grades. We investigate score changes over time and show that socioeconomic status (SES) is related to the degree of score improvement. We note that the 9th and 10th grade PSAT are low-stakes tests, while the operational SAT is a…
Descriptors: Scores, College Entrance Examinations, Socioeconomic Status, Test Preparation