Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 7 |
Since 2016 (last 10 years) | 14 |
Since 2006 (last 20 years) | 22 |
Descriptor
Classification | 22 |
Test Items | 10 |
Accuracy | 8 |
Computation | 6 |
Item Response Theory | 5 |
Measurement | 5 |
Scores | 5 |
Guessing (Tests) | 4 |
Test Bias | 4 |
Achievement | 3 |
Advanced Placement Programs | 3 |
Source
Applied Measurement in… | 22 |
Author
Huff, Kristen | 2 |
Penfield, Randall D. | 2 |
Rios, Joseph A. | 2 |
Sireci, Stephen G. | 2 |
Wells, Craig S. | 2 |
Abulela, Mohammed A. A. | 1 |
Alahmadi, Sarah | 1 |
Alvarez, Karina | 1 |
Baldwin, Su | 1 |
Barry, Carol L. | 1 |
Béguin, Anton A. | 1 |
Publication Type
Journal Articles | 22 |
Reports - Research | 19 |
Reports - Evaluative | 3 |
Tests/Questionnaires | 3 |
Education Level
Secondary Education | 5 |
Elementary Education | 4 |
Middle Schools | 4 |
Grade 4 | 3 |
Grade 8 | 3 |
High Schools | 3 |
Early Childhood Education | 2 |
Elementary Secondary Education | 2 |
Grade 3 | 2 |
Grade 5 | 2 |
Grade 6 | 2 |
Location
New York | 2 |
California | 1 |
Florida | 1 |
Iran (Tehran) | 1 |
Massachusetts | 1 |
Netherlands | 1 |
North Carolina | 1 |
Oklahoma | 1 |
Texas | 1 |
Assessments and Surveys
National Assessment of… | 2 |
Program for International… | 2 |
Advanced Placement… | 1 |
Measures of Academic Progress | 1 |
Progress in International… | 1 |
Trends in International… | 1 |
Rios, Joseph – Applied Measurement in Education, 2022
To mitigate the deleterious effects of rapid guessing (RG) on ability estimates, several rescoring procedures have been proposed. Underlying many of these procedures is the assumption that RG is accurately identified. At present, there have been minimal investigations examining the utility of rescoring approaches when RG is misclassified, and…
Descriptors: Accuracy, Guessing (Tests), Scoring, Classification
Zhan, Peida; Liu, Yaohui; Yu, Zhaohui; Pan, Yanfang – Applied Measurement in Education, 2023
Many educational and psychological studies have shown that the development of students is generally step-by-step (i.e. ordinal development) to a specific level. This study proposed a novel longitudinal learning diagnosis model with polytomous attributes to track students' ordinal development in learning. Using the concept of polytomous attributes…
Descriptors: Skill Development, Cognitive Measurement, Models, Educational Diagnosis
Rios, Joseph A. – Applied Measurement in Education, 2022
Testing programs are confronted with the decision of whether to report individual scores for examinees that have engaged in rapid guessing (RG). As noted by the "Standards for Educational and Psychological Testing," this decision should be based on a documented criterion that determines score exclusion. To this end, a number of heuristic…
Descriptors: Testing, Guessing (Tests), Academic Ability, Scores
Alahmadi, Sarah; Jones, Andrew T.; Barry, Carol L.; Ibáñez, Beatriz – Applied Measurement in Education, 2023
Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large…
Descriptors: Equated Scores, Item Response Theory, Sample Size, Test Items
Ravand, Hamdollah; Effatpanah, Farshad; Ma, Wenchao; de la Torre, Jimmy; Baghaei, Purya; Kunina-Habenicht, Olga – Applied Measurement in Education, 2024
The purpose of this study was to explore the nature of interactions among second/foreign language (L2) writing subskills. Two types of relationships were investigated: subskill-item and subskill-subskill relationships. To achieve the first purpose, using writing data obtained from the writing essays of 500 English as a foreign language (EFL)…
Descriptors: Second Language Learning, Writing Instruction, Writing Skills, Writing Tests
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods were investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores
Thompson, W. Jake; Clark, Amy K.; Nash, Brooke – Applied Measurement in Education, 2019
As the use of diagnostic assessment systems transitions from research applications to large-scale assessments for accountability purposes, reliability methods that provide evidence at each level of reporting are needed. The purpose of this paper is to summarize one simulation-based method for estimating and reporting reliability for an…
Descriptors: Test Reliability, Diagnostic Tests, Classification, Computation
Wells, Craig S.; Sireci, Stephen G. – Applied Measurement in Education, 2020
Student growth percentiles (SGPs) are currently used by several states and school districts to provide information about individual students as well as to evaluate teachers, schools, and school districts. For SGPs to be defensible for these purposes, they should be reliable. In this study, we examine the amount of systematic and random error in…
Descriptors: Growth Models, Reliability, Scores, Error Patterns
Kim, Stella Y.; Lee, Won-Chan – Applied Measurement in Education, 2019
This study explores classification consistency and accuracy for mixed-format tests using real and simulated data. In particular, the current study compares six methods of estimating classification consistency and accuracy for seven mixed-format tests. The relative performance of the estimation methods is evaluated using simulated data. Study…
Descriptors: Classification, Reliability, Accuracy, Test Format
Sinharay, Sandip; Zhang, Mo; Deane, Paul – Applied Measurement in Education, 2019
Analysis of keystroke logging data is of increasing interest, as evident from a substantial amount of recent research on the topic. Some of the research on keystroke logging data has focused on the prediction of essay scores from keystroke logging features, but linear regression is the only prediction method that has been used in this research.…
Descriptors: Scores, Prediction, Writing Processes, Data Analysis
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
Wise, Steven L.; Kingsbury, G. Gage – Applied Measurement in Education, 2022
In achievement testing we assume that students will demonstrate their maximum performance as they encounter test items. Sometimes, however, student performance can decline during a test event, which implies that the test score does not represent maximum performance. This study describes a method for identifying significant performance decline and…
Descriptors: Achievement Tests, Performance, Classification, Guessing (Tests)
Ng, Hui Leng; Koretz, Daniel – Applied Measurement in Education, 2015
Policymakers usually leave decisions about scaling the scores used for accountability to their appointed technical advisory committees and the testing contractors. However, scaling decisions can have an appreciable impact on school ratings. Using middle-school data from New York State, we examined the consistency of school ratings based on two…
Descriptors: School Effectiveness, Scaling, Middle Schools, Accountability
Tay-lim, Brenda Siok-Hoon; Zhang, Jinming – Applied Measurement in Education, 2015
To ensure the validity of statistical results, model-data fit must be evaluated for each item. In practice, certain actions or treatments are needed for misfit items. If all misfit items are treated, much item information would be lost during calibration. On the other hand, if only severely misfit items are treated, the inclusion of misfit items may…
Descriptors: Test Items, Goodness of Fit, Classification, Item Response Theory
Fagginger Auer, Marije F.; Hickendorff, Marian; Van Putten, Cornelis M.; Béguin, Anton A.; Heiser, Willem J. – Applied Measurement in Education, 2016
A first application of multilevel latent class analysis (MLCA) to educational large-scale assessment data is demonstrated. This statistical technique addresses several of the challenges that assessment data offers. Importantly, MLCA allows modeling of the often ignored teacher effects and of the joint influence of teacher and student variables.…
Descriptors: Educational Assessment, Multivariate Analysis, Classification, Data