ERIC - Search Results

Showing 1 to 15 of 22 results

An Examination of Individual Ability Estimation and Classification Accuracy under Rapid Guessing Misidentifications

Peer reviewed

Rios, Joseph – Applied Measurement in Education, 2022

To mitigate the deleterious effects of rapid guessing (RG) on ability estimates, several rescoring procedures have been proposed. Underlying many of these procedures is the assumption that RG is accurately identified. At present, there have been minimal investigations examining the utility of rescoring approaches when RG is misclassified, and…

Descriptors: Accuracy, Guessing (Tests), Scoring, Classification

Tracking Ordinal Development of Skills with a Longitudinal DINA Model with Polytomous Attributes

Peer reviewed

Zhan, Peida; Liu, Yaohui; Yu, Zhaohui; Pan, Yanfang – Applied Measurement in Education, 2023

Many educational and psychological studies have shown that the development of students is generally step-by-step (i.e. ordinal development) to a specific level. This study proposed a novel longitudinal learning diagnosis model with polytomous attributes to track students' ordinal development in learning. Using the concept of polytomous attributes…

Descriptors: Skill Development, Cognitive Measurement, Models, Educational Diagnosis

When Should Individual Ability Estimates Be Reported if Rapid Guessing Is Present?

Peer reviewed

Rios, Joseph A. – Applied Measurement in Education, 2022

Testing programs are confronted with the decision of whether to report individual scores for examinees that have engaged in rapid guessing (RG). As noted by the "Standards for Educational and Psychological Testing," this decision should be based on a documented criterion that determines score exclusion. To this end, a number of heuristic…

Descriptors: Testing, Guessing (Tests), Academic Ability, Scores

Comparing Drift Detection Methods for Accurate Rasch Equating in Different Sample Sizes

Peer reviewed

Alahmadi, Sarah; Jones, Andrew T.; Barry, Carol L.; Ibáñez, Beatriz – Applied Measurement in Education, 2023

Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large…

Descriptors: Equated Scores, Item Response Theory, Sample Size, Test Items

Exploring Interrelationships among L2 Writing Subskills: Insights from Cognitive Diagnostic Models

Peer reviewed

Hamdollah Ravand; Farshad Effatpanah; Wenchao Ma; Jimmy de la Torre; Purya Baghaei; Olga Kunina-Habenicht – Applied Measurement in Education, 2024

The purpose of this study was to explore the nature of interactions among second/foreign language (L2) writing subskills. Two types of relationships were investigated: subskill-item and subskill-subskill relationships. To achieve the first purpose, using writing data obtained from the writing essays of 500 English as a foreign language (EFL)…

Descriptors: Second Language Learning, Writing Instruction, Writing Skills, Writing Tests

Investigating the Classification Accuracy of Rasch and Nominal Weights Mean Equating with Very Small Samples

Peer reviewed

Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020

Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods were investigated in the context of very small samples (N = 10). Overall, nominal…

Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores

Measuring the Reliability of Diagnostic Mastery Classifications at Multiple Levels of Reporting

Peer reviewed

Thompson, W. Jake; Clark, Amy K.; Nash, Brooke – Applied Measurement in Education, 2019

As the use of diagnostic assessment systems transitions from research applications to large-scale assessments for accountability purposes, reliability methods that provide evidence at each level of reporting are needed. The purpose of this paper is to summarize one simulation-based method for estimating and reporting reliability for an…

Descriptors: Test Reliability, Diagnostic Tests, Classification, Computation

Evaluating Random and Systematic Error in Student Growth Percentiles

Peer reviewed

Wells, Craig S.; Sireci, Stephen G. – Applied Measurement in Education, 2020

Student growth percentiles (SGPs) are currently used by several states and school districts to provide information about individual students as well as to evaluate teachers, schools, and school districts. For SGPs to be defensible for these purposes, they should be reliable. In this study, we examine the amount of systematic and random error in…

Descriptors: Growth Models, Reliability, Scores, Error Patterns

Classification Consistency and Accuracy for Mixed-Format Tests

Peer reviewed

Kim, Stella Y.; Lee, Won-Chan – Applied Measurement in Education, 2019

This study explores classification consistency and accuracy for mixed-format tests using real and simulated data. In particular, the current study compares six methods of estimating classification consistency and accuracy for seven mixed-format tests. The relative performance of the estimation methods is evaluated using simulated data. Study…

Descriptors: Classification, Reliability, Accuracy, Test Format

Prediction of Essay Scores from Writing Process and Product Features Using Data Mining Methods

Peer reviewed

Sinharay, Sandip; Zhang, Mo; Deane, Paul – Applied Measurement in Education, 2019

Analysis of keystroke logging data is of increasing interest, as evident from a substantial amount of recent research on the topic. Some of the research on keystroke logging data has focused on the prediction of essay scores from keystroke logging features, but linear regression is the only prediction method that has been used in this research.…

Descriptors: Scores, Prediction, Writing Processes, Data Analysis

Comparing the Robustness of Three Nonparametric DIF Procedures to Differential Rapid Guessing

Peer reviewed

Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022

When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…

Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis

Performance Decline as an Indicator of Generalized Test-Taking Disengagement

Peer reviewed

Wise, Steven L.; Kingsbury, G. Gage – Applied Measurement in Education, 2022

In achievement testing we assume that students will demonstrate their maximum performance as they encounter test items. Sometimes, however, student performance can decline during a test event, which implies that the test score does not represent maximum performance. This study describes a method for identifying significant performance decline and…

Descriptors: Achievement Tests, Performance, Classification, Guessing (Tests)

Sensitivity of School-Performance Ratings to Scaling Decisions

Peer reviewed

Ng, Hui Leng; Koretz, Daniel – Applied Measurement in Education, 2015

Policymakers usually leave decisions about scaling the scores used for accountability to their appointed technical advisory committees and the testing contractors. However, scaling decisions can have an appreciable impact on school ratings. Using middle-school data from New York State, we examined the consistency of school ratings based on two…

Descriptors: School Effectiveness, Scaling, Middle Schools, Accountability

An Investigation of Different Treatment Strategies for Item Category Collapsing in Calibration: An Empirical Study

Peer reviewed

Tay-lim, Brenda Siok-Hoon; Zhang, Jinming – Applied Measurement in Education, 2015

To ensure the statistical result validity, model-data fit must be evaluated for each item. In practice, certain actions or treatments are needed for misfit items. If all misfit items are treated, much item information would be lost during calibration. On the other hand, if only severely misfit items are treated, the inclusion of misfit items may…

Descriptors: Test Items, Goodness of Fit, Classification, Item Response Theory

Multilevel Latent Class Analysis for Large-Scale Educational Assessment Data: Exploring the Relation between the Curriculum and Students' Mathematical Strategies

Peer reviewed

Fagginger Auer, Marije F.; Hickendorff, Marian; Van Putten, Cornelis M.; Béguin, Anton A.; Heiser, Willem J. – Applied Measurement in Education, 2016

A first application of multilevel latent class analysis (MLCA) to educational large-scale assessment data is demonstrated. This statistical technique addresses several of the challenges that assessment data offers. Importantly, MLCA allows modeling of the often ignored teacher effects and of the joint influence of teacher and student variables.…

Descriptors: Educational Assessment, Multivariate Analysis, Classification, Data
