ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	4
Since 2006 (last 20 years)	19

Descriptor

Difficulty Level	35
Scaling	35
Test Items	35
Item Response Theory	17
Test Construction	14
Item Analysis	9
Item Banks	8
Mathematics Tests	8
Statistical Analysis	8
Academic Standards	7
Benchmarking	7
Equated Scores	7
Mathematical Models	7
Alignment (Education)	6
Curriculum Based Assessment	6
Elementary School Mathematics	6
Formative Evaluation	6
Progress Monitoring	6
Response to Intervention	6
State Standards	6
Comparative Analysis	5
Foreign Countries	5
Scores	5
Testing Problems	5
Elementary Secondary Education	4

Source

Behavioral Research and…	6
Educational and Psychological…	4
ETS Research Report Series	2
Applied Measurement in…	1
Applied Psychological…	1
Assessment for Effective…	1
Focus	1
International Journal of…	1
Journal of Research in…	1
Ministerial Council on…	1
Pearson	1
Scandinavian Journal of…	1

Publication Type

Reports - Research	21
Journal Articles	12
Reports - Evaluative	11
Speeches/Meeting Papers	10
Numerical/Quantitative Data	8
Information Analyses	2
Reports - Descriptive	2
Collected Works - Serials	1
Guides - General	1

Education Level

Elementary Education	9
Early Childhood Education	5
Primary Education	5
Elementary Secondary Education	4
Grade 3	3
Grade 5	3
Secondary Education	3
Grade 1	2
Grade 2	2
Grade 4	2
Grade 6	2
Grade 7	2
Intermediate Grades	2
Kindergarten	2
High Schools	1
Higher Education	1
Postsecondary Education	1

Audience

Researchers

Location

Australia	2
Florida	1
Germany	1
Netherlands	1

Assessments and Surveys

Program for International…	2
SAT (College Admission Test)	1

Showing 1 to 15 of 35 results

Efficient Estimation of Mean Ability Growth Using Vertical Scaling

Peer reviewed

Bjermo, Jonas; Miller, Frank – Applied Measurement in Education, 2021

In recent years, the interest in measuring growth in student ability in various subjects between different grades in school has increased. Therefore, good precision in the estimated growth is of importance. This paper aims to compare estimation methods and test designs when it comes to precision and bias of the estimated growth of mean ability…

Descriptors: Scaling, Ability, Computation, Test Items

Unidimensional Vertical Scaling in Multidimensional Space. Research Report. ETS RR-17-29

Peer reviewed

Carlson, James E. – ETS Research Report Series, 2017

In this paper, I consider a set of test items that are located in a multidimensional space, S_M, but lie along a curved line in S_M and can therefore be scaled unidimensionally. Furthermore, I demonstrate a case in which the test items are administered across 6 levels, such as occurs in K-12 assessment across 6 grade…

Descriptors: Test Items, Item Response Theory, Difficulty Level, Scoring

Controlling Guessing Bias in the Dichotomous Rasch Model Applied to a Large-Scale, Vertically Scaled Testing Program

Peer reviewed

Andrich, David; Marais, Ida; Humphry, Stephen Mark – Educational and Psychological Measurement, 2016

Recent research has shown how the statistical bias in Rasch model difficulty estimates induced by guessing in multiple-choice items can be eliminated. Using vertical scaling of a high-profile national reading test, it is shown that the dominant effect of removing such bias is a nonlinear change in the unit of scale across the continuum. The…

Descriptors: Guessing (Tests), Statistical Bias, Item Response Theory, Multiple Choice Tests

Estimating Item Difficulty with Comparative Judgments. Research Report. ETS RR-14-39

Peer reviewed

Attali, Yigal; Saldivia, Luis; Jackson, Carol; Schuppan, Fred; Wanamaker, Wilbur – ETS Research Report Series, 2014

Previous investigations of the ability of content experts and test developers to estimate item difficulty have, for the most part, produced disappointing results. These investigations were based on a noncomparative method of independently rating the difficulty of items. In this article, we argue that, by eliciting comparative judgments of…

Descriptors: Test Items, Difficulty Level, Comparative Analysis, College Entrance Examinations

Maximum Likelihood Item Easiness Models for Test Theory without an Answer Key

Peer reviewed

France, Stephen L.; Batchelder, William H. – Educational and Psychological Measurement, 2015

Cultural consensus theory (CCT) is a data aggregation technique with many applications in the social and behavioral sciences. We describe the intuition and theory behind a set of CCT models for continuous type data using maximum likelihood inference methodology. We describe how bias parameters can be incorporated into these models. We introduce…

Descriptors: Maximum Likelihood Statistics, Test Items, Difficulty Level, Test Theory

Lessons Learned from PISA: A Systematic Review of Peer-Reviewed Articles on the Programme for International Student Assessment

Peer reviewed

Hopfenbeck, Therese N.; Lenkeit, Jenny; El Masri, Yasmine; Cantrell, Kate; Ryan, Jeanne; Baird, Jo-Anne – Scandinavian Journal of Educational Research, 2018

International large-scale assessments are on the rise, with the Programme for International Student Assessment (PISA) seen by many as having strategic prominence in education policy debates. The present article reviews PISA-related English-language peer-reviewed articles from the programme's first cycle in 2000 to its most current in 2015. Five…

Descriptors: Foreign Countries, Achievement Tests, International Assessment, Secondary School Students

Effects of Item Parameter Drift on Vertical Scaling with the Nonequivalent Groups with Anchor Test (NEAT) Design

Peer reviewed

Ye, Meng; Xin, Tao – Educational and Psychological Measurement, 2014

The authors explored the effects of drifting common items on vertical scaling within the higher order framework of item parameter drift (IPD). The results showed that if IPD occurred between a pair of test levels, the scaling performance started to deviate from the ideal state, as indicated by bias of scaling. When there were two items drifting…

Descriptors: Scaling, Test Items, Equated Scores, Achievement Gains

Assessment of Uncertainty-Infused Scientific Argumentation

Peer reviewed

Lee, Hee-Sun; Liu, Ou Lydia; Pallant, Amy; Roohr, Katrina Crotts; Pryputniewicz, Sarah; Buck, Zoë E. – Journal of Research in Science Teaching, 2014

Though addressing sources of uncertainty is an important part of doing science, it has largely been neglected in assessing students' scientific argumentation. In this study, we initially defined a scientific argumentation construct in four structural elements consisting of claim, justification, uncertainty qualifier, and uncertainty…

Descriptors: Persuasive Discourse, Student Evaluation, High School Students, Science Tests

Applying Rasch Model and Generalizability Theory to Study Modified-Angoff Cut Scores

Peer reviewed

Arce, Alvaro J.; Wang, Ze – International Journal of Testing, 2012

The traditional approach to scale modified-Angoff cut scores transfers the raw cuts to an existing raw-to-scale score conversion table. Under the traditional approach, cut scores and conversion table raw scores are not only seen as interchangeable but also as originating from a common scaling process. In this article, we propose an alternative…

Descriptors: Generalizability Theory, Item Response Theory, Cutting Scores, Scaling

An Application of Explanatory Item Response Modeling for Model-Based Proficiency Scaling

Peer reviewed

Hartig, Johannes; Frey, Andreas; Nold, Gunter; Klieme, Eckhard – Educational and Psychological Measurement, 2012

The article compares three different methods to estimate effects of task characteristics and to use these estimates for model-based proficiency scaling: prediction of item difficulties from the Rasch model, the linear logistic test model (LLTM), and an LLTM including random item effects (LLTM+e). The methods are applied to empirical data from a…

Descriptors: Item Response Theory, Models, Methods, Computation

The Development and Scaling of the easyCBM CCSS Elementary Mathematics Measures: Grade K. Technical Report #1314

Irvin, P. Shawn; Saven, Jessica L.; Alonzo, Julie; Park, Bitnara Jasmine; Anderson, Daniel; Tindal, Gerald – Behavioral Research and Teaching, 2012

The results of formative assessments are regularly used to inform important instructional decisions (e.g., targeted intervention) within a response to intervention (RTI) system of teaching and learning. The validity of such instructional decision-making depends, in part, on the alignment between formative measures and the academic content…