Bjermo, Jonas; Miller, Frank – Applied Measurement in Education, 2021
In recent years, the interest in measuring growth in student ability in various subjects between different grades in school has increased. Therefore, good precision in the estimated growth is of importance. This paper aims to compare estimation methods and test designs when it comes to precision and bias of the estimated growth of mean ability…
Descriptors: Scaling, Ability, Computation, Test Items
Carlson, James E. – ETS Research Report Series, 2017
In this paper, I consider a set of test items that are located in a multidimensional space, S[subscript M], but are located along a curved line in S[subscript M] and can be scaled unidimensionally. Furthermore, I am demonstrating a case in which the test items are administered across 6 levels, such as occurs in K-12 assessment across 6 grade…
Descriptors: Test Items, Item Response Theory, Difficulty Level, Scoring
Andrich, David; Marais, Ida; Humphry, Stephen Mark – Educational and Psychological Measurement, 2016
Recent research has shown how the statistical bias in Rasch model difficulty estimates induced by guessing in multiple-choice items can be eliminated. Using vertical scaling of a high-profile national reading test, it is shown that the dominant effect of removing such bias is a nonlinear change in the unit of scale across the continuum. The…
Descriptors: Guessing (Tests), Statistical Bias, Item Response Theory, Multiple Choice Tests
Attali, Yigal; Saldivia, Luis; Jackson, Carol; Schuppan, Fred; Wanamaker, Wilbur – ETS Research Report Series, 2014
Previous investigations of the ability of content experts and test developers to estimate item difficulty have, for the most part, produced disappointing results. These investigations were based on a noncomparative method of independently rating the difficulty of items. In this article, we argue that, by eliciting comparative judgments of…
Descriptors: Test Items, Difficulty Level, Comparative Analysis, College Entrance Examinations
France, Stephen L.; Batchelder, William H. – Educational and Psychological Measurement, 2015
Cultural consensus theory (CCT) is a data aggregation technique with many applications in the social and behavioral sciences. We describe the intuition and theory behind a set of CCT models for continuous type data using maximum likelihood inference methodology. We describe how bias parameters can be incorporated into these models. We introduce…
Descriptors: Maximum Likelihood Statistics, Test Items, Difficulty Level, Test Theory
Hopfenbeck, Therese N.; Lenkeit, Jenny; El Masri, Yasmine; Cantrell, Kate; Ryan, Jeanne; Baird, Jo-Anne – Scandinavian Journal of Educational Research, 2018
International large-scale assessments are on the rise, with the Programme for International Student Assessment (PISA) seen by many as having strategic prominence in education policy debates. The present article reviews PISA-related English-language peer-reviewed articles from the programme's first cycle in 2000 to its most current in 2015. Five…
Descriptors: Foreign Countries, Achievement Tests, International Assessment, Secondary School Students
Ye, Meng; Xin, Tao – Educational and Psychological Measurement, 2014
The authors explored the effects of drifting common items on vertical scaling within the higher order framework of item parameter drift (IPD). The results showed that if IPD occurred between a pair of test levels, the scaling performance started to deviate from the ideal state, as indicated by bias of scaling. When there were two items drifting…
Descriptors: Scaling, Test Items, Equated Scores, Achievement Gains
Lee, Hee-Sun; Liu, Ou Lydia; Pallant, Amy; Roohr, Katrina Crotts; Pryputniewicz, Sarah; Buck, Zoë E. – Journal of Research in Science Teaching, 2014
Though addressing sources of uncertainty is an important part of doing science, it has largely been neglected in assessing students' scientific argumentation. In this study, we initially defined a scientific argumentation construct in four structural elements consisting of claim, justification, uncertainty qualifier, and uncertainty…
Descriptors: Persuasive Discourse, Student Evaluation, High School Students, Science Tests
Arce, Alvaro J.; Wang, Ze – International Journal of Testing, 2012
The traditional approach to scale modified-Angoff cut scores transfers the raw cuts to an existing raw-to-scale score conversion table. Under the traditional approach, cut scores and conversion table raw scores are not only seen as interchangeable but also as originating from a common scaling process. In this article, we propose an alternative…
Descriptors: Generalizability Theory, Item Response Theory, Cutting Scores, Scaling
Hartig, Johannes; Frey, Andreas; Nold, Gunter; Klieme, Eckhard – Educational and Psychological Measurement, 2012
The article compares three different methods to estimate effects of task characteristics and to use these estimates for model-based proficiency scaling: prediction of item difficulties from the Rasch model, the linear logistic test model (LLTM), and an LLTM including random item effects (LLTM+e). The methods are applied to empirical data from a…
Descriptors: Item Response Theory, Models, Methods, Computation
Irvin, P. Shawn; Saven, Jessica L.; Alonzo, Julie; Park, Bitnara Jasmine; Anderson, Daniel; Tindal, Gerald – Behavioral Research and Teaching, 2012
The results of formative assessments are regularly used to inform important instructional decisions (e.g., targeted intervention) within a response to intervention (RTI) system of teaching and learning. The validity of such instructional decision-making depends, in part, on the alignment between formative measures and the academic content…
Descriptors: Elementary School Mathematics, Curriculum Based Assessment, Mathematics Tests, Academic Standards
Saven, Jessica L.; Irvin, P. Shawn; Park, Bitnara Jasmine; Alonzo, Julie; Anderson, Daniel; Tindal, Gerald – Behavioral Research and Teaching, 2012
The results of formative assessments are regularly used to inform important instructional decisions (e.g., targeted intervention) within a response to intervention (RTI) system of teaching and learning. The validity of such instructional decision-making depends, in part, on the alignment between formative measures and the academic content…
Descriptors: Elementary School Mathematics, Curriculum Based Assessment, Mathematics Tests, Academic Standards