Publication Date
In 2025 | 0
Since 2024 | 3
Since 2021 (last 5 years) | 5
Since 2016 (last 10 years) | 12
Since 2006 (last 20 years) | 19
Descriptor
Test Construction | 95
Test Interpretation | 95
Test Items | 95
Test Validity | 38
Test Reliability | 30
Item Analysis | 29
Achievement Tests | 22
Difficulty Level | 17
Elementary Secondary Education | 17
Higher Education | 14
Psychometrics | 14
Education Level
Elementary Education | 5
Elementary Secondary Education | 5
Grade 4 | 3
Middle Schools | 3
Secondary Education | 3
Early Childhood Education | 2
Grade 8 | 2
Intermediate Grades | 2
Junior High Schools | 2
Grade 10 | 1
Grade 3 | 1
Location
California | 2
United States | 2
Alabama | 1
Australia | 1
Brazil | 1
Canada | 1
Indiana | 1
Iran (Tehran) | 1
Japan | 1
Kansas | 1
Massachusetts | 1
Laws, Policies, & Programs
Individuals with Disabilities… | 1
National Defense Education Act | 1
No Child Left Behind Act 2001 | 1
Baldwin, Peter; Clauser, Brian E. – Journal of Educational Measurement, 2022
While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way, or may be incompatible with common examinee…
Descriptors: Scoring, Testing, Test Items, Test Format
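Baldwin and Clauser's point is easiest to see against the baseline their abstract departs from: when the groups taking two forms can be treated as randomly equivalent, a simple linear transformation places one form's scores on the other's scale. A minimal sketch of that baseline, with purely illustrative data and function names:

```python
import numpy as np

def linear_equate(x_scores, y_scores):
    """Place Form X scores on the Form Y scale by matching the first two
    moments, assuming randomly equivalent groups took the two forms."""
    x = np.asarray(x_scores, dtype=float)
    y = np.asarray(y_scores, dtype=float)
    slope = y.std(ddof=1) / x.std(ddof=1)    # match standard deviations
    intercept = y.mean() - slope * x.mean()  # match means
    return slope * x + intercept

# Toy data: Form X runs harder, so its raw scores sit lower.
rng = np.random.default_rng(0)
form_x = rng.normal(48, 10, size=500)
form_y = rng.normal(52, 11, size=500)
print(linear_equate(form_x, form_y)[:5])
```

Designs that break the common-examinee or common-item assumption are precisely the cases where this simple baseline no longer applies.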
Leventhal, Brian C.; Gregg, Nikole; Ames, Allison J. – Measurement: Interdisciplinary Research and Perspectives, 2022
Response styles introduce construct-irrelevant variance as a result of respondents systematically responding to Likert-type items regardless of content. Methods to account for response styles through data analysis as well as approaches to mitigating the effects of response styles during data collection have been well-documented. Recent approaches…
Descriptors: Response Style (Tests), Item Response Theory, Test Items, Likert Scales
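To make the construct-irrelevant variance concrete, the following small simulation (not the authors' model; the category probabilities are invented) shows a subset of respondents favoring the endpoints of a 5-point Likert scale regardless of item content:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_likert(n_people=200, n_items=10, p_ers=0.3):
    """Simulate 5-point Likert data in which a fraction of respondents
    show an extreme response style (ERS), favoring categories 1 and 5
    regardless of item content."""
    content_probs = np.array([0.10, 0.20, 0.40, 0.20, 0.10])  # content-driven
    ers_probs     = np.array([0.35, 0.10, 0.10, 0.10, 0.35])  # endpoint-heavy
    is_ers = rng.random(n_people) < p_ers
    data = np.empty((n_people, n_items), dtype=int)
    for i in range(n_people):
        probs = ers_probs if is_ers[i] else content_probs
        data[i] = rng.choice([1, 2, 3, 4, 5], size=n_items, p=probs)
    return data, is_ers

data, is_ers = simulate_likert()
# Endpoint usage is the usual ERS diagnostic: proportion of 1s and 5s.
endpoint_rate = np.isin(data, [1, 5]).mean(axis=1)
print(endpoint_rate[is_ers].mean(), endpoint_rate[~is_ers].mean())
```

The endpoint-usage gap between the two simulated groups is variance attributable to style rather than to the construct being measured.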
Shadi Noroozi; Hossein Karami – Language Testing in Asia, 2024
Recently, psychometricians and researchers have voiced their concern over the exploration of language test items in light of Messick's validation framework. Validity has been central to test development and use; however, it has not received due attention in language tests, which has grave consequences for test takers. The present study sought to…
Descriptors: Foreign Countries, Doctoral Students, Graduate Students, Language Proficiency
Jessica B. Koslouski; Sandra M. Chafouleas; Amy Briesch; Jacqueline M. Caemmerer; Brittany Melo – School Mental Health, 2024
We are developing the Equitable Screening to Support Youth (ESSY) Whole Child Screener to address concerns prevalent in existing school-based screenings that impede goals to advance educational equity using universal screeners. Traditional assessment development does not include end users in the early development phases, instead relying on a…
Descriptors: Screening Tests, Psychometrics, Validity, Child Development
Jacobson, Erik; Svetina, Dubravka – Applied Measurement in Education, 2019
Contingent argument-based approaches to validity require a unique argument for each use, in contrast to more prescriptive approaches that identify the common kinds of validity evidence researchers should consider for every use. In this article, we evaluate our use of an approach that is both prescriptive and argument-based to develop a…
Descriptors: Test Validity, Test Items, Test Construction, Test Interpretation
Kirsch, Irwin; Lennon, Mary Louise – Large-scale Assessments in Education, 2017
As the largest and most innovative international assessment of adults, PIAAC marks an inflection point in the evolution of large-scale comparative assessments. PIAAC grew from the foundation laid by surveys that preceded it, and introduced innovations that have shifted the way we conceive and implement large-scale assessments. As the first fully…
Descriptors: International Assessment, Adults, Measurement, Surveys
Chiavaroli, Neville – Practical Assessment, Research & Evaluation, 2017
Despite the majority of MCQ writing guides discouraging the use of negatively worded multiple choice questions (NWQs), they continue to be used regularly in both locally produced examinations and commercially available questions. There are several reasons why the use of NWQs may prove resistant to sound pedagogical advice. Nevertheless, systematic…
Descriptors: Multiple Choice Tests, Test Construction, Test Items, Validity
Michelle M. Neumann; Jason L. Anthony; Noé A. Erazo; David L. Neumann – Grantee Submission, 2019
The framework and tools used for classroom assessment can have significant impacts on teacher practices and student achievement. Getting assessment right is an important component in creating positive learning experiences and academic success. Recent government reports (e.g., United States, Australia) call for the development of systems that use…
Descriptors: Early Childhood Education, Futures (of Society), Educational Assessment, Evaluation Methods
Tengberg, Michael – Language Assessment Quarterly, 2018
Reading comprehension is often treated as a multidimensional construct. In many reading tests, items are distributed over reading process categories to represent the subskills expected to constitute comprehension. This study explores (a) the extent to which specified subskills of reading comprehension tests are conceptually conceivable to…
Descriptors: Reading Tests, Reading Comprehension, Scores, Test Results
Reynolds, Matthew R.; Niileksela, Christopher R. – Journal of Psychoeducational Assessment, 2015
"The Woodcock-Johnson IV Tests of Cognitive Abilities" (WJ IV COG) is an individually administered measure of psychometric intellectual abilities designed for ages 2 to 90+. The measure was published by Houghton Mifflin Harcourt-Riverside in 2014. Frederick Shrank, Kevin McGrew, and Nancy Mather are the authors. Richard Woodcock, the…
Descriptors: Cognitive Tests, Testing, Scoring, Test Interpretation
Chang, Wen-Chia Claire – ProQuest LLC, 2017
Preparing and supporting teachers to enact teaching practice that responds to diversity, challenges educational inequities, and promotes social justice is a pressing yet daunting and complex task. More research is needed to understand how and to what extent teacher education programs prepare and support teacher candidates to enhance the…
Descriptors: Test Construction, Educational Practices, Equal Education, Item Response Theory
Traynor, Anne – Educational Assessment, 2017
Variation in test performance among examinees from different regions or national jurisdictions is often partially attributed to differences in the degree of content correspondence between local school or training program curricula, and the test of interest. This posited relationship between test-curriculum correspondence, or "alignment,"…
Descriptors: Test Items, Test Construction, Alignment (Education), Curriculum
Schneider, M. Christina; Huff, Kristen L.; Egan, Karla L.; Gaines, Margie L.; Ferrara, Steve – Educational Assessment, 2013
A primary goal of standards-based statewide achievement tests is to classify students into achievement levels that enable valid inferences about student content area knowledge and skill. Explicating how knowledge and skills are expected to differ in complexity in achievement level descriptors, and how that complexity is related to empirical item…
Descriptors: Test Items, Difficulty Level, Achievement Tests, Test Interpretation
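The empirical item difficulty that Schneider et al. relate to achievement level descriptors is, in classical test theory, simply the proportion of examinees answering an item correctly. A minimal sketch with fabricated response data:

```python
import numpy as np

# Rows = examinees, columns = items; 1 = correct. Hypothetical data in
# which items are generated to run from easy to hard.
rng = np.random.default_rng(2)
responses = (rng.random((300, 8)) < np.linspace(0.85, 0.35, 8)).astype(int)

# Classical item difficulty: proportion correct (higher p = easier item).
p_values = responses.mean(axis=0)
for item, p in enumerate(p_values, start=1):
    print(f"Item {item}: p = {p:.2f}")
```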
Lee, Eunjung; Lee, Won-Chan; Brennan, Robert L. – College Board, 2012
In almost all high-stakes testing programs, test equating is necessary to ensure that test scores across multiple test administrations are equivalent and can be used interchangeably. Test equating becomes even more challenging in mixed-format tests, such as Advanced Placement Program® (AP®) Exams, that contain both multiple-choice and constructed…
Descriptors: Test Construction, Test Interpretation, Test Norms, Test Reliability
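For a mixed-format test of the kind the abstract describes, equating operates on a weighted composite of the multiple-choice and constructed-response sections. A sketch under stated assumptions, randomly equivalent groups and purely illustrative section weights, not the AP Program's actual procedure:

```python
import numpy as np

def composite(mc, cr, w_mc=1.0, w_cr=2.0):
    """Weighted composite of multiple-choice and constructed-response
    section scores; the weights here are purely illustrative."""
    return w_mc * np.asarray(mc, dtype=float) + w_cr * np.asarray(cr, dtype=float)

def linear_equate(x, y):
    """Linear equating of new-form composites onto the old-form scale,
    assuming randomly equivalent groups."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slope = y.std(ddof=1) / x.std(ddof=1)
    return slope * x + (y.mean() - slope * x.mean())

rng = np.random.default_rng(3)
# 40 MC items and 12 CR points per form; success rates differ across forms.
old_form = composite(rng.binomial(40, 0.60, 400), rng.binomial(12, 0.50, 400))
new_form = composite(rng.binomial(40, 0.55, 400), rng.binomial(12, 0.45, 400))
print(linear_equate(new_form, old_form)[:5])
```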