Publication Date
In 2025 | 1 |
Since 2024 | 2 |
Since 2021 (last 5 years) | 3 |
Since 2016 (last 10 years) | 12 |
Since 2006 (last 20 years) | 17 |
Descriptor
Source
Educational Measurement:… | 43 |
Author
Moss, Pamela A. | 2 |
Nichols, Paul D. | 2 |
Rudner, Lawrence M. | 2 |
Abedi, Jamal | 1 |
Angela Johnson | 1 |
Arslan, Burcu | 1 |
Beller, Michal | 1 |
Bond, Lloyd | 1 |
Bottsford-Miller, Nicole A. | 1 |
Burling, Kelly S. | 1 |
Capie, William | 1 |
More ▼ |
Publication Type
Journal Articles | 43 |
Reports - Descriptive | 14 |
Reports - Research | 12 |
Reports - Evaluative | 11 |
Opinion Papers | 10 |
Information Analyses | 6 |
Speeches/Meeting Papers | 1 |
Education Level
Elementary Secondary Education | 1 |
Grade 9 | 1 |
High Schools | 1 |
Higher Education | 1 |
Junior High Schools | 1 |
Middle Schools | 1 |
Postsecondary Education | 1 |
Secondary Education | 1 |
Audience
Location
Israel | 1 |
New York (New York) | 1 |
Texas | 1 |
Laws, Policies, & Programs
Every Student Succeeds Act… | 1 |
Assessments and Surveys
ACT Assessment | 1 |
National Assessment of… | 1 |
SAT (College Admission Test) | 1 |
Stanford Achievement Tests | 1 |
Teacher Performance… | 1 |
Watson Glaser Critical… | 1 |
What Works Clearinghouse Rating
Guher Gorgun; Okan Bulut – Educational Measurement: Issues and Practice, 2025
Automatic item generation may supply many items instantly and efficiently to assessment and learning environments. Yet, the evaluation of item quality persists to be a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of using large-language models, specifically Llama 3-8B, for…
Descriptors: Artificial Intelligence, Quality Control, Technology Uses in Education, Automation
Student, Sanford R.; Gong, Brian – Educational Measurement: Issues and Practice, 2022
We address two persistent challenges in large-scale assessments of the Next Generation Science Standards: (a) the validity of score interpretations that target the standards broadly and (b) how to structure claims for assessments of this complex domain. The NGSS pose a particular challenge for specifying claims about students that evidence from…
Descriptors: Science Tests, Test Validity, Test Items, Test Construction
Angela Johnson; Elizabeth Barker; Marcos Viveros Cespedes – Educational Measurement: Issues and Practice, 2024
Educators and researchers strive to build policies and practices on data and evidence, especially on academic achievement scores. When assessment scores are inaccurate for specific student populations or when scores are inappropriately used, even data-driven decisions will be misinformed. To maximize the impact of the research-practice-policy…
Descriptors: Equal Education, Inclusion, Evaluation Methods, Error of Measurement
Wilkerson, Judy R. – Educational Measurement: Issues and Practice, 2020
Validity and reliability are a major focus in teacher education accreditation by the Council for Accreditation of Educator Preparation (CAEP). CAEP requires the use of "accepted research standards," but many faculty and administrators are unsure how to meet this requirement. The Standards of Educational and Psychological Testing…
Descriptors: Test Construction, Test Validity, Test Reliability, Teacher Education Programs
Arslan, Burcu; Jiang, Yang; Keehner, Madeleine; Gong, Tao; Katz, Irvin R.; Yan, Fred – Educational Measurement: Issues and Practice, 2020
Computer-based educational assessments often include items that involve drag-and-drop responses. There are different ways that drag-and-drop items can be laid out and different choices that test developers can make when designing these items. Currently, these decisions are based on experts' professional judgments and design constraints, rather…
Descriptors: Test Items, Computer Assisted Testing, Test Format, Decision Making
Sijtsma, Klaas – Educational Measurement: Issues and Practice, 2015
I discuss the contribution by Davenport, Davison, Liou, & Love (2015) in which they relate reliability represented by coefficient a to formal definitions of internal consistency and unidimensionality, both proposed by Cronbach (1951). I argue that coefficient a is a lower bound to reliability and that concepts of internal consistency and…
Descriptors: Reliability, Mathematics, Validity, Test Construction
Mislevy, Robert J.; Oliveri, Maria Elena – Educational Measurement: Issues and Practice, 2019
In this digital ITEMS module, Dr. Robert [Bob] Mislevy and Dr. Maria Elena Oliveri introduce and illustrate a sociocognitive perspective on educational measurement, which focuses on a variety of design and implementation considerations for creating fair and valid assessments for learners from diverse populations with diverse sociocultural…
Descriptors: Educational Testing, Reliability, Test Validity, Test Reliability
Welch, Catherine J.; Dunbar, Stephen B. – Educational Measurement: Issues and Practice, 2020
The use of assessment results to inform school accountability relies on the assumption that the test design appropriately represents the content and cognitive emphasis reflected in the state's standards. Since the passage of the Every Student Succeeds Act and the certification of accountability assessments through federal peer review practices,…
Descriptors: Accountability, Test Construction, State Standards, Content Validity
Embretson, Susan E. – Educational Measurement: Issues and Practice, 2016
Examinees' thinking processes have become an increasingly important concern in testing. The responses processes aspect is a major component of validity, and contemporary tests increasingly involve specifications about the cognitive complexity of examinees' response processes. Yet, empirical research findings on examinees' cognitive processes are…
Descriptors: Testing, Cognitive Processes, Test Construction, Test Items
Davenport, Ernest C.; Davison, Mark L.; Liou, Pey-Yan; Love, Quintin U. – Educational Measurement: Issues and Practice, 2016
The main points of Sijtsma and Green and Yang in Educational Measurement: Issues and Practice (34, 4) are that reliability, internal consistency, and unidimensionality are distinct and that Cronbach's alpha may be problematic. Neither of these assertions are at odds with Davenport, Davison, Liou, and Love in the same issue. However, many authors…
Descriptors: Educational Assessment, Reliability, Validity, Test Construction
Abedi, Jamal; Zhang, Yu; Rowe, Susan E.; Lee, Hansol – Educational Measurement: Issues and Practice, 2020
Research indicates that the performance-gap between English Language Learners (ELLs) and their non-ELL peers is partly due to ELLs' difficulty in understanding assessment language. Accommodations have been shown to narrow this performance-gap, but many accommodations studies have not used a randomized design and are based on relatively small…
Descriptors: English Language Learners, Achievement Gap, Mathematics Tests, Standards
Johnson, Evelyn S.; Crawford, Angela; Moylan, Laura A.; Zheng, Yuzhu – Educational Measurement: Issues and Practice, 2018
The evidence-centered design framework was used to create a special education teacher observation system, Recognizing Effective Special Education Teachers. Extensive reviews of research informed the domain analysis and modeling stages, and led to the conceptual framework in which effective special education teaching is operationalized as the…
Descriptors: Evidence Based Practice, Special Education Teachers, Observation, Disabilities
Gierl, Mark J.; Lai, Hollis – Educational Measurement: Issues and Practice, 2016
Testing organization needs large numbers of high-quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time-consuming and expensive because each item is written,…
Descriptors: Test Items, Test Construction, Psychometrics, Models
Nichols, Paul D.; Williams, Natasha – Educational Measurement: Issues and Practice, 2009
This article has three goals. The first goal is to clarify the role that the consequences of test score use play in validity judgments by reviewing the role that modern writers on validity have ascribed for consequences in supporting validity judgments. The second goal is to summarize current views on who is responsible for collecting evidence of…
Descriptors: Tests, Test Validity, Scores, Data Collection
Johnstone, Christopher J.; Thompson, Sandra J.; Bottsford-Miller, Nicole A.; Thurlow, Martha L. – Educational Measurement: Issues and Practice, 2008
Test items undergo multiple iterations of review before states and vendors deem them acceptable to be placed in a live statewide assessment. This article reviews three approaches that can add validity evidence to states' item review processes. The first process is a structured sensitivity review process that focuses on universal design…
Descriptors: Test Items, Disabilities, Test Construction, Testing Programs