Daniel Koretz – Journal of Educational and Behavioral Statistics, 2024
A critically important balance in educational measurement between practical concerns and matters of technique has atrophied in recent decades, and as a result, some important issues in the field have not been adequately addressed. I start with the work of E. F. Lindquist, who exemplified the balance that is now wanting. Lindquist was arguably the…
Descriptors: Educational Assessment, Evaluation Methods, Achievement Tests, Educational History
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Ng, Zi Jia; Willner, Cynthia J.; Mannweiler, Morgan D.; Hoffmann, Jessica D.; Bailey, Craig S.; Cipriano, Christina – Educational Psychology Review, 2022
Many emotion regulation assessments have been developed for research purposes, but few are frequently used in schools despite the rapid growth of social and emotional learning programs with an explicit focus on emotion regulation in schools. This systematic review provides an overview of emotion regulation assessments that have been utilized with…
Descriptors: Emotional Response, Self Control, Elementary School Students, Secondary School Students
Sanders, Sara – National Technical Assistance Center for the Education of Neglected or Delinquent Children and Youth (NDTAC), 2019
This guide is designed to assist States, agencies, and/or facilities who work with youth who are neglected, delinquent, or at-risk (N or D). The information in the guide will benefit those who are (a) interested in implementing pre-posttests, (b) in the process of identifying an appropriate pre-posttest, or (c) ready to evaluate current testing…
Descriptors: At Risk Students, Delinquency, Pretests Posttests, Testing
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
American Educational Research Association (AERA), 2014
Developed jointly by the American Educational Research Association, American Psychological Association, and the National Council on Measurement in Education, "Standards for Educational and Psychological Testing" (Revised 2014) addresses professional and technical issues of test development and use in education, psychology, and…
Descriptors: Standards, Educational Testing, Psychological Testing, Test Construction
Walker, Michael E. – Measurement: Interdisciplinary Research and Perspectives, 2010
"Linking" is a term given to a general class of procedures by which one represents scores X on one test or measure in terms of scores Y on another test or measure. A recent taxonomy by Holland and Dorans (2006; Holland, 2007) organizes the various types of links into three broad categories: prediction, scale aligning, and equating. In…
Descriptors: Foreign Countries, Test Construction, Test Validity, Measurement Techniques
von Davier, Alina A. – Measurement: Interdisciplinary Research and Perspectives, 2010
The article "Thinking About Linking" by Newton (2010) presents a novel philosophical perspective on the way that educational assessments should be linked. Newton starts by describing the linking framework as it was characterized in various publications and identifies a cross-cultural dimension in the definitions and uses of test…
Descriptors: Foreign Countries, Educational Assessment, Student Evaluation, Evaluation Criteria
Nichols, Paul D.; Meyers, Jason L.; Burling, Kelly S. – Educational Measurement: Issues and Practice, 2009
Assessments labeled as formative have been offered as a means to improve student achievement. But labels can be a powerful way to miscommunicate. For an assessment use to be appropriately labeled "formative," both empirical evidence and reasoned arguments must be offered to support the claim that improvements in student achievement can be linked…
Descriptors: Academic Achievement, Tutoring, Student Evaluation, Evaluation Methods
Herman, Joan L.; Osmundson, Ellen; Dietel, Ronald – Assessment and Accountability Comprehensive Center, 2010
This report describes the purposes of benchmark assessments and provides recommendations for selecting and using benchmark assessments--addressing validity, alignment, reliability, fairness and bias and accessibility, instructional sensitivity, utility, and reporting issues. We also present recommendations on building capacity to support schools'…
Descriptors: Multiple Choice Tests, Test Items, Benchmarking, Educational Assessment

Arthur, Nancy M. – Journal of Counseling and Development, 1990
Reviews three self-report inventories designed to respond to syndrome of burnout in helping professionals: Maslach Burnout Inventory, Staff Burnout Scale for Health Professionals, and Tedium Scale. Describes each instrument, its development, and related research. Provides recommendations for future research. Discusses suggestions for use of the…
Descriptors: Burnout, Evaluation Methods, Test Construction, Test Use

Yen, Wendy M. – Educational Measurement: Issues and Practice, 1998
The articles in this issue, written from the perspectives of academics, practitioners, and publishers, show that examining the consequences of assessment is an important, large, and difficult task. Collaborative action by assessment developers, users, and the educational measurement community is needed if progress is to be made. (SLD)
Descriptors: Cooperation, Evaluation Methods, Program Evaluation, Responsibility
Thurlow, Martha L.; Laitusis, Cara Cahalan; Dillon, Deborah R.; Cook, Linda L.; Moen, Ross E.; Abedi, Jamal; O'Brien, David G. – National Accessible Reading Assessment Projects, 2009
Within the context of standards-based educational systems, states are using large scale reading assessments to help ensure that all children have the opportunity to learn essential knowledge and skills. The challenge for developers of accessible reading assessments is to develop assessments that measure only those student characteristics that are…
Descriptors: Reading Achievement, Measures (Individuals), Student Characteristics, Disabilities

Moss, Pamela A. – Educational Measurement: Issues and Practice, 1998
Provides an argument for incorporating consideration of consequences into validity theory that is grounded in the reflexive nature of social knowledge. It also calls for the consideration of evidence of validity based on the actual discourse surrounding the practices and products of testing. (SLD)
Descriptors: Evaluation Methods, Evaluation Utilization, Program Evaluation, Test Construction
Humphries-Wadsworth, Terresa M. – 1998
The American Psychological Association, in the late 1940s, began work to establish a code of ethics to include and address the needs of members in scientific and applied fields. Out of the ethics work emerged a set of standards for evaluating psychological tests. Four categories, or types of validity, were identified: content, predictive,…
Descriptors: Codes of Ethics, Definitions, Evaluation Methods, Psychological Testing