Publication Date
In 2025 | 1 |
Since 2024 | 2 |
Since 2021 (last 5 years) | 2 |
Since 2016 (last 10 years) | 9 |
Since 2006 (last 20 years) | 17 |
Descriptor
Test Use | 55 |
Test Validity | 55 |
Testing | 55 |
Test Reliability | 22 |
Test Construction | 17 |
Evaluation Methods | 12 |
Test Bias | 12 |
Elementary Secondary Education | 11 |
Scores | 11 |
Student Evaluation | 11 |
Foreign Countries | 9 |
More ▼ |
Source
Author
Baker, Eva L. | 2 |
Aikenhead, Glen S. | 1 |
Alderson, J. Charles | 1 |
Amery D. Wu | 1 |
Amit Sevak | 1 |
Bachman, Lyle F. | 1 |
Bennett, Randy Elliot | 1 |
Boyle, J. David | 1 |
Camara, Wayne J. | 1 |
Campbell, Vicki L. | 1 |
Cancelli, Anthony A. | 1 |
More ▼ |
Publication Type
Education Level
Elementary Education | 4 |
Early Childhood Education | 3 |
Grade 3 | 3 |
Grade 4 | 3 |
Grade 5 | 3 |
Grade 6 | 3 |
Grade 7 | 3 |
Grade 8 | 3 |
Higher Education | 3 |
Intermediate Grades | 3 |
Junior High Schools | 3 |
More ▼ |
Location
New York | 4 |
Asia | 1 |
Brazil | 1 |
Canada | 1 |
China | 1 |
Japan | 1 |
Kenya | 1 |
Malaysia | 1 |
Nebraska | 1 |
New Jersey | 1 |
Pennsylvania | 1 |
More ▼ |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 2 |
Elementary and Secondary… | 1 |
Every Student Succeeds Act… | 1 |
Individuals with Disabilities… | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Patrick Kyllonen; Amit Sevak; Teresa Ober; Ikkyu Choi; Jesse Sparks; Daniel Fishtein – ETS Research Report Series, 2024
Assessment refers to a broad array of approaches for measuring or evaluating a person's (or group of persons') skills, behaviors, dispositions, or other attributes. Assessments range from standardized tests used in admissions, employee selection, licensure examinations, and domestic and international large-scale assessments of cognitive and…
Descriptors: Assessment Literacy, Testing, Test Bias, Test Construction
Haertel, Edward H. – Educational Psychologist, 2018
In the service of educational accountability, student achievement tests are being used to measure constructs quite unlike those envisioned by test developers. Scores are compared to cut points to create classifications like "proficient"; scores are combined over time to measure growth; student scores are aggregated to measure the…
Descriptors: Achievement Tests, Scores, Test Validity, Test Interpretation
Schmitz, Florian; Wilhelm, Oliver – Journal of Intelligence, 2019
Current taxonomies of intelligence comprise two factors of mental speed, clerical speed (Gs), and elementary cognitive speed (Gt). Both originated from different research traditions and are conceptualized as dissociable constructs in current taxonomies. However, previous research suggests that tasks of one category can be transferred into the…
Descriptors: Taxonomy, Intelligence Tests, Testing, Test Format
Schmidgall, Jonathan E.; Getman, Edward P.; Zu, Jiyun – Language Testing, 2018
In this study, we define the term "screener test," elaborate key considerations in test design, and describe how to incorporate the concepts of practicality and argument-based validation to drive an evaluation of screener tests for language assessment. A screener test is defined as a brief assessment designed to identify an examinee as a…
Descriptors: Test Validity, Test Use, Test Construction, Language Tests
Sanders, Sara – National Technical Assistance Center for the Education of Neglected or Delinquent Children and Youth (NDTAC), 2019
This guide is designed to assist States, agencies, and/or facilities who work with youth who are neglected, delinquent, or at-risk (N or D). The information in the guide will benefit those who are (a) interested in implementing pre-posttests, (b) in the process of identifying an appropriate pre-posttest, or (c) ready to evaluate current testing…
Descriptors: At Risk Students, Delinquency, Pretests Posttests, Testing
New York State Education Department, 2018
This technical report provides detailed information regarding the technical, statistical, and measurement attributes of the New York State Testing Program (NYSTP) for the Grades 3-8 English Language Arts (ELA) and Mathematics 2018 Operational Tests. This report includes information about test content and test development, item (i.e., individual…
Descriptors: English, Language Arts, Language Tests, Mathematics Tests
Kane, Michael – Measurement: Interdisciplinary Research and Perspectives, 2012
Paul E. Newton's article on the consensus definition of validity tackles a number of big issues and makes a number of strong claims. I agreed with much of what he said, and I disagreed with a number of his claims, but I found his article to be consistently interesting and thought provoking (whether I agreed or not). I will focus on three general…
Descriptors: Validity, Construct Validity, Tests, Testing
Mattern, Krista D.; Kobrin, Jennifer L.; Camara, Wayne J. – Measurement: Interdisciplinary Research and Perspectives, 2012
As researchers at a testing organization concerned with the appropriate uses and validity evidence for our assessments, we provide an applied perspective related to the issues raised in the focus article. Newton's proposal for elaborating the consensus definition of validity is offered with the intention to reduce the risks of inadequate…
Descriptors: Evidence, Validity, Tests, Testing
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
New York State Education Department, 2017
This technical report provides detailed information regarding the technical, statistical, and measurement attributes of the New York State Testing Program (NYSTP) for the Grades 3-8 English Language Arts (ELA) and Mathematics 2017 Operational Tests. This report includes information about test content and test development, item (i.e., individual…
Descriptors: English, Language Arts, Language Tests, Mathematics Tests
Magee, Robert G.; Jones, Brett D. – Australian Journal of Educational & Developmental Psychology, 2012
This article describes the development of an instrument to assess beliefs about standardized testing in schools, a topic of much heated debate. The Beliefs About Standardized Testing scale was developed to measure the extent to which individuals support high-stakes standardized testing. The 9-item scale comprises three subscales which measure…
Descriptors: Testing, Measures (Individuals), Standardized Tests, Epistemology
Lane, Suzanne – Measurement: Interdisciplinary Research and Perspectives, 2012
Considering consequences in the evaluation of validity is not new although it is still debated by Paul E. Newton and others. The argument-based approach to validity entails an interpretative argument that explicitly identifies the proposed interpretations and uses of test scores and a validity argument that provides a structure for evaluating the…
Descriptors: Educational Opportunities, Accountability, Validity, Inferences
Hassan, Nurul Huda; Shih, Chih-Min – Language Assessment Quarterly, 2013
This article describes and reviews the Singapore-Cambridge General Certificate of Education Advanced Level General Paper (GP) examination. As a written test that is administered to preuniversity students, the GP examination is internationally recognised and accepted by universities and employers as proof of English competence. In this article, the…
Descriptors: Foreign Countries, College Entrance Examinations, English (Second Language), Writing Tests
Cheng, Liying; DeLuca, Christopher – Educational Assessment, 2011
Test-takers' interpretations of validity as related to test constructs and test use have been widely debated in large-scale language assessment. This study contributes further evidence to this debate by examining 59 test-takers' perspectives in writing large-scale English language tests. Participants wrote about their test-taking experiences in…
Descriptors: Language Tests, Test Validity, Test Use, English