Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Patrick Kyllonen; Amit Sevak; Teresa Ober; Ikkyu Choi; Jesse Sparks; Daniel Fishtein – ETS Research Report Series, 2024
Assessment refers to a broad array of approaches for measuring or evaluating a person's (or group of persons') skills, behaviors, dispositions, or other attributes. Assessments range from standardized tests used in admissions, employee selection, licensure examinations, and domestic and international large-scale assessments of cognitive and…
Descriptors: Assessment Literacy, Testing, Test Bias, Test Construction
Sanders, Sara – National Technical Assistance Center for the Education of Neglected or Delinquent Children and Youth (NDTAC), 2019
This guide is designed to assist States, agencies, and/or facilities who work with youth who are neglected, delinquent, or at-risk (N or D). The information in the guide will benefit those who are (a) interested in implementing pre-posttests, (b) in the process of identifying an appropriate pre-posttest, or (c) ready to evaluate current testing…
Descriptors: At Risk Students, Delinquency, Pretests Posttests, Testing
New York State Education Department, 2018
This technical report provides detailed information regarding the technical, statistical, and measurement attributes of the New York State Testing Program (NYSTP) for the Grades 3-8 English Language Arts (ELA) and Mathematics 2018 Operational Tests. This report includes information about test content and test development, item (i.e., individual…
Descriptors: English, Language Arts, Language Tests, Mathematics Tests
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
New York State Education Department, 2017
This technical report provides detailed information regarding the technical, statistical, and measurement attributes of the New York State Testing Program (NYSTP) for the Grades 3-8 English Language Arts (ELA) and Mathematics 2017 Operational Tests. This report includes information about test content and test development, item (i.e., individual…
Descriptors: English, Language Arts, Language Tests, Mathematics Tests
Hassan, Nurul Huda; Shih, Chih-Min – Language Assessment Quarterly, 2013
This article describes and reviews the Singapore-Cambridge General Certificate of Education Advanced Level General Paper (GP) examination. As a written test that is administered to preuniversity students, the GP examination is internationally recognised and accepted by universities and employers as proof of English competence. In this article, the…
Descriptors: Foreign Countries, College Entrance Examinations, English (Second Language), Writing Tests
New York State Education Department, 2014
This technical report provides an overview of the New York State Alternate Assessment (NYSAA), including a description of the purpose of the NYSAA, the processes utilized to develop and implement the NYSAA program, and stakeholder involvement in those processes. The purpose of this report is to document the technical aspects of the 2013-14 NYSAA.…
Descriptors: Alternative Assessment, Educational Assessment, State Departments of Education, Student Evaluation

Pearson, Judith E. – Measurement and Evaluation in Counseling and Development, 1987
Introduces the Interpersonal Network Questionnaire (INQ), an instrument designed to measure constructs of social networks such as social participation, confidant size, and frequency of contact. Discusses item generation, reliability, administration, and applications of the INQ. (Author/NB)
Descriptors: Evaluation, Social Networks, Social Support Groups, Test Reliability

Tillinghast, B. S., Jr.; And Others – Journal of Educational Research, 1983
A study using the Peabody Picture Vocabulary Test (Revised) was conducted to determine whether the increase in reliability when both Forms L and M were employed justified the increase in time required for the longer procedure. Children in grades four, five, and six were involved in the project. (PP)
Descriptors: Intermediate Grades, Test Reliability, Test Results, Test Use
Lehr, Camilla A.; And Others – 1986
Information about current assessment practices was obtained from 54 surveys completed by Handicapped Children's Early Education Program (HCEEP) demonstration projects across the United States. Information about factors influencing the selection and continued use of tests also was provided. Results indicated that 19 tests were used by five or more…
Descriptors: Demonstration Programs, Disabilities, National Surveys, Preschool Education
Reuter, Jeanette; And Others – 1982
Of the 15 substantive papers in this report, 12 focus on the use of the Kent Infant Development (KID) Scale with severely handicapped children. The KID Scale measures 252 behaviors usually developed during the first year of life in five domains (cognitive, motor, language, self-help, and social). It was successfully adapted to elicit reliable…
Descriptors: Infants, Severe Disabilities, Student Evaluation, Test Reliability
Baker, Eva L. – 1982
This booklet is intended to help school personnel, parents, students, and members of the community understand concepts and research relating to achievement testing in public schools. The paper's sections include: (1) test use with direct effects on students (test of certification, selection, and placement); (2) test use with indirect effects on…
Descriptors: Achievement Tests, Criterion Referenced Tests, Elementary Secondary Education, Glossaries

Fields, Joyce I. – Early Child Development and Care, 1997
Evaluated five intelligence test instruments for use with Malaysian children: Raven's Standard Progressive Matrices (SPM), WISC-R, School Failure Tolerance (SFT), Scale for Rating Behavior Characteristics of Superior Students (SRBCSS), and Parent Checklists. Found that Raven's SPM was an effective screening test, and the WISC-R the best measure to…
Descriptors: Academically Gifted, Foreign Countries, Gifted, Intelligence Tests

Woodburn, Jim; Sutcliffe, Nick – Assessment & Evaluation in Higher Education, 1996
The Objective Structured Clinical Examination (OSCE), initially developed for undergraduate medical education, has been adapted for assessment of clinical skills in podiatry students. A 12-month pilot study found the test had relatively low levels of reliability, high construct and criterion validity, and good stability of performance over time.…
Descriptors: Clinical Teaching (Health Professions), Higher Education, Medical Education, Podiatry