Publication Date
In 2025: 1
Since 2024: 1
Since 2021 (last 5 years): 3
Since 2016 (last 10 years): 6
Since 2006 (last 20 years): 9
Descriptor
Test Items: 71
Test Validity: 71
Testing Problems: 71
Test Construction: 33
Test Reliability: 26
Item Analysis: 23
Achievement Tests: 16
Test Bias: 16
Elementary Secondary Education: 14
Difficulty Level: 12
Higher Education: 12
Author
Diamond, Esther E.: 2
Harnisch, Delwyn L.: 2
Secolsky, Charles: 2
Andrada, Gilbert N.: 1
Autman, Hamlet: 1
Bao, Lei: 1
Barlow, Lisa: 1
Benderson, Albert, Ed.: 1
Benson, Jeri: 1
Bond, Lloyd: 1
Bower, Ruth: 1
Education Level
Higher Education: 3
Postsecondary Education: 2
Adult Education: 1
Elementary Secondary Education: 1
Secondary Education: 1
Audience
Researchers: 11
Practitioners: 6
Teachers: 3
Students: 1
Location
Netherlands: 2
Arizona: 1
Canada: 1
China: 1
Latin America: 1
New Jersey: 1
United Arab Emirates: 1
United Kingdom: 1
Chen, Yunxiao; Lee, Yi-Hsuan; Li, Xiaoou – Journal of Educational and Behavioral Statistics, 2022
In standardized educational testing, test items are reused in multiple test administrations. To ensure the validity of test scores, the psychometric properties of items should remain unchanged over time. In this article, we consider the sequential monitoring of test items, in particular, the detection of abrupt changes to their psychometric…
Descriptors: Standardized Tests, Test Items, Test Validity, Scores
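The kind of sequential monitoring this abstract describes can be illustrated with a simple one-sided CUSUM sketch that watches an item's proportion-correct across administrations and flags an abrupt drop. This is an illustration of the general technique only, not the authors' procedure; the function name, drift, and threshold are all made up.

```python
# Illustrative sketch (not the authors' method): a one-sided CUSUM chart
# that flags an abrupt drop in an item's proportion-correct across
# successive test administrations. All values are hypothetical.

def cusum_drop(p_values, baseline, drift=0.01, threshold=0.05):
    """Return the first administration index at which a sustained drop
    below `baseline` is flagged, or None if no change is detected."""
    s = 0.0
    for t, p in enumerate(p_values):
        # Accumulate evidence that p has fallen below baseline - drift;
        # reset to zero whenever the item performs as expected.
        s = max(0.0, s + (baseline - drift - p))
        if s > threshold:
            return t
    return None

# A stable item, then an abrupt drop (e.g., item exposure) at index 5.
history = [0.71, 0.70, 0.72, 0.69, 0.71, 0.58, 0.57, 0.59]
print(cusum_drop(history, baseline=0.70))  # → 5
```

In practice a monitoring statistic would be built on an estimated item parameter rather than a raw proportion, but the reset-and-accumulate logic is the core of sequential change detection.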
Yi Zou; Ying Zheng; Jingwen Wang – International Journal of Language Testing, 2025
The Pearson Test of English Academic (PTE-A), a widely used high-stakes language proficiency test for university admissions and migration purposes, underwent a notable change from a three-hour to a two-hour version in November 2021. The implementation of the new version has prompted inquiries into the washback effects on various stakeholders.…
Descriptors: Testing Problems, Test Preparation, High Stakes Tests, English (Second Language)
Brocato, Nicole; Hix, Laura; Jayawickreme, Eranda – Journal of Moral Education, 2020
University settings present a unique opportunity for young adults to develop characteristics constitutive of wisdom. One challenge for educators working to support this development involves effectively measuring these characteristics. In this article, we present results from a secondary analysis of cognitive interviews to examine challenges that…
Descriptors: Undergraduate Students, Young Adults, Personality, Individual Characteristics
Rivas, Axel; Scasso, Martín Guillermo – Journal of Education Policy, 2021
Since 2000, the PISA test implemented by the OECD has become the prime benchmark for international comparisons in education. The 2015 PISA edition introduced methodological changes that altered the nature of its results. PISA no longer treated non-reached items in the final part of the test as valid responses, assuming that those unanswered questions were more a…
Descriptors: Test Validity, Computer Assisted Testing, Foreign Countries, Achievement Tests
Bao, Lei; Xiao, Yang; Koenig, Kathleen; Han, Jing – Physical Review Physics Education Research, 2018
In science, technology, engineering, and mathematics education there has been increased emphasis on teaching goals that include not only the learning of content knowledge but also the development of scientific reasoning skills. The Lawson classroom test of scientific reasoning (LCTSR) is a popular assessment instrument for scientific reasoning.…
Descriptors: Science Tests, Science Process Skills, Logical Thinking, Test Validity
Autman, Hamlet; Kelly, Stephanie – Business and Professional Communication Quarterly, 2017
This article contains two measurement development studies on writing apprehension. Study 1 reexamines the validity of the writing apprehension measure based on the finding from prior research that a second false factor was embedded. The findings from Study 1 support the validity of a reduced measure with 6 items versus the original 20-item…
Descriptors: Writing Apprehension, Writing Tests, Test Validity, Test Reliability
Longford, Nicholas T. – Journal of Educational and Behavioral Statistics, 2014
A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…
Descriptors: Test Items, Test Bias, Simulation, Hypothesis Testing
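The decision-theoretic idea in this abstract — weighing the two kinds of misclassification against an explicitly declared acceptable DIF level — can be sketched with a minimal expected-loss rule. This is a generic illustration of loss-based classification, not Longford's procedure; the probabilities and loss values are hypothetical inputs an analyst would supply.

```python
# Illustrative sketch (not Longford's method): classify an item by
# comparing the expected loss of keeping it (missing real DIF) against
# the expected loss of flagging it (a false alarm). The loss values
# encode the analyst's declared consequences of each error.

def classify_item(p_dif, loss_miss=4.0, loss_false_flag=1.0):
    """p_dif: probability the item's DIF exceeds the acceptable level.
    Returns "flag" when the expected loss of keeping the item exceeds
    the expected loss of flagging it, else "keep"."""
    expected_loss_keep = p_dif * loss_miss
    expected_loss_flag = (1 - p_dif) * loss_false_flag
    return "flag" if expected_loss_keep > expected_loss_flag else "keep"

print(classify_item(0.3))  # → flag (0.3 * 4.0 = 1.2 > 0.7 * 1.0)
print(classify_item(0.1))  # → keep (0.1 * 4.0 = 0.4 < 0.9 * 1.0)
```

Varying the loss ratio shows how the flagging threshold moves with the declared consequences, which is the point the abstract makes about using sets of levels and functions rather than a single pair.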
Henning, Grant – English Teaching Forum, 2012
To some extent, good testing procedure, like good language use, can be achieved through avoidance of errors. Almost any language-instruction program requires the preparation and administration of tests, and it is only to the extent that certain common testing mistakes have been avoided that such tests can be said to be worthwhile selection,…
Descriptors: Testing, English (Second Language), Testing Problems, Student Evaluation
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format
Diamond, Esther E. – 1981
As test standards and research literature in general indicate, definitions of test bias and item bias vary considerably, as do the results of existing methods of identifying biased items. The situation is further complicated by issues of content, context, construct, and criterion. In achievement tests, for example, content validity may impose…
Descriptors: Achievement Tests, Aptitude Tests, Psychometrics, Test Bias

Hanna, Gerald S.; Johnson, Fred R. – Journal of Educational Research, 1978
After analyzing four methods of selecting distractor items for multiple-choice tests, the authors recommend that classroom teachers use their own judgment in choosing test items. (Ed.)
Descriptors: Multiple Choice Tests, Teacher Responsibility, Test Construction, Test Items
Fox, Robert A. – 1979
A well developed multiple choice test is a reliable instrument for grading students and evaluating teacher presentation. There are three steps in the development of a valid multiple choice examination: 1) design or "blueprinting," 2) item construction, and 3) item analysis and evaluation. "Blueprinting" is the identification of the types of…
Descriptors: Elementary Secondary Education, Health Education, Multiple Choice Tests, Test Construction
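Step 3 of the process this abstract outlines, item analysis, typically means computing each item's difficulty (proportion correct) and discrimination (how well the item separates high and low scorers). A minimal sketch, assuming a 0/1 score matrix and using a point-biserial correlation against the rest-of-test score; this is a standard textbook calculation, not code from the paper.

```python
# Minimal item-analysis sketch (illustrative): difficulty as proportion
# correct, discrimination as the point-biserial correlation between the
# item score and the rest-of-test score (total minus the item itself).

def item_stats(responses, item):
    """responses: one list of 0/1 item scores per examinee."""
    n = len(responses)
    x = [r[item] for r in responses]            # scores on this item
    rest = [sum(r) - r[item] for r in responses]  # rest-of-test totals
    difficulty = sum(x) / n
    mx, mr = sum(x) / n, sum(rest) / n
    cov = sum((a - mx) * (b - mr) for a, b in zip(x, rest)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vr = sum((b - mr) ** 2 for b in rest) / n
    disc = cov / (vx * vr) ** 0.5 if vx and vr else 0.0
    return difficulty, disc

# Four examinees, three items; analyze item 0.
scores = [[1, 1, 1], [1, 0, 1], [0, 1, 0], [0, 0, 0]]
diff, disc = item_stats(scores, item=0)
print(round(diff, 2), round(disc, 2))  # → 0.5 0.71
```

A difficulty near 0.5 with a clearly positive discrimination is the usual sign of a well-functioning item; items with near-zero or negative discrimination are the ones step 3 is meant to catch.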
Fishman, Judith – Writing Program Administration, 1984
Examines the CUNY-WAT program and questions many aspects of it, especially the choice and phrasing of topics. (FL)
Descriptors: Essay Tests, Higher Education, Test Format, Test Items

Ebel, Robert L. – Journal of Educational Measurement, 1982
Reasonable and practical solutions to two major problems confronting the developer of any test of educational achievement (what to measure and how to measure it) are proposed, defended, and defined. (Author/PN)
Descriptors: Measurement Techniques, Objective Tests, Test Construction, Test Items

Brambring, M.; Troster, H. – Journal of Visual Impairment and Blindness, 1994
This study evaluated the Bielefeld Developmental Test for Blind Infants and Preschoolers by comparing cognitive performance of blind and sighted children (ages three and four). Results indicated that even this test (with "blind-neutral" items) did not permit a fair comparative assessment, though it did prove suitable for within-group…
Descriptors: Blindness, Cognitive Development, Cognitive Tests, Infants