Peer reviewed: Benson, Jeri – Educational and Psychological Measurement, 1981
A review of the research on item writing, item format, test instructions, and item readability indicated the importance of instrument structure in the interpretation of test data. The effect of failing to consider these areas on the content validity of achievement test scores is discussed. (Author/GK)
Descriptors: Achievement Tests, Elementary Secondary Education, Literature Reviews, Scores
Peer reviewed: Green, Kathy – Journal of Experimental Education, 1979
Reliabilities and concurrent validities of teacher-made multiple-choice and true-false tests were compared. No significant differences were found even when multiple-choice reliability was adjusted to equate testing time. (Author/MH)
Descriptors: Comparative Testing, Higher Education, Multiple Choice Tests, Test Format
Peer reviewed: Hansen, Jo-Ida C.; Neuman, Jody L.; Haverkamp, Beth E.; Lubinski, Barbara R. – Measurement and Evaluation in Counseling and Development, 1997
Examined user reaction to computer-administered and paper-and-pencil-administered forms of the Strong Interest Inventory. Results indicate that user reactions to the two administration modes were reasonably similar in most areas. However, the computer group indicated more often that their version was easier to use and follow. (RJM)
Descriptors: College Students, Computer Assisted Testing, Higher Education, Interest Inventories
Sternberg, Robert J. – Learning, 1989
Standardized tests which measure a narrow span of intelligence unfairly penalize students whose strengths don't fall within that range. Three kinds of intelligence (analytical, creative, practical) are discussed. Sternberg's Triarchic Abilities Test, currently being test-piloted, assesses all three aspects of intelligence in contrast to current…
Descriptors: Accountability, Cognitive Processes, Creativity, Elementary Secondary Education
Peer reviewed: Downing, Steven M.; And Others – Applied Measurement in Education, 1995
The criterion-related validity evidence and other psychometric characteristics of multiple-choice and multiple true-false (MTF) items in medical specialty certification examinations were compared using results from 21,346 candidates. Advantages of MTF items and implications for test construction are discussed. (SLD)
Descriptors: Cognitive Ability, Licensing Examinations (Professions), Medical Education, Objective Tests
Peer reviewed: Federico, Pat-Anthony – Behavior Research Methods, Instruments, and Computers, 1991
Using a within-subjects design, computer-based and paper-based tests of aircraft silhouette recognition were administered to 83 male naval pilots and flight officers to determine the relative reliabilities and validities of 2 measurement modes. Relative reliabilities and validities of the two modes were contingent on the multivariate measurement…
Descriptors: Aircraft Pilots, Comparative Testing, Computer Assisted Testing, Males
Peer reviewed: Straus, Murray A.; Hamby, Sherry L.; Finkelhor, David; Moore, David W.; Runyan, Desmond – Child Abuse & Neglect: The International Journal, 1998
A study of 1,000 children examined the effectiveness of the Parent-Child Conflict Tactics Scales (CTSPC) in measuring parental psychological and physical maltreatment of children, as well as nonviolent modes of discipline. The CTSPC was found to be better suited to measuring child maltreatment than the original Conflict Tactics Scales. (Author/CR)
Descriptors: Child Abuse, Child Neglect, Discipline, Evaluation Methods
Osterlind, Steven J.; Miao, Danmin; Sheng, Yanyan; Chia, Rosina C. – International Journal of Testing, 2004
This study investigated the interaction between different cultural groups and item type, and the ensuing effect on construct validity for a psychological inventory, the Myers-Briggs Type Indicator (MBTI, Form G). The authors analyzed 94 items from 2 Chinese-translated versions of the MBTI (Form G) for factorial differences among groups of…
Descriptors: Test Format, Undergraduate Students, Cultural Differences, Test Validity
Wainer, Howard; And Others – 1991
A series of computer simulations was run to measure the relationship between testlet validity and the factors of item pool size and testlet length for both adaptive and linearly constructed testlets. Results confirmed the generality of earlier empirical findings of H. Wainer and others (1991) that making a testlet adaptive yields only marginal…
Descriptors: Adaptive Testing, Computer Assisted Testing, Computer Simulation, Item Banks
Goldstein, Irwin; And Others – 1979
The purpose of this test is to evaluate a non-native speaking student's speaking knowledge of the basic structures of English, using the most frequently used words in the English language. The test does not attempt to determine vocabulary level or the student's ability to learn vocabulary effectively; rather, the test focuses exclusively on aural/oral…
Descriptors: English (Second Language), Language Proficiency, Language Tests, Listening Comprehension
Federico, Pat-Anthony; Liggett, Nina L. – 1989
Seventy-five subjects (Naval F-14 and E-2C crew members) were administered computer-based and paper-based tests of threat-parameter knowledge represented as a semantic network in order to determine the relative reliabilities and validities of these two assessment modes. Estimates of internal consistencies, equivalences, and discriminant validities…
Descriptors: Comparative Analysis, Computer Assisted Testing, Knowledge Level, Military Personnel
Brittain, Mary M.; Brittain, Clay V. – 1981
A behavioral domain is well-defined when it is clear to both test developers and test users which categories of performance should or should not be considered for potential test items. Only those tests that are keyed to well-defined domains meet the definition of criterion-referenced tests. The greatest proliferation of criterion-referenced tests…
Descriptors: Criterion Referenced Tests, Reading Achievement, Reading Tests, Test Construction
Samejima, Fumiko – 1980
Research related to the multiple choice test item is reported, as it is conducted by educational technologists in Japan. Sato's number of hypothetical equivalent alternatives is introduced. The basic idea behind this index is that the expected uncertainty of the m events, or alternatives, be large and the number of hypothetical, equivalent…
Descriptors: Foreign Countries, Latent Trait Theory, Mathematical Models, Multiple Choice Tests
Peer reviewed: Drain, Susan; Manos, Kenna – English Quarterly, 1986
Reviews a writing abilities competency test based on samples of essay writing. A copy of the test is appended. (NKA)
Descriptors: Essays, Higher Education, Language Tests, Test Construction
Peer reviewed: Jaeger, Richard M.; Wolf, Marian B. – Journal of Educational Measurement, 1982
The effectiveness of a Likert scale and three paired-choice presentation formats in discriminating among parents' preferences for curriculum elements was compared. Paired-choice formats gave more reliable discriminations, which increased with stimulus specificity. Similarities and differences in preference orderings are discussed. (Author/CM)
Descriptors: Comparative Analysis, Elementary Education, Parent Attitudes, Parent School Relationship

