Publication Date
| Date range | Records |
| In 2026 | 0 |
| Since 2025 | 5 |
| Since 2022 (last 5 years) | 45 |
| Since 2017 (last 10 years) | 91 |
| Since 2007 (last 20 years) | 144 |
Descriptor
| Descriptor | Records |
| Test Format | 418 |
| Test Reliability | 418 |
| Test Validity | 243 |
| Test Construction | 135 |
| Test Items | 119 |
| Higher Education | 88 |
| Multiple Choice Tests | 68 |
| Foreign Countries | 67 |
| Testing | 65 |
| Test Interpretation | 61 |
| Comparative Analysis | 57 |
Audience
| Audience | Records |
| Practitioners | 33 |
| Teachers | 23 |
| Administrators | 18 |
| Researchers | 12 |
| Community | 1 |
| Counselors | 1 |
| Policymakers | 1 |
| Students | 1 |
| Support Staff | 1 |
Location
| Location | Records |
| New York | 9 |
| Turkey | 8 |
| California | 7 |
| Canada | 6 |
| Japan | 6 |
| Germany | 4 |
| United Kingdom | 4 |
| Georgia | 3 |
| Israel | 3 |
| France | 2 |
| Indonesia | 2 |
Laws, Policies, & Programs
| Law, Policy, or Program | Records |
| Individuals with Disabilities… | 1 |
| Job Training Partnership Act… | 1 |
| No Child Left Behind Act 2001 | 1 |
| Pell Grant Program | 1 |
Peer reviewed: Aiken, Lewis R. – Educational and Psychological Measurement, 1983
Each of six forms of a 10-item teacher evaluation rating scale, having two to seven response categories per form, was administered to over 100 college students. Means of item responses and item variances increased with the number of response categories. Internal consistency of total scores did not change systematically. (Author/PN)
Descriptors: College Students, Higher Education, Item Analysis, Rating Scales
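The internal-consistency index this abstract refers to is conventionally coefficient (Cronbach's) alpha; the formula below is the standard textbook definition, quoted here for context rather than taken from Aiken's paper:

```latex
\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right)
```

Here k is the number of items, \sigma_i^{2} the variance of item i, and \sigma_X^{2} the variance of the total score. Because alpha depends on the ratio of summed item variances to total-score variance rather than on their absolute size, larger item variances do not by themselves move it, which is consistent with the reported finding that internal consistency did not change systematically.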
Peer reviewed: Hsu, Louis M. – Applied Psychological Measurement, 1979
A comparison of the relative ordering power of separate and grouped-item true-false tests indicated that neither type of test was uniformly superior to the other across all levels of knowledge of examinees. Grouped-item tests were found superior for examinees with low levels of knowledge. (Author/CTM)
Descriptors: Academic Ability, Knowledge Level, Multiple Choice Tests, Scores
Peer reviewed: Lubin, Bernard; Van Whitlock, Rod – Assessment, 1996
The reliability and validity of the positive and negative mood scales of the trait version of the State-Trait Depression Adjective Check Lists (D. Watson and others, 1988) were supported with 269 college students, 197 adolescents, and 165 older adults. Results provide evidence of the equivalence of the positive and negative scales. (SLD)
Descriptors: Adolescents, College Students, Depression (Psychology), High Schools
Peer reviewed: Woodruff, David J.; Sawyer, Richard L. – Applied Psychological Measurement, 1989
Two methods--non-distributional and normal--are derived for estimating measures of pass-fail reliability. Both are based on the Spearman-Brown formula and require only a single test administration. Results from a simulation (n=20,000 examinees) and a licensure examination (n=4,828 examinees) illustrate these methods. (SLD)
Descriptors: Equations (Mathematics), Estimation (Mathematics), Licensing Examinations (Professions), Measures (Individuals)
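The Spearman-Brown formula the abstract mentions is the standard prophecy formula; in its textbook form (given here for context, not as the authors' derivation):

```latex
\rho^{*} \;=\; \frac{k\,\rho_{xx'}}{1 + (k-1)\,\rho_{xx'}}
```

where \rho_{xx'} is the reliability of the original test and k the factor by which test length changes. With k = 2 it converts a half-test (split-half) correlation into a full-length reliability estimate, which is why estimates of this kind can be obtained from a single test administration.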
Peer reviewed: Johnson, Nancy E.; And Others – Assessment, 1994
Development of an alternate form of Raven's Standard Progressive Matrices Test is described. Reliability analysis with 449 children of differing racial/ethnic backgrounds showed good reliability and comparable predictive validity. The alternate form is a promising research tool. (SLD)
Descriptors: Children, Ethnic Groups, Intelligence Tests, Matrices
Peer reviewed: Hart, David K. – Slavic and East European Journal, 1994
Reports on a series of tests made to determine whether a correlation exists between modes of testing and the ability of Russian language students to place stress correctly. Contrary to the hypothesis, it was found that a significant variation did occur in the results obtained both for test type and test modality. (16 references) (MDM)
Descriptors: College Students, Higher Education, Language Skills, Russian
Peer reviewed: Kobak, Kenneth A.; And Others – Psychological Assessment, 1993
A computer-administered form of the Hamilton Anxiety Scale was developed and, together with the clinician-administered form of the instrument, given to 214 psychiatric outpatients and 78 community adults. Results support the reliability and validity of the computer-administered version as an alternative to the clinician-administered version. (SLD)
Descriptors: Adults, Anxiety, Clinical Diagnosis, Comparative Testing
Pike, Gary R. – 1994
This paper examines the proposed use of student self-report data as proxies for College Basic Academic Subjects Examination (College BASE) scores and as policy indicators of good educational practice. A recent study by the National Center for Higher Education Management Systems had recommended this use of student self-reports. For this study 540…
Descriptors: Achievement Tests, College Outcomes Assessment, College Seniors, Comparative Analysis
Martin, Randy – 1988
Reasons for administering tests fall into two categories--decision-making and promoting learning. The two bases of tests are learning objectives and the level of learning at which training is developed. Test development involves a number of steps. The best way to tie objectives to test items is through the use of a table of specifications, which…
Descriptors: Elementary Secondary Education, Item Analysis, Item Banks, Postsecondary Education
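As a rough illustration of the table-of-specifications idea mentioned above (the objectives, levels of learning, and item counts below are hypothetical, not Martin's), a specification can be kept as a simple objective-by-level tally:

```python
# Hypothetical table of specifications: planned item counts indexed by
# (learning objective, level of learning). All entries are illustrative.
from collections import defaultdict

spec = {
    ("Define key terms",         "knowledge"):     6,
    ("Explain the procedure",    "comprehension"): 8,
    ("Apply it to a new case",   "application"):  10,
    ("Critique a sample result", "analysis"):      6,
}

by_objective = defaultdict(int)
by_level = defaultdict(int)
for (objective, level), n_items in spec.items():
    by_objective[objective] += n_items
    by_level[level] += n_items

print("Total items:", sum(spec.values()))   # 30 in this example
for objective, n in by_objective.items():
    print(f"  {objective:<26} {n:>3}")
for level, n in by_level.items():
    print(f"  {level:<26} {n:>3}")
```

Laying the blueprint out this way makes it easy to check that every objective is sampled and that item counts reflect the emphasis given in instruction before any items are written.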
Brown, William R. – 1988
The evaluation tools written by teachers are rarely valid or reliable. One teaching aid that can help in the creation of an effective evaluation instrument is called a test map. A test map is a systematic method for considering the variables that are important in constructing the format of a test. Five variables that are discussed in the test…
Descriptors: Elementary Secondary Education, Evaluation Methods, Higher Education, Student Evaluation
Sax, Gilbert; Reiter, Pauline B. – 1980
Despite the popularity of both multiple-choice (MC) and true-false (TF) items, most investigations comparing the two formats have done so to determine the optimum number of choices to be given to students within a given time period. The purpose of this investigation was to compare the reliabilities and the validities of both formats when the items…
Descriptors: Analysis of Variance, Correlation, Higher Education, Item Analysis
Case, Susan M.; And Others – 1988
An item format incorporating pattern recognition was designed to assess medical students' abilities in the area of clinical diagnosis. A group of approximately 20 faculty members of five New England medical schools met in Worcester for half of a day to develop pattern recognition items. Teams of four to six physicians were assigned to work on…
Descriptors: Clinical Diagnosis, Higher Education, Item Analysis, Medical Evaluation
Hensley, Wayne E. – 1982
Because item order and salience may affect the findings of social science research, a study was conducted to determine the effect of topic salience on subject response to different item orders. In the study, 10 high salience self-esteem items were presented with 10 low salience items concerning product labels in three versions. In the first…
Descriptors: College Students, Communication Research, Higher Education, Item Analysis
Haladyna, Tom; Roid, Gale – 1981
Two approaches to criterion-referenced test construction are compared. Classical test theory is based on the practice of random sampling from a well-defined domain of test items; latent trait theory suggests that the difficulty of the items should be matched to the achievement level of the student. In addition to these two methods of test…
Descriptors: Criterion Referenced Tests, Error of Measurement, Latent Trait Theory, Test Construction
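The latent trait approach mentioned in this abstract is normally formalized with an item response model; in the one-parameter (Rasch) case the probability of a correct response is (standard form, not specific to this paper):

```latex
P(X_{ij} = 1 \mid \theta_j, b_i) \;=\; \frac{e^{\,\theta_j - b_i}}{1 + e^{\,\theta_j - b_i}}
```

where \theta_j is the examinee's achievement level and b_i the item's difficulty. An item is most informative when b_i is close to \theta_j, which is the rationale for matching item difficulty to the achievement level of the student rather than sampling items at random from the domain.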
Peer reviewed: Frary, Robert B. – Journal of Educational Measurement, 1985
Responses to a sample test were simulated for examinees under free-response and multiple-choice formats. Test score sets were correlated with randomly generated sets of unit-normal measures. The extent of superiority of free response tests was sufficiently small so that other considerations might justifiably dictate format choice. (Author/DWH)
Descriptors: Comparative Analysis, Computer Simulation, Essay Tests, Guessing (Tests)
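A minimal sketch of the kind of simulation described above, with illustrative parameter choices (40 items, four-option guessing, a fixed criterion loading) that are assumptions rather than Frary's actual design:

```python
# Simulate free-response and multiple-choice scores for the same examinees
# and compare their correlations with a unit-normal criterion measure.
# Test length, guessing rate, and criterion loading are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_examinees, n_items, n_options = 2000, 40, 4

theta = rng.standard_normal(n_examinees)                        # latent proficiency
difficulty = rng.standard_normal(n_items)
p_know = 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulty)))   # P(examinee knows item)

knows = rng.random((n_examinees, n_items)) < p_know
free_response = knows.sum(axis=1)                               # credit only when known

guesses = rng.random((n_examinees, n_items)) < 1.0 / n_options
multiple_choice = (knows | guesses).sum(axis=1)                 # lucky guesses add credit

# Unit-normal criterion loading 0.6 on proficiency (0.6**2 + 0.8**2 = 1).
criterion = 0.6 * theta + 0.8 * rng.standard_normal(n_examinees)

print("free-response vs criterion:  ", round(float(np.corrcoef(free_response, criterion)[0, 1]), 3))
print("multiple-choice vs criterion:", round(float(np.corrcoef(multiple_choice, criterion)[0, 1]), 3))
```

Comparing the two correlations across replications gives the kind of format comparison the abstract summarizes.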


