Publication Date
| In 2026 | 0 |
| Since 2025 | 621 |
| Since 2022 (last 5 years) | 3121 |
| Since 2017 (last 10 years) | 7362 |
| Since 2007 (last 20 years) | 15000 |
Descriptor
| Test Reliability | 15006 |
| Test Validity | 10245 |
| Reliability | 9748 |
| Foreign Countries | 7119 |
| Test Construction | 4807 |
| Validity | 4189 |
| Measures (Individuals) | 3872 |
| Factor Analysis | 3820 |
| Psychometrics | 3513 |
| Interrater Reliability | 3117 |
| Correlation | 3037 |
Audience
| Researchers | 709 |
| Practitioners | 451 |
| Teachers | 208 |
| Administrators | 122 |
| Policymakers | 66 |
| Counselors | 42 |
| Students | 38 |
| Parents | 11 |
| Community | 7 |
| Support Staff | 6 |
| Media Staff | 5 |
Location
| Turkey | 1319 |
| Australia | 436 |
| Canada | 379 |
| China | 368 |
| United States | 271 |
| United Kingdom | 256 |
| Indonesia | 249 |
| Taiwan | 234 |
| Netherlands | 223 |
| Spain | 216 |
| California | 214 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 8 |
| Meets WWC Standards with or without Reservations | 9 |
| Does not meet standards | 6 |
Peer reviewed: Calhoun, Lawrence; And Others – Science Education, 1988
Presents a refined method for designing a valid and reliable Likert-type scale to test attitudes toward the generation of electricity from nuclear energy. Discusses various tests of validity that were used on the nuclear energy scale. Reports results of administration and concludes that the test is both reliable and valid. (CW)
Descriptors: Affective Measures, College Science, Energy Education, Evaluation Criteria
Woodring, Paul – Chronicle of Higher Education, 1987
News media reports about Scholastic Aptitude Test (SAT) scores and trends have drawn or suggested conclusions about the quality of schools, teachers, and colleges that are neither supported by evidence from the test itself nor consistent with its nature and purpose. (MSE)
Descriptors: College Entrance Examinations, Educational Quality, Higher Education, Information Utilization
Peer reviewed: Richards, P. Scott; And Others – Computers in the Schools, 1986
Describes the development and preliminary validation of the Computer Attitudes Scale (CAS), which is designed to provide researchers and educators with a way of assessing some basic student attitudes about computer usage. (MBR)
Descriptors: Analysis of Variance, Attitude Measures, Elementary Secondary Education, Evaluation Methods
Peer reviewed: Kinicki, Angelo J.; And Others – Educational and Psychological Measurement, 1985
Using both the Behaviorally Anchored Rating Scales (BARS) and the Purdue University Scales, 727 undergraduates rated 32 instructors. The BARS had less halo effect, more leniency error, and lower interrater reliability. Both formats were valid. The two tests did not differ in ratee discrimination or susceptibility to rating bias. (Author/GDC)
Descriptors: Behavior Rating Scales, College Faculty, Comparative Testing, Higher Education
Peer reviewed: Rubin, Rebecca B. – Western Journal of Speech Communication, 1986
Argues, in defense of Rubin's CCAI, that Powell and Avila's study did not use adequate controls to assess the instrument's reliability. Asserts that their study simply discovered differences in communication competence among ethnic groups. (MS)
Descriptors: Academic Achievement, Blacks, Communication Research, Comparative Analysis
Peer reviewed: Firnberg, James W.; Christal, Melodie E. – College and University, 1984
A survey of institutions about their reporting of data on the Higher Education General Information Survey, specifically examining differences in definitions and calculations among reporting institutions, confirmed suspected problems of comparability of information. The impact of these differences on data interpretation is discussed.…
Descriptors: Computation, Data Analysis, Definitions, Federal Government
Peer reviewed: Cason, Gerald J.; Cason, Carolyn L. – Evaluation and the Health Professions, 1984
The proposed theory provides a basis for both measuring and correcting rater stringency error in some grossly incomplete rating data matrices. The theoretical model fits ratings made by faculty and resident physicians of student clinical performance in each of three junior year medical student cohorts better than alternative models. (Author)
Descriptors: Clinical Teaching (Health Professions), Evaluation Criteria, Evaluation Methods, Higher Education
Peer reviewed: Greenberg, Mark T.; And Others – Journal of Youth and Adolescence, 1983
The nature and quality of adolescents' (n=213) attachments to peers and parents were assessed. The relative influence on measures of self-esteem and life satisfaction of relations with peers and with parents was investigated in a hierarchical regression model. (Author/PN)
Descriptors: Adolescents, Affective Measures, Attachment Behavior, Interpersonal Relationship
Peer reviewed: Stansfield, Charles – System, 1984
Describes the development of the Secondary Level English Proficiency (SLEP) Test specifications and the performance of each item type during administration of the test in other countries. Innovative formats such as multiple-choice cloze and multiple-choice dictation are discussed and described. In addition, the findings of a validity study…
Descriptors: Cloze Procedure, Comparative Analysis, Dictation, English (Second Language)
Hughes, Georgia K.; Copley, Lisa D.; Howley, Caitlin W.; Meehan, Merrill L. – Appalachia Educational Laboratory at Edvantia (NJ1), 2005
Building capacity within schools and districts for continuous improvement is a goal of educators at all levels across the United States of America. An important first step in capacity building is identifying schools' current strengths and weaknesses. Schools can then begin building upon existing strengths to implement improvement initiatives.…
Descriptors: Evaluation Methods, School Effectiveness, Elementary Secondary Education, Educational Improvement
Buck, Beverly; O'Brien, Tracey – Education Commission of the States (NJ3), 2005
This document is a summary of the findings of an extensive review by the Education Commission of the States (ECS) of empirical research on the effectiveness of current approaches to licensing and certifying teachers. The research review focused on eight questions (and several subquestions) that are of particular interest and concern to policy and…
Descriptors: Teacher Certification, Teacher Effectiveness, Teaching Methods, Verbal Ability
Yen, Shu Jing; Ochieng, Charles; Michaels, Hillary; Friedman, Greg – Online Submission, 2005
Year-to-year rater variation may result in constructed response (CR) parameter changes, making CR items inappropriate to use in anchor sets for linking or equating. This study demonstrates how rater severity affected the writing and reading scores. Rater adjustments were made to statewide results using an item response theory (IRT) methodology…
Descriptors: Test Items, Writing Tests, Reading Tests, Measures (Individuals)
Kenyon, Dorry; Van Duzer, Carol – Center for Adult English Language Acquisition, 2003
Ensuring that language tests for adult English language learners are appropriate, valid, and reliable is a challenge. Performance-based assessments are complex to develop and implement. Yet, because the focus of assessment, both in the National Reporting System for Adult Education (NRS) descriptors and in the Department of Education's definition…
Descriptors: Student Evaluation, Language Tests, Second Language Learning, English (Second Language)
Haertel, Edward H. – National Assessment Governing Board, 2003
The paper initially describes the sources of uncertainty in National Assessment of Educational Progress (NAEP) data and standard errors. As NAEP sample sizes have increased, greater precision has been attained by the program. For this reason, exclusion effects are increasingly important. Two scenarios of revised NAEP results are presented (for New…
Descriptors: Error of Measurement, Computation, Disabilities, Limited English Speaking
Nakamura, Yuji – Journal of Communication Studies, 1996
To find ways to improve rater reliability of a tape-mediated speaking test for Japanese university students of English as a Second Language, two studies gathered information on how raters actually made their choices on rating sheets of students' speaking ability, and determined what criteria teachers think they use and actually use in rating…
Descriptors: English (Second Language), Evaluation Criteria, Foreign Countries, Interrater Reliability

