Howley, Caitlin; Riffle, Joy – 2002
A pilot version of a School Capacity Assessment (SCA) was developed in 2002 to assess the degree to which schools possess the potential to become high-performing learning communities. The SCA was part of AEL's School Capacity Development project. The pilot version of the SCA was intended to be administered to K-12 professional staff to assist them…
Descriptors: Educational Change, Institutional Characteristics, Low Achievement, Pilot Projects
Lee, Yong-Won; Golub-Smith, Marna; Payton, Carmen; Carey, Jill – 2001
This study investigated the validity of the current reliability estimation procedure for the Test of Spoken English (TSE), a tape-mediated semi-performance test of 12 speaking tasks, from the perspective of generalizability theory and examined the feasibility of shortening the test without compromising the psychometric quality of the test. Data…
Descriptors: Adults, English (Second Language), Estimation (Mathematics), Generalizability Theory
Bastick, Tony – 2001
The research literature on student evaluation of teaching (SET) is filled with criticisms of the process, its applications, and the student feedback questionnaire it uses. SETs are still used, however, because there has seemed to be no economical, valid, and reliable alternative. This paper reports on an alternative alignment process for…
Descriptors: College Faculty, Criteria, Higher Education, Learning
Breyer, F. Jay; Lewis, Charles – 1994
A single-administration classification reliability index is described that estimates the probability of consistently classifying examinees to mastery or nonmastery states as if those examinees had been tested with two alternate forms. The procedure is applicable to any test used for classification purposes, subdividing that test into two…
Descriptors: Classification, Cutting Scores, Objective Tests, Pass Fail Grading
Ediger, Marlow – 2001
To assure the fair and honest grading of student achievement, validity and reliability are key to writing test items. Clarity in writing each item is essential. Multiple procedures of assessing the achievement of university students should be implemented, and instructors and professors should be held accountable for the fair and honest grading of…
Descriptors: Academic Achievement, College Students, Educational Technology, Grades (Scholastic)
Piburn, Michael; Sawada, Daiyo – 2000
The Reformed Teaching Observation Protocol (RTOP) was created by the Evaluation Group of the Arizona Collaborative for Excellence in the Preparation of Teachers (ACEPT) as an observational instrument designed to measure "reformed" teaching. This document is a guide to its use. The theoretical concepts that guided the development of the…
Descriptors: Classroom Observation Techniques, Educational Change, Elementary Secondary Education, Measures (Individuals)
Mertler, Craig A. – 1999
This study examined processes and techniques teachers used to ensure that their assessments were valid and reliable, noting the extent to which they engaged in these processes. A sample of 625 elementary and secondary teachers received mailed copies of the Ohio Teacher Assessment Practices Survey, which asked about steps that they followed and the…
Descriptors: Elementary Secondary Education, Evaluation Methods, Student Evaluation, Teacher Attitudes
Olsina, L.; Rossi, G. – 1999
This paper identifies World Wide Web site characteristics and attributes and groups them in a hierarchy. The primary goal is to classify the elements that might be part of a quantitative evaluation and comparison process. In order to effectively select quality characteristics, different users' needs and behaviors are considered. Following an…
Descriptors: Classification, Comparative Analysis, Efficiency, Evaluation Criteria
Mayton, Daniel M., II; Richel, Timothy W.; Susnjic, Silvia; Majdanac, Maja – 2002
The Teenage Nonviolence Test (TNT) has previously been established as a generally reliable and valid measure of nonviolence in adolescents. This study examined the extent to which the TNT's reliability and validity could be extended to college students aged 18-22 years. Five of the six subscales of the TNT were found to be reliable. The…
Descriptors: Affective Measures, College Students, Concurrent Validity, Higher Education
Meehan, Merrill L.; Cowley, Kimberly S.; Wiersma, William; Orletsky, Sandra R.; Sattes, Beth D.; Walsh, Jackie A. – 2002
As part of its school improvement effort, AEL, a regional education laboratory, developed the Continuous School Improvement Questionnaire (AEL CSIQ). Staff from the AEL Quest schools program drafted a 65-item questionnaire to help measure and assess the efforts of the project team in their work with the 18 schools in the Quest network. These items…
Descriptors: Educational Improvement, Elementary Secondary Education, Measurement Techniques, Reliability
Wainer, Howard – 1994
This study examined the Law School Admission Test (LSAT) through the use of testlet methods to model its inherent, locally dependent structure. Precision, measured by reliability, and fairness, measured by the comparability of performance across all identified subgroups of examinees, were the focus of the study. The polytomous item response theory…
Descriptors: College Entrance Examinations, Item Response Theory, Reading Comprehension, Reading Tests
Popp, Sharon E. Osborn; Ryan, Joseph M.; Thompson, Marilyn S.; Behrens, John T. – 2003
The purposes of this study were to investigate the role of benchmark writing samples in direct assessment of writing and to examine the consequences of differential benchmark selection with a common writing rubric. The influences of discourse and grade level were also examined within the context of differential benchmark selection. Raters scored…
Descriptors: Benchmarking, Elementary Education, Elementary School Students, Interrater Reliability
Bertrand, Richard; Boiteau, Nancy – 2003
This study sought criteria, such as within-method stability rates or between-method agreement rates, that could help in choosing a powerful and low-cost differential item functioning (DIF) detection method. The study tried to verify the within-method stability of item response theory (IRT) based over non-IRT-based procedures in two different…
Descriptors: Cross Cultural Studies, Cultural Differences, Foreign Countries, Item Bias
Lovell, Tobin; White, Kelly; Thatcher, Trent; Mayle, Amanda; Willis, Tonya; Rambaldo, Lisa; Cauley, Kate; Clasen, Carla; Meyer, Cheryl – 1999
To investigate whether the structured and applied nature of a service learning method results in changes in knowledge, skills, and attitudes regarding working with underserved populations, the Service Learning Instrument-Health Professional (SLI-HP) was designed. This instrument measures both student experiences and changes in attitudes, skills,…
Descriptors: Attitude Change, Health Personnel, Higher Education, Knowledge Level
Wang, Qi – 2000
Two studies focused on the reliability and validity of T.M. Singelis's 24-item Self-Construal Scale (SCS) (1994). In the first study, Cronbach's alphas were calculated to assess the internal consistency reliability of the two subscales that were supposed to measure individuals' independent and interdependent self-construals. The sample was…
Descriptors: Cultural Context, Cultural Influences, Factor Analysis, Higher Education
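Note: several records above report internal consistency via Cronbach's alpha. As a rough illustration only, the standard formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), can be sketched as follows. The function name and the sample responses are hypothetical and are not taken from any study listed here.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items table.

    scores: list of rows, one row per respondent, each row a list of
    numeric item scores of equal length (hypothetical input layout).
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("all rows must have the same number of items")

    # Sample variance (n - 1 denominator) of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Made-up responses from four people on a three-item subscale.
    demo = [[3, 4, 3], [2, 2, 3], [5, 4, 5], [1, 2, 1]]
    print(round(cronbach_alpha(demo), 3))
```

Values near 1 suggest the items vary together. Acceptable thresholds depend on the instrument and its purpose, so none is assumed in this sketch.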


