Publication Date
| In 2026 | 3 |
| Since 2025 | 675 |
| Since 2022 (last 5 years) | 3176 |
| Since 2017 (last 10 years) | 7417 |
| Since 2007 (last 20 years) | 15055 |
Descriptor
| Test Reliability | 15043 |
| Test Validity | 10279 |
| Reliability | 9761 |
| Foreign Countries | 7144 |
| Test Construction | 4825 |
| Validity | 4191 |
| Measures (Individuals) | 3877 |
| Factor Analysis | 3825 |
| Psychometrics | 3526 |
| Interrater Reliability | 3124 |
| Correlation | 3040 |
Audience
| Researchers | 709 |
| Practitioners | 451 |
| Teachers | 208 |
| Administrators | 122 |
| Policymakers | 66 |
| Counselors | 42 |
| Students | 38 |
| Parents | 11 |
| Community | 7 |
| Support Staff | 6 |
| Media Staff | 5 |
Location
| Turkey | 1328 |
| Australia | 436 |
| Canada | 379 |
| China | 368 |
| United States | 271 |
| United Kingdom | 256 |
| Indonesia | 253 |
| Taiwan | 234 |
| Netherlands | 223 |
| Spain | 217 |
| California | 215 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 8 |
| Meets WWC Standards with or without Reservations | 9 |
| Does not meet standards | 6 |
Curren, Randall – Journal of Philosophy of Education, 2006
This paper continues an exchange between its author and Andrew Davis. Part I addresses the attribution and ontological status of mental constructs and argues that philosophical work on these topics does not undermine high stakes testing. Part II examines the significance for testing of the connectedness of meaningful learning. Part III addresses…
Descriptors: Learning, Psychometrics, Relevance (Education), High Stakes Tests
Wilder, David A; Therrien, Kelly; Wine, Byron – Journal of Organizational Behavior Management, 2006
Two methods of assessing preference for stimuli (i.e., potential reinforcers) were compared for adult administrative assistant employees. During Phase 1, a survey method and a verbal choice method of assessing preference for 6 stimuli were administered. During Phase 2, a coupon system was used to determine which categories of stimuli actually…
Descriptors: Evaluation Methods, Stimuli, Reinforcement, Employees
Raykov, Tenko; Marcoulides, George A. – International Journal of Testing, 2006
A structural equation modeling approach to scale reliability evaluation can be employed to estimate generalizability theory indexes in settings where sampling of subjects and conditions is carried out. In one- and two-facet crossed designs, it is demonstrated how this method can be used to obtain estimates of relative generalizability…
Descriptors: Computation, Generalizability Theory, Structural Equation Models, Reliability
Feldt, Leonard S.; Charter, Richard A. – Educational and Psychological Measurement, 2006
Seven approaches to averaging reliability coefficients are presented. Each approach starts with a unique definition of the concept of "average," and no approach is more correct than the others. Six of the approaches are applicable to internal consistency coefficients. The seventh approach is specific to alternate-forms coefficients. Although the…
Descriptors: Reliability, Monte Carlo Methods, Research Methodology, Alternative Assessment
Singelis, Theodore M.; Yamada, Ann Marie; Barrio, Concepcion; Laney, Joshua Harrison; Her, Pa; Ruiz-Anaya, Alejandrina; Lennertz, Sara Terwilliger – Hispanic Journal of Behavioral Sciences, 2006
The metric equivalence of translated scales is often in question but seldom examined. This study presents test-retest data that support the metric equivalence of the Spanish and English language versions of three measures: the Bidimensional Acculturation Scale, the Satisfaction with Life Scale, and the Self-Construal Scale. Participants were…
Descriptors: Acculturation, Life Satisfaction, English, Test Format
White, Peter A. – Psychological Review, 2005
Comments on the response offered by Cheng and Novick to White's initial comments on Cheng's and Cheng and Novick's previous articles. White asks whether regularity information is necessary for causal learning. He agrees with Cheng and Novick that the causal relation is understood as a generative relation, but they disagree on how this understanding comes about.…
Descriptors: Differences, Review (Reexamination), Interrater Reliability, Error Correction
Robinson, Carrie H.; Betz, Nancy E. – Journal of Career Assessment, 2004
This study examined the test-retest reliability and the concurrent validity of the 17-scale Expanded Skills Confidence Inventory in samples of 321 and 175 college students. Retest values over a 3-week interval ranged from .77 to .89, with a median of .85. Using Brown and Gore's C-index, evidence for the concurrent validity of confidence score…
Descriptors: College Students, Test Validity, Vocational Interests, Test Reliability
Adams, Raymond J. – Studies in Educational Evaluation, 2005
Test reliability is a concept central to classical test theory and it is commonly stated as a requirement that a test attain a certain level of reliability before it be considered of sufficient quality for practical use. This article discusses the role of reliability in item response theory, and in particular the role of reliability in contexts…
Descriptors: Test Reliability, Error of Measurement, Item Sampling, Item Response Theory
Hintze, John M. – School Psychology Review, 2005
Direct observation plays an important role in the assessment practices of school psychologists and in the development of evidence-based practices in general and special education. The defining psychometric features of direct observation are presented, the contributions to assessment practice reviewed, and a specific proposal is offered for…
Descriptors: Observation, Psychologists, School Psychologists, Psychometrics
Garrison, D. R.; Cleveland-Innes, M.; Koole, Marguerite; Kappelman, James – Internet and Higher Education, 2006
Transcript analysis is an important methodology to study asynchronous online educational discourse. The purpose of this study is to revisit reliability and validity issues associated with transcript analysis. The goal is to provide researchers with guidance in coding transcripts. For validity reasons, it is suggested that the first step is to…
Descriptors: Computer Mediated Communication, Validity, Researchers, Discourse Analysis
Bush, Martin E. – Quality Assurance in Education: An International Perspective, 2006
Purpose: To provide educationalists with an understanding of the key quality issues relating to multiple-choice tests, and a set of guidelines for the quality assurance of such tests. Design/methodology/approach: The discussion of quality issues is structured to reflect the order in which those issues naturally arise. It covers the design of…
Descriptors: Multiple Choice Tests, Test Reliability, Educational Quality, Quality Control
Nugent, William R. – Research on Social Work Practice, 2004
A study was conducted to assess the equivalence of validity and reliability of two forms of the Self-Esteem Rating Scale. A total of 228 responses were obtained from a purposive sample. Several data analysis methods were used to test specific hypotheses, and two methods of equating observed scores on the two forms were used. The results were…
Descriptors: Validity, Rating Scales, Equated Scores, Data Analysis
Kaufman, Alan S.; Flanagan, Dawn P.; Alfonso, Vincent C.; Mascolo, Jennifer T. – Journal of Psychoeducational Assessment, 2006
Within the field of psychological assessment, the Wechsler scales continue to be the most widely used intelligence batteries. The concepts, methods, and procedures inherent in the design of the Wechsler scales have been so influential that they have guided most of the test development and research in the field for more than a half century. This…
Descriptors: Intelligence Tests, Test Reviews, Testing, Scoring
Hadfield, Jill – RELC Journal: A Journal of Language Teaching and Research, 2006
This article offers an overview of learning styles theories, selects seven to consider in more detail on the basis of recent research into validity and reliability, and synthesises these theories into a framework to aid task design in Teacher Education.
Descriptors: Cognitive Style, Teaching Methods, Teacher Education, Measures (Individuals)
Harrison, Andrew J.; Jensen, Randall L.; Donoghue, Orna – Measurement in Physical Education and Exercise Science, 2005
The reliability of a laser system was compared with the reliability of a video-based kinematic analysis in measuring displacement and velocity during running. The validity and reliability of the laser on static measures were also assessed at distances between 10 m and 70 m by evaluating the coefficient of variation and intraclass correlation…
Descriptors: Physical Activities, Motion, Lasers, Video Technology

