Publication Date
| Period | Records |
|---|---|
| In 2026 | 3 |
| Since 2025 | 675 |
| Since 2022 (last 5 years) | 3176 |
| Since 2017 (last 10 years) | 7417 |
| Since 2007 (last 20 years) | 15055 |
Descriptor
| Descriptor | Records |
|---|---|
| Test Reliability | 15043 |
| Test Validity | 10279 |
| Reliability | 9761 |
| Foreign Countries | 7144 |
| Test Construction | 4825 |
| Validity | 4191 |
| Measures (Individuals) | 3877 |
| Factor Analysis | 3825 |
| Psychometrics | 3526 |
| Interrater Reliability | 3124 |
| Correlation | 3040 |
Audience
| Audience | Records |
|---|---|
| Researchers | 709 |
| Practitioners | 451 |
| Teachers | 208 |
| Administrators | 122 |
| Policymakers | 66 |
| Counselors | 42 |
| Students | 38 |
| Parents | 11 |
| Community | 7 |
| Support Staff | 6 |
| Media Staff | 5 |
Location
| Location | Records |
|---|---|
| Turkey | 1328 |
| Australia | 436 |
| Canada | 379 |
| China | 368 |
| United States | 271 |
| United Kingdom | 256 |
| Indonesia | 253 |
| Taiwan | 234 |
| Netherlands | 223 |
| Spain | 217 |
| California | 215 |
What Works Clearinghouse Rating
| Rating | Records |
|---|---|
| Meets WWC Standards without Reservations | 8 |
| Meets WWC Standards with or without Reservations | 9 |
| Does not meet standards | 6 |
Bartels, Meike; Boomsma, Dorret I.; Hudziak, James J.; van Beijsterveldt, Toos C. E. M.; van den Oord, Edwin J. C. G. – Psychological Methods, 2007
Genetically informative data can be used to address fundamental questions concerning the measurement of behavior in children. The authors illustrate this with longitudinal multiple-rater data on internalizing problems in twins. Valid information on the behavior of a child is obtained for behavior that multiple raters agree upon and for…
Descriptors: Twins, Behavior Problems, Genetics, Error of Measurement
Hmelo-Silver, Cindy E.; Marathe, Surabhi; Liu, Lei – Journal of the Learning Sciences, 2007
Understanding complex systems is fundamental to understanding science. The complexity of such systems makes them very difficult to understand because they are composed of multiple interrelated levels that interact in dynamic ways. The goal of this study was to understand how experts and novices differed in their understanding of two complex…
Descriptors: Ecology, Anatomy, Physiology, Knowledge Representation
Olver, Mark E.; Wong, Stephen C. P.; Nicholaichuk, Terry; Gordon, Audrey – Psychological Assessment, 2007
The Violence Risk Scale-Sexual Offender version (VRS-SO) is a rating scale designed to assess risk and predict sexual recidivism, to measure and link treatment changes to sexual recidivism, and to inform the delivery of sexual offender treatment. The VRS-SO comprises 7 static and 17 dynamic items empirically or conceptually linked to sexual…
Descriptors: Validity, Rating Scales, Recidivism, Interrater Reliability
Spooren, P.; Mortelmans, D.; Denekens, J. – Assessment & Evaluation in Higher Education, 2007
Students' evaluation of teaching skills has been an important yet controversial tool in the improvement of teaching quality during the last few decades. When searching for an apt student questionnaire to measure instructional skills, it appeared that most existing questionnaires the authors were able to collect are based on a single-item type of…
Descriptors: Validity, Teaching Skills, Teacher Effectiveness, Student Evaluation
de Villiers, Jessica; Fine, Jonathan; Ginsberg, Gary; Vaccarella, Liezanne; Szatmari, Peter – Journal of Autism and Developmental Disorders, 2007
There are few well-standardized measures of conversational breakdown in Autism Spectrum Disorders (ASD). The study's objective was to develop a scale for measuring pragmatic impairments in conversations of individuals with ASD. We analyzed 46 semi-structured conversations of children and adolescents with high-functioning ASD using a functional…
Descriptors: Measures (Individuals), Speech Communication, Semantics, Pragmatics
Brown, Deirdre A.; Pipe, Margaret-Ellen; Lewis, Charlie; Lamb, Michael E.; Orbach, Yael – Journal of Consulting and Clinical Psychology, 2007
The authors examined the accuracy of information elicited from seventy-nine 5- to 7-year-old children about a staged event that included physical contact-touching. Four to six weeks later, children's recall for the event was assessed using an interview protocol analogous to those used in forensic investigations with children. Following the…
Descriptors: Investigations, Freehand Drawing, Cognitive Objectives, Tactual Perception
Guskey, Thomas R. – Educational Measurement: Issues and Practice, 2007
This study compared different stakeholders' perceived validity of various indicators of student learning used to judge the quality of students' academic performance. Data were gathered from the questionnaire responses of 314 educators in three states that have implemented comprehensive state-wide assessment programs with high-stakes consequences…
Descriptors: Academic Achievement, Educational Indicators, State Surveys, Participation
Florida State Dept. of Education, Tallahassee. Div. of Vocational, Adult, and Community Education. – 1991
This packet contains a manual and a workbook for developing performance tests in vocational education. The manual gives an in-depth description of how to develop, score, and use performance tests. It includes the following sections: definitions of performance testing, steps in developing a performance test, selecting a performance development…
Descriptors: Interrater Reliability, Performance Tests, Postsecondary Education, Scoring
Shavelson, Richard J.; And Others – 1993
In this paper, performance assessments are cast within a sampling framework. A performance assessment score is viewed as a sample of student performance drawn from a complex universe defined by a combination of all possible tasks, occasions, raters, and measurement methods. Using generalizability theory, the authors present evidence bearing on the…
Descriptors: Academic Achievement, Educational Assessment, Error of Measurement, Evaluators
North Carolina State Dept. of Public Instruction, Raleigh. Div. of Accountability/Testing. – 2001
During the 1999-2000 school year, the North Carolina Alternate Assessment Portfolio was administered statewide as a pilot program to eligible students with serious cognitive deficits. This report provides state, regional, and local education agency results of that pilot program. The purpose of the pilot was to review the feasibility, validity, and…
Descriptors: Academic Achievement, American Indians, Cultural Differences, Elementary Secondary Education
Shotsberger, Paul G.; Crawford, Ann R. – 1996
D. B. Bailey and S. A. Palsha (1992) proposed two modified versions of the Stages of Concern Questionnaire for measuring teacher concerns during a reform effort. Their analysis suggested the use of a 5-factor model with 35 items or 15 items rather than the original 7-stage, 35-item Concerns Based Adoption Model (CBAM). The present study was…
Descriptors: Algebra, Educational Change, Reliability, School Restructuring
Lawrence, Ida M. – 1995
This study examined the extent, if any, to which reliability estimates for a multiple-choice test are affected by the presence of large item sets in which each set shares common reading material. The purpose of this research was to assess the effect of local item dependence on estimates of reliability for verbal portions of seven forms of the old and…
Descriptors: Estimation (Mathematics), High Schools, Multiple Choice Tests, Reading Tests
Lunz, Mary E.; O'Neill, Thomas R. – 1997
This retrospective longitudinal study was designed to show grading leniency patterns of judges within and across clinical examination administrations. Data from 17 different administrations of the histology examination of the American Society of Clinical Pathologists over 10 years were studied. Over the 10 years there were 4,683 candidates and 57…
Descriptors: Higher Education, Interrater Reliability, Item Response Theory, Judges
Smist, Julianne M.; And Others – 1994
Since the mid-1980s, a growing body of research has examined the relationship between student attitude toward science and science achievement. While many researchers are investigating this relationship, this study focused on ensuring that measurement of attitude changes of diverse groups in America is done with valid and reliable…
Descriptors: Academic Achievement, Attitude Measures, Classroom Research, High Schools
Krippendorff, Klaus – 1992
When one wants to set data reliability standards for a class of scientific inquiries or when one needs to compare and select among many different kinds of data with reliabilities that are crucial to a particular research undertaking, then one needs a single reliability coefficient that is adaptable to all or most situations. Work toward this goal…
Descriptors: Definitions, Equations (Mathematics), Mathematical Models, Reliability