| Publication Date | Records |
| --- | --- |
| In 2026 | 3 |
| Since 2025 | 675 |
| Since 2022 (last 5 years) | 3176 |
| Since 2017 (last 10 years) | 7417 |
| Since 2007 (last 20 years) | 15055 |
| Descriptor | Records |
| --- | --- |
| Test Reliability | 15043 |
| Test Validity | 10279 |
| Reliability | 9761 |
| Foreign Countries | 7144 |
| Test Construction | 4825 |
| Validity | 4191 |
| Measures (Individuals) | 3877 |
| Factor Analysis | 3825 |
| Psychometrics | 3526 |
| Interrater Reliability | 3124 |
| Correlation | 3040 |
| Audience | Records |
| --- | --- |
| Researchers | 709 |
| Practitioners | 451 |
| Teachers | 208 |
| Administrators | 122 |
| Policymakers | 66 |
| Counselors | 42 |
| Students | 38 |
| Parents | 11 |
| Community | 7 |
| Support Staff | 6 |
| Media Staff | 5 |
| Location | Records |
| --- | --- |
| Turkey | 1328 |
| Australia | 436 |
| Canada | 379 |
| China | 368 |
| United States | 271 |
| United Kingdom | 256 |
| Indonesia | 253 |
| Taiwan | 234 |
| Netherlands | 223 |
| Spain | 217 |
| California | 215 |
| What Works Clearinghouse Rating | Records |
| --- | --- |
| Meets WWC Standards without Reservations | 8 |
| Meets WWC Standards with or without Reservations | 9 |
| Does not meet standards | 6 |
Rigdon, Michael A.; And Others – Death Education, 1979
The Threat Index (TI), theoretically based on George Kelly's personal construct theory, was developed as a measure of death orientation. Outlines the emerging reliability and validity picture. The aim is to give direction to future TI research by summarizing and critically evaluating the currently available data. (Author)
Descriptors: Death, Evaluation, Grief, Measurement Techniques
Peer reviewed: Cook, Ann; And Others – National Elementary Principal, 1980
Examines the controversy surrounding the administration of the Metropolitan Achievement Test in reading and the subsequent recall of the test results by the chancellor of the New York City schools. (IRT)
Descriptors: Elementary Education, Reading, Reading Tests, Standardized Tests
Peer reviewed: Noble, Gilbert H. – Educational and Psychological Measurement, 1977
A computer program providing comprehensive test and item analysis is presented. The program, written in Fortran and emphasizing ease of use, completes its analysis in a single run, integrating various statistical techniques for analyzing individual items and the overall test and generating a variety of standard scores. (Author/JKS)
Descriptors: Computer Programs, Correlation, Equated Scores, Item Analysis
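As a rough illustration of the kind of item-level and test-level statistics such a program computes (not a reconstruction of Noble's Fortran code), here is a minimal Python sketch that calculates item difficulty, point-biserial discrimination, and a KR-20 internal-consistency estimate from a dichotomously scored response matrix; the data are invented.

```python
import numpy as np

def item_analysis(responses: np.ndarray):
    """Item difficulty, point-biserial discrimination, and KR-20 for 0/1 data."""
    n_items = responses.shape[1]
    total = responses.sum(axis=1)              # each examinee's raw score
    p = responses.mean(axis=0)                 # item difficulty (proportion correct)
    q = 1.0 - p
    # Uncorrected point-biserial: correlation of each item with the total score
    pbis = np.array([np.corrcoef(responses[:, j], total)[0, 1]
                     for j in range(n_items)])
    # KR-20 internal-consistency estimate
    kr20 = (n_items / (n_items - 1)) * (1.0 - (p * q).sum() / total.var(ddof=1))
    return p, pbis, kr20

# Invented example: 5 examinees, 4 items
scores = np.array([[1, 1, 0, 1],
                   [1, 0, 0, 0],
                   [1, 1, 1, 1],
                   [0, 0, 0, 1],
                   [1, 1, 1, 0]])
difficulty, discrimination, kr20 = item_analysis(scores)
```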
Peer reviewed: Merenda, Peter F. – Measurement and Evaluation in Counseling and Development, 1997
Offers suggestions for proper procedures for authors to use--and some pitfalls to avoid--when writing studies using factor analysis methods. Discusses distinctions among different methods of analysis, the adequacy of factor structure, and other notes of caution. Encourages authors to ensure that their research is statistically sound. (RJM)
Descriptors: Data Interpretation, Factor Analysis, Factor Structure, Reliability
Peer reviewed: McGrew, Kevin S.; Wrightson, Wade – Psychology in the Schools, 1997
Demonstrates how data smoothing procedures--procedures commonly used in the development of continuous test norms--can provide better estimates of the reliability, uniqueness, and general factor characteristics for the Wechsler Intelligence Scale for Children, third edition, subtests. Suggests that such procedures are applicable to other test…
Descriptors: Children, Elementary Education, Factor Analysis, Factor Structure
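The abstract does not spell out the smoothing procedure, so the following is only a generic sketch of the underlying idea, assuming a low-order polynomial fit of age-group reliability estimates; the ages and coefficients are made up and do not come from the WISC-III.

```python
import numpy as np

# Invented age-group reliability estimates for a single subtest
ages = np.array([6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
raw_rxx = np.array([.78, .81, .80, .84, .83, .86, .85, .88, .87, .89, .88])

# Fit a quadratic trend across age and read smoothed estimates off the curve,
# so each age group borrows strength from its neighbors
coefs = np.polyfit(ages, raw_rxx, deg=2)
smoothed_rxx = np.polyval(coefs, ages)
```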
Peer reviewed: Chase, Clint – Mid-Western Educational Researcher, 1996
Classical procedures for calculating the two indices of decision consistency (P and Kappa) for criterion-referenced tests require two testings on each child. Huynh, Peng, and Subkoviak have presented one-testing procedures for these indices. These indices can be estimated without any test administration using Ebel's estimates of the mean, standard…
Descriptors: Criterion Referenced Tests, Educational Research, Educational Testing, Estimation (Mathematics)
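For readers unfamiliar with the two indices, here is a minimal sketch of the classical two-administration versions of P (raw decision consistency) and kappa (chance-corrected consistency) that the one-testing procedures approximate; the 2 x 2 master/nonmaster counts are invented.

```python
import numpy as np

# Rows = form 1 classification, columns = form 2 classification
#                 nonmaster  master
table = np.array([[30,        8],
                  [ 6,       56]], dtype=float)

n = table.sum()
p_observed = np.trace(table) / n            # P: proportion classified consistently
marginals1 = table.sum(axis=1) / n          # form 1 base rates
marginals2 = table.sum(axis=0) / n          # form 2 base rates
p_chance = (marginals1 * marginals2).sum()  # agreement expected by chance
kappa = (p_observed - p_chance) / (1 - p_chance)
```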
Peer reviewed: Dorn, Lorah D.; Susman, Elizabeth J.; Ponirakis, Angelo – Journal of Youth and Adolescence, 2003
Studied whether pubertal timing by self-report (SR), parent report (PR), or physical examination predicted the same aspects of adjustment and behavior problems. Findings for 52 girls, 56 boys, and their parents show that pubertal timing by SR and PR did not always provide the same level of prediction as did physical examination. (SLD)
Descriptors: Adjustment (to Environment), Adolescents, Behavior Patterns, Interrater Reliability
Peer reviewed: Craig, Cora L.; Russell, Storm J.; Cameron, Christine – Medicine & Science in Sports & Exercise, 2002
Assessed the reliability and criterion validity of the Physical Activity Monitor, a telephone-interview adaptation of the Minnesota Leisure Time Physical Activity Questionnaire (MLTPAQ), for assessing trends in the Canadian population. Interviews with Canadian adults and comparisons of the Monitor against the Campbell's Survey of Well-Being…
Descriptors: Foreign Countries, Physical Activity Level, Research Methodology, Test Reliability
Peer reviewed: Danielson, Carla Kmett; Phelps, Carolyn Roecker – Measurement and Evaluation in Counseling and Development, 2003
The Children's Self-Report Social Skills Scale (CS4), a 21-item instrument, was developed to measure children's perspectives on their own social skills. Test-retest reliability and internal consistency reliability of CS4 scores were .74 and .96, respectively. Principal component analysis revealed 3 reliable components: Social Rules, Likeability,…
Descriptors: Children, Interpersonal Competence, Psychometrics, Screening Tests
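As a generic illustration of the two reliability estimates reported for the CS4 (not the actual CS4 data or scoring), the sketch below computes coefficient alpha for a 21-item scale and a Pearson test-retest correlation on simulated responses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 100 children answering 21 items that share a common factor
true_score = rng.normal(0, 1, size=(100, 1))
items = true_score + rng.normal(0, 1, size=(100, 21))
totals_time1 = items.sum(axis=1)
totals_time2 = totals_time1 + rng.normal(0, 2, size=100)   # simulated retest scores

k = items.shape[1]
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                         / totals_time1.var(ddof=1))
test_retest = np.corrcoef(totals_time1, totals_time2)[0, 1]
```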
Peer reviewed: Walter, Richard A.; Kapes, Jerome T. – Journal of Industrial Teacher Education, 2003
To identify a procedure for establishing cut scores for National Occupational Competency Testing Institute examinations in Pennsylvania, an expert panel assessed written and performance test items for minimally competent workers. Recommendations about the number, type, and training of judges used were made. (Contains 18 references.) (SK)
Descriptors: Cutting Scores, Interrater Reliability, Occupational Tests, Teacher Competency Testing
Peer reviewed: Pavitt, Charles; And Others – Communication Quarterly, 1995
Evaluates the stability of the Relational Dimensions Instrument (RDI) as a measure of people's general beliefs about the characteristics of the ideal marriage. Shows little stability for these judgments in college students who took the test twice, one year apart. Casts doubt on whether college students' implicit theories of marriage are tapped by…
Descriptors: College Students, Communication Research, Higher Education, Marriage
Peer reviewed: Dolin, Danielle J. – Communication Research Reports, 1995
Presents an 11-item alternative to the longer 25-item measure of teacher affinity seeking. Indicates that the measure is both reliable and valid for use with appropriate research questions. Presents broad guidelines for use. (SR)
Descriptors: Communication Research, Higher Education, Teacher Behavior, Teacher Student Relationship
Peer reviewed: Kindt, Merel; And Others – Journal of Experimental Child Psychology, 1997
Attempted to clarify whether fear is related to distorted cognitive processing of fear-related information. Administered card and single-trial formats of Stroop task to spider-fearing and control children. Found bias for spider words in both, regardless of format; further, processing biases assessed by the formats did not correlate, suggesting…
Descriptors: Bias, Children, Cognitive Measurement, Cognitive Processes
Peer reviewed: Berry, Kenneth J.; Mielke, Paul W., Jr. – Educational and Psychological Measurement, 1997
Describes a FORTRAN software program that calculates the probability of an observed difference between agreement measures obtained from two independent sets of raters. An example illustrates the use of the DIFFER program in evaluating undergraduate essays. (Author/SLD)
Descriptors: Comparative Analysis, Computer Software, Evaluation Methods, Higher Education
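The abstract does not describe the statistical machinery inside DIFFER, so the sketch below is only a stand-in: it compares Cohen's kappa from two independent sets of raters using a simple bootstrap confidence interval for the difference. The rating data and the resampling approach are assumptions for illustration, not Berry and Mielke's method.

```python
import numpy as np

def cohens_kappa(r1, r2, n_categories):
    """Chance-corrected agreement between two raters' categorical ratings."""
    table = np.zeros((n_categories, n_categories))
    for a, b in zip(r1, r2):
        table[a, b] += 1
    n = table.sum()
    p_obs = np.trace(table) / n
    p_chance = (table.sum(axis=1) / n) @ (table.sum(axis=0) / n)
    return (p_obs - p_chance) / (1 - p_chance)

rng = np.random.default_rng(1)
set1 = rng.integers(0, 3, size=(2, 40))   # rater pair 1: 40 essays, 3 score levels
set2 = rng.integers(0, 3, size=(2, 40))   # rater pair 2: a different 40 essays
observed_diff = cohens_kappa(set1[0], set1[1], 3) - cohens_kappa(set2[0], set2[1], 3)

# Bootstrap the difference in kappas; a 95% interval excluding zero suggests
# the two rater pairs genuinely differ in agreement
diffs = []
for _ in range(2000):
    i = rng.integers(0, 40, size=40)
    j = rng.integers(0, 40, size=40)
    diffs.append(cohens_kappa(set1[0, i], set1[1, i], 3)
                 - cohens_kappa(set2[0, j], set2[1, j], 3))
ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])
```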
Peer reviewed: Humphreys, Lloyd G. – Applied Psychological Measurement, 1996
The reliability of a gain is determined by the reliabilities of the components, the correlation between them, and their standard deviations. Reliability is not inherently low, but the components of gains in many investigations make low reliability likely and require caution in the use of gain scores. (SLD)
Descriptors: Achievement Gains, Change, Correlation, Error of Measurement
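For concreteness, here is the standard classical-test-theory expression for the reliability of a difference (gain) score, which depends on exactly the three quantities the abstract names; the numerical values are illustrative, and the abstract does not state the precise form Humphreys works with.

```python
# Reliability of the gain D = Y - X from the component reliabilities,
# their correlation, and their standard deviations.
def gain_reliability(r_xx, r_yy, r_xy, sd_x, sd_y):
    """r_DD = (sd_x^2*r_xx + sd_y^2*r_yy - 2*r_xy*sd_x*sd_y)
              / (sd_x^2 + sd_y^2 - 2*r_xy*sd_x*sd_y)"""
    num = sd_x**2 * r_xx + sd_y**2 * r_yy - 2 * r_xy * sd_x * sd_y
    den = sd_x**2 + sd_y**2 - 2 * r_xy * sd_x * sd_y
    return num / den

# With equally reliable components (.80) correlated .70 and equal SDs,
# the gain score's reliability drops to about .33 -- the caution the
# abstract raises about using gain scores.
print(round(gain_reliability(.80, .80, .70, 10, 10), 2))
```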


