Publication Date
| Date Range | Records |
| --- | --- |
| In 2026 | 3 |
| Since 2025 | 656 |
| Since 2022 (last 5 years) | 3157 |
| Since 2017 (last 10 years) | 7398 |
| Since 2007 (last 20 years) | 15036 |
Descriptor
| Descriptor | Records |
| --- | --- |
| Test Reliability | 15028 |
| Test Validity | 10265 |
| Reliability | 9757 |
| Foreign Countries | 7137 |
| Test Construction | 4821 |
| Validity | 4191 |
| Measures (Individuals) | 3876 |
| Factor Analysis | 3822 |
| Psychometrics | 3520 |
| Interrater Reliability | 3124 |
| Correlation | 3039 |
Audience
| Audience | Records |
| --- | --- |
| Researchers | 709 |
| Practitioners | 451 |
| Teachers | 208 |
| Administrators | 122 |
| Policymakers | 66 |
| Counselors | 42 |
| Students | 38 |
| Parents | 11 |
| Community | 7 |
| Support Staff | 6 |
| Media Staff | 5 |
Location
| Location | Records |
| --- | --- |
| Turkey | 1326 |
| Australia | 436 |
| Canada | 379 |
| China | 368 |
| United States | 271 |
| United Kingdom | 256 |
| Indonesia | 251 |
| Taiwan | 234 |
| Netherlands | 223 |
| Spain | 216 |
| California | 214 |
What Works Clearinghouse Rating
| Rating | Records |
| --- | --- |
| Meets WWC Standards without Reservations | 8 |
| Meets WWC Standards with or without Reservations | 9 |
| Does not meet standards | 6 |
Peer reviewed: Philip, Alistair E. – British Journal of Psychology, 1970
Descriptors: Analysis of Variance, Anxiety, Test Reliability
Peer reviewed: Pepin, Arthur C. – Clearing House, 1971
Descriptors: Educational Testing, Intelligence Tests, Test Reliability
Peer reviewed: Mandel, Robert; McLeod, Philip – Exceptional Children, 1970
Descriptors: Intelligence Tests, Socioeconomic Status, Test Reliability
Kroll, Walter – Res Quart AAHPER, 1970
Descriptors: Error Patterns, Muscular Strength, Test Reliability
Peer reviewed: Wilcox, Rand R. – Educational and Psychological Measurement, 1981
This paper describes and compares procedures for estimating the reliability of proficiency tests that are scored with latent structure models. Results suggest that the predictive estimate is the most accurate of the procedures. (Author/BW)
Descriptors: Criterion Referenced Tests, Scoring, Test Reliability
Peer reviewed: Uebersax, John S. – Educational and Psychological Measurement, 1982
A more general method for calculating the Kappa measure of nominal rating agreement among multiple raters is presented. It can be used across a broad range of rating designs, including those in which raters vary with respect to their base rates and how many subjects they rate in common. (Author/BW)
Descriptors: Mathematical Formulas, Statistical Significance, Test Reliability
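Uebersax's multi-rater generalization is beyond a short sketch, but the baseline it extends, kappa for two raters over the same subjects, fits in a few lines. The following is a minimal illustration of that baseline (function name and example data are illustrative, not taken from the paper):

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters of the same subjects."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed proportion of subjects on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's base rates.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

The chance-correction term built from each rater's base rates is the piece that becomes nontrivial when, as in the designs the paper covers, raters differ in base rates and in which subjects they rate in common.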
Peer reviewed: Woodward, J. Arthur; Bentler, P. M. – Psychometrika, 1979
Expressions involving optimal sign vectors are derived so as to yield two new applications. First, coefficient alpha for the sign-weighted composite is maximized in analogy to Lord's scale-independent solution with differential weights. Second, optimal sign vectors are used to define two groups of objects that are maximally distinct. (Author/CTM)
Descriptors: Classification, Cluster Analysis, Reliability, Statistical Analysis
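Woodward and Bentler maximize coefficient alpha over sign weights; the unweighted coefficient alpha they start from can be sketched directly from its definition (a minimal illustration with made-up names and data, not the paper's optimization):

```python
def cronbach_alpha(items):
    """Coefficient alpha; `items` is a list of per-item score lists."""
    k = len(items)          # number of items
    n = len(items[0])       # number of examinees

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_var_sum = sum(var(item) for item in items)
    # Total score for each examinee across all items.
    totals = [sum(item[j] for item in items) for j in range(n)]
    return k / (k - 1) * (1 - item_var_sum / var(totals))
```

When items are perfectly parallel the item variances account for none of the extra total-score variance and alpha reaches 1; sign-weighting, the paper's contribution, chooses the orientation of each item so composites like this are as internally consistent as possible.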
Peer reviewed: Bergan, John R. – Journal of Educational Measurement, 1980
A coefficient of inter-rater agreement is presented which describes the magnitude of observer agreement as the probability estimated under a quasi-independence model that responses from different observers will be in agreement. (Author/JKS)
Descriptors: Measurement Techniques, Observation, Rating Scales, Reliability
Peer reviewed: Willson, Victor L. – Educational and Psychological Measurement, 1980
Guilford's average interrater correlation coefficient is shown to be related to the Friedman Rank Sum statistic. Under the null hypothesis of zero correlation, the resultant distribution is known and the hypothesis can be tested. Large sample and tied score cases are also considered. An example from Guilford (1954) is presented. (Author)
Descriptors: Correlation, Hypothesis Testing, Mathematical Formulas, Reliability
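Guilford's average interrater correlation, the quantity Willson connects to the Friedman rank sum statistic, is simply the mean of all pairwise correlations between raters. A minimal sketch of that quantity (names and data are illustrative; the hypothesis-testing machinery from the paper is not included):

```python
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def average_interrater_correlation(raters):
    """Mean correlation over all pairs of raters' ratings of the same subjects."""
    pairs = list(combinations(raters, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)
```

Under the null hypothesis of zero correlation this average has a known distribution via the Friedman statistic, which is what makes the significance test in the paper possible.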
Peer reviewed: Kraemer, Helena Chmura – Journal of Educational Statistics, 1980
The robustness of hypothesis tests for the correlation coefficient under varying conditions is discussed. The effects of violations of the assumptions of linearity, homoscedasticity, and kurtosis are examined. (JKS)
Descriptors: Correlation, Hypothesis Testing, Reliability, Statistical Analysis
Brandt, D. Scott – Computers in Libraries, 1996
Evaluation of information found on the Internet requires the same assessment of reliability, credibility, perspective, purpose and author credentials as required with print materials. Things to check include whether the source is from a moderated or unmoderated list or FTP (file transfer protocol) site; directories for affiliation and biographical…
Descriptors: Evaluation Criteria, Information Sources, Internet, Reliability
Peer reviewed: Barnes, Laura L. B.; Harp, Diane; Jung, Woo Sik – Educational and Psychological Measurement, 2002
Conducted a reliability generalization study for the State-Trait Anxiety Inventory (C. Spielberger, 1983) by reviewing and classifying 816 research articles. Average reliability coefficients were acceptable for both internal consistency and test-retest reliability, but variation was present among the estimates. Other differences are discussed.…
Descriptors: Adults, Anxiety, Generalization, Meta Analysis
Peer reviewed: Feldt, Leonard S. – Educational and Psychological Measurement, 2003
Develops formulas to cope with the situation in which the reliability of test scores must be approximated even though no examinee has taken the complete instrument. Develops different estimators for part tests that are judged to be classically parallel, tau-equivalent, or congeneric. Proposes standards for differentiating among these three models.…
Descriptors: Estimation (Mathematics), Reliability, Scores, Test Results
Peer reviewed: Li, Heng – Psychometrika, 1997
A formally simple expression for the maximal reliability of a linear composite is provided. Its theoretical implications and its relation to existing results for reliability are discussed. (Author/SLD)
Descriptors: Reliability, Test Items, Theory Practice Relationship
Peer reviewed: Myford, Carol M. – Applied Measurement in Education, 2002
Studied the use of descriptive graphic rating scales by 11 raters to evaluate students' work, exploring different design features. Used a Rasch-model based rating scale analysis to determine that all the continuous scales could be considered to have at least five points, and that defined midpoints did not result in higher student separation…
Descriptors: Evaluators, Rating Scales, Reliability, Test Construction


