Gillmore, Gerald M. – 1979
It is argued in this paper that generalizability theory provides a uniquely useful framework for defining and quantifying the dependability of data for decision making. It does so by requiring careful specification of the conditions of measurement and the anticipated sources of variation in the results of the measurement procedure. A distinction…
Descriptors: Analysis of Variance, Criterion Referenced Tests, Decision Making, Educational Assessment
Bradley, Robert H.; Corwyn, Robert F.; Caldwell, Betty M.; Whiteside-Mansell, Leanne; Mink, Iris T. – Journal of Research on Adolescence, 2000 (peer reviewed)
Describes the development of the Early Adolescent version of the Home Observation for Measurement of the Environment (EA-HOME) Inventory. Presents information on its usefulness with African Americans, Chinese Americans, European Americans, Mexican Americans, and Dominican Americans. Notes findings indicating high interobserver agreement, with…
Descriptors: Black Youth, Child Development, Chinese Americans, Cultural Differences
Hollenbeck, Keith; Tindal, Gerald; Almond, Patricia – Educational Assessment, 1999 (peer reviewed)
Studied the amount of measurement error in a state's performance-based writing task as it relates to high-stakes decision reproducibility. Using 175 eighth-grade writing samples, the study finds moderate correlations between the two raters' scores, with significant differences for the rates for the handwritten, but not the typed, essays. (SLD)
Descriptors: Decision Making, Error of Measurement, Essay Tests, Grade 8
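The inter-rater correlation mentioned in the abstract above can be illustrated with a minimal sketch of a Pearson correlation between two raters' scores. The scores below are invented for illustration and are not taken from the study; the function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up essay scores from two raters (assumption, not study data).
rater_a = [4, 3, 5, 2, 4, 3]
rater_b = [3, 3, 4, 2, 5, 2]
r = pearson_r(rater_a, rater_b)
print(f"inter-rater correlation: {r:.3f}")
```

A correlation of this kind describes how consistently the raters rank the essays; it does not by itself show whether one rater scores systematically higher than the other.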
Wang, Tianyou – 1996
In this paper, formulas for computing the weights that maximize the reliability of a test with multiple parts are derived using a congeneric model. A direct derivation for the three-part test case and a two-step derivation for the n-part case are presented, and results for these two approaches are shown to be consistent for the three-part…
Descriptors: Computation, Equations (Mathematics), Matrices, Performance Based Assessment
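As a rough illustration of reliability-maximizing weights under a congeneric model, the sketch below uses the standard textbook result, which is not necessarily the derivation given in the paper. It assumes each part score is X_i = λ_i·T + E_i with Var(T) = 1 and uncorrelated errors with variances θ_i, in which case weights proportional to λ_i/θ_i maximize composite reliability. The loadings, error variances, and function names are all hypothetical.

```python
def composite_reliability(weights, loadings, error_vars):
    """Reliability of a weighted composite under a congeneric model
    with Var(T) = 1 and uncorrelated errors (assumed here)."""
    true_var = sum(w * l for w, l in zip(weights, loadings)) ** 2
    error_var = sum(w * w * e for w, e in zip(weights, error_vars))
    return true_var / (true_var + error_var)


def optimal_weights(loadings, error_vars):
    """Weights proportional to loading / error variance, scaled to sum to 1.
    Assumes positive loadings and positive error variances."""
    raw = [l / e for l, e in zip(loadings, error_vars)]
    total = sum(raw)
    return [r / total for r in raw]


# Illustrative three-part test (made-up parameters).
loadings = [0.9, 0.7, 0.5]
error_vars = [0.4, 0.6, 0.8]

w_opt = optimal_weights(loadings, error_vars)
rel_opt = composite_reliability(w_opt, loadings, error_vars)
rel_unit = composite_reliability([1 / 3] * 3, loadings, error_vars)
print("optimal weights:", [round(w, 3) for w in w_opt])
print("reliability, optimal vs. equal weights:", rel_opt, rel_unit)
```

Under these assumptions the maximum reliability equals S / (1 + S), where S is the sum of λ_i²/θ_i over the parts, so the optimally weighted composite is never less reliable than the equally weighted one.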
Green, Kathy E. – 1996
Person fit statistics are generated when item response theory is used to construct measures. While person fit statistics are well grounded in theory, their utility in aggregate reporting of survey data has not been demonstrated. This study evaluated effects on reliability and validity of including and excluding misfitting person response patterns,…
Descriptors: Adults, Attitude Measures, Item Response Theory, Mail Surveys
Shen, Linjun – 1997
Three aspects of the usual approach to assessing local item dependency, Yen's "Q" (H. Huynh, H. Michaels, and S. Ferrara, 1995), deserve further investigation. First, Pearson correlation coefficients are not normally distributed when the coefficients are large, and thus cannot quantify the dependency well. Second, the accuracy of…
Descriptors: Ability, Estimation (Mathematics), Item Response Theory, Reliability
Lee, Guemin; Frisbie, David A. – 1997
Previous studies have indicated that the reliability of test scores composed of testlets might be overestimated by conventional item-based reliability estimation methods (R. Thorndike, 1953; A. Anastasi, 1988; S. Sireci, D. Thissen, and H. Wainer, 1991; H. Wainer and D. Thissen, 1996). This study used generalizability theory to investigate the…
Descriptors: Estimation (Mathematics), Generalizability Theory, Reliability, Scores
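The concern about item-based estimates can be illustrated with a small sketch that compares coefficient alpha computed on individual items with alpha computed on testlet totals. This is a simpler stand-in for the generalizability-theory analysis the study describes, not a reproduction of it. The response matrix, the grouping of items into testlets, and the function name `cronbach_alpha` are assumptions made for illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a persons-by-parts score matrix (list of rows)."""
    k = len(scores[0])
    part_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(part_vars) / total_var)


# Made-up 0/1 responses: 8 examinees, 4 items.
# Items 0-1 form one testlet and items 2-3 another (assumed grouping).
items = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]
# Collapse each testlet to a single score per examinee.
testlets = [[row[0] + row[1], row[2] + row[3]] for row in items]

alpha_items = cronbach_alpha(items)
alpha_testlets = cronbach_alpha(testlets)
print("item-based alpha:   ", round(alpha_items, 3))
print("testlet-based alpha:", round(alpha_testlets, 3))
```

In this toy data the items within a testlet agree with each other more than they agree with items in the other testlet, so treating all four items as independent yields the higher estimate.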
Parshall, Cynthia G.; Kromrey, Jeffrey D.; Chason, Walter M. – 1996
The benefits of item response theory (IRT) will only accrue to a testing program to the extent that model assumptions are met. Obtaining accurate item parameter estimates is a critical first step. However, the sample sizes required for stable parameter estimation are often difficult to obtain in practice, particularly for the more complex models.…
Descriptors: Comparative Analysis, Estimation (Mathematics), Item Response Theory, Models
Guthrie, John T.; And Others – 1994
Noting that the amount of reading students do is related to their reading achievement, this booklet presents an instrument designed to measure the amount and breadth of students' reading in and out of school. The first part of the booklet discusses the Reading Activity Inventory (RAI) and how it differs from other reading activity measures, uses…
Descriptors: Elementary Education, Evaluation Methods, Reading Ability, Reading Achievement
Rodgers, Willard; Herzog, Regula – 1983
Using data collected through telephone interviews with a national sample of adults, this study searched for evidence as to whether interviewers have stronger effects on the responses given to a wide range of questions by older people than on the responses of younger people. Responses to 30 items for which significant interviewer effects had…
Descriptors: Adults, Age Differences, Interviews, Older Adults
Occupational Outlook Quarterly, 1975
Descriptors: Employment Patterns, Employment Projections, Evaluation, Federal Government
Fulton, Robert T.; And Others – Journal of Speech and Hearing Disorders, 1975 (peer reviewed)
The efficacy and reliability of auditory stimulus-response control training and assessment procedures were evaluated with 12 children (9 to 25 months old). (Author/LS)
Descriptors: Auditory Tests, Exceptional Child Research, Hearing Impairments, Infants
Hay, Nancy M.; Stewart, Norman R. – Journal of Counseling Psychology, 1974 (peer reviewed)
This study determined internal consistency and test-retest reliability coefficients for the Willoughby Personality Schedule, currently used as an outcome measure in research and in clinical practice. The Hoyt analysis of variance yielded an internal consistency reliability coefficient of .90 on the first testing. The test-retest reliability…
Descriptors: Anxiety, College Students, Evaluation Methods, Personality Measures
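For readers unfamiliar with the Hoyt analysis of variance named in the abstract above, the following is a minimal sketch of the usual formulation: reliability is estimated as 1 minus the ratio of the residual mean square to the between-persons mean square in a persons-by-items layout. The score matrix is invented and has nothing to do with the Willoughby Personality Schedule data; the function name is hypothetical.

```python
def hoyt_reliability(scores):
    """Hoyt ANOVA reliability for a persons-by-items matrix (list of rows)."""
    n = len(scores)        # number of persons
    k = len(scores[0])     # number of items
    grand = sum(sum(row) for row in scores) / (n * k)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_resid = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return 1 - ms_resid / ms_persons


# Made-up ratings: 5 respondents, 3 items.
scores = [
    [4, 5, 4],
    [2, 3, 2],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 2],
]
r_hoyt = hoyt_reliability(scores)
print("Hoyt reliability:", round(r_hoyt, 3))
```

The value obtained this way is algebraically the same as coefficient alpha computed on the same matrix, which is why the two are often reported interchangeably as internal consistency estimates.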
Balyeat, Ralph; Norman, Douglas – Reading Teacher, 1975 (peer reviewed)
Research indicates that a special version of the cloze procedure is a reliable test of reading comprehension. (RB)
Descriptors: Cloze Procedure, Elementary Education, Reading Comprehension, Reading Research
Attali, Yigal – ETS Research Report Series, 2004
Contrary to common belief, reliability estimates of number-right multiple-choice tests are not inflated by speededness. Because examinees guess on questions when they run out of time, the responses to these questions show less consistency with the responses to other questions, and the reliability of the test will be decreased. The surprising…
Descriptors: Multiple Choice Tests, Timed Tests, Test Reliability, Guessing (Tests)


