Publication Date
| Period | Records |
|---|---|
| In 2026 | 0 |
| Since 2025 | 8 |
| Since 2022 (last 5 years) | 36 |
| Since 2017 (last 10 years) | 115 |
| Since 2007 (last 20 years) | 378 |
Descriptor
| Descriptor | Records |
|---|---|
| Test Theory | 1166 |
| Test Items | 262 |
| Test Reliability | 252 |
| Test Construction | 246 |
| Test Validity | 245 |
| Psychometrics | 183 |
| Scores | 176 |
| Item Response Theory | 168 |
| Foreign Countries | 160 |
| Item Analysis | 141 |
| Statistical Analysis | 134 |
Location
| Location | Records |
|---|---|
| United States | 17 |
| United Kingdom (England) | 15 |
| Canada | 14 |
| Australia | 13 |
| Turkey | 12 |
| Sweden | 8 |
| United Kingdom | 8 |
| Netherlands | 7 |
| Texas | 7 |
| New York | 6 |
| Taiwan | 6 |
Laws, Policies, & Programs
| Law, policy, or program | Records |
|---|---|
| No Child Left Behind Act 2001 | 4 |
| Elementary and Secondary… | 3 |
| Individuals with Disabilities… | 3 |
Clark, Lee Anna – Psychometrika, 2006
Borsboom (2006) attacks psychologists for failing to incorporate psychometric advances in their work, discusses factors that contribute to this regrettable situation, and offers suggestions for ameliorating it. This commentary applauds Borsboom for calling the field to task on this issue and notes additional problems in the field regarding…
Descriptors: Psychometrics, Psychological Studies, Construct Validity, Measurement Techniques
Brown, James Dean; Ross, Jacqueline A. – 1993
This study investigates the Test of English as a Foreign Language (TOEFL), in particular the relative contributions to score dependability (analogous to classical test theory reliability) of various numbers of items and subtests as well as the decision dependability at different cut points. Research questions that apply to the overall TOEFL battery and…
Descriptors: English (Second Language), Language Tests, Statistical Analysis, Test Reliability
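The dependability questions in the Brown and Ross abstract (how dependability changes with the number of items, and how decision dependability varies with the cut point) are usually answered with a generalizability-theory D-study. The sketch below assumes the simplest persons x items (p x i) design and uses hypothetical variance components, not estimates from the TOEFL study. `phi` is the dependability coefficient for absolute decisions, and `phi_lambda` is the Brennan-Kane index at a cut score in its simple form, without the small-sample bias correction.

```python
"""D-study sketch for a persons x items (p x i) generalizability design.

All variance components used here are hypothetical placeholders.
"""


def absolute_error_variance(var_i: float, var_pi_e: float, n_items: int) -> float:
    # Absolute error variance: item and residual components averaged over n_items.
    return (var_i + var_pi_e) / n_items


def phi(var_p: float, var_i: float, var_pi_e: float, n_items: int) -> float:
    # Dependability coefficient for absolute (domain-referenced) decisions.
    err = absolute_error_variance(var_i, var_pi_e, n_items)
    return var_p / (var_p + err)


def phi_lambda(var_p: float, var_i: float, var_pi_e: float,
               n_items: int, grand_mean: float, cut: float) -> float:
    # Brennan-Kane index at cut score `cut`; simple form, no bias correction.
    err = absolute_error_variance(var_i, var_pi_e, n_items)
    dev = (grand_mean - cut) ** 2
    return (var_p + dev) / (var_p + dev + err)


if __name__ == "__main__":
    # Hypothetical components on a proportion-correct scale.
    VAR_P, VAR_I, VAR_PI_E = 0.02, 0.01, 0.15
    for n in (10, 25, 50, 100):
        print("items:", n, "phi:", round(phi(VAR_P, VAR_I, VAR_PI_E, n), 3))
    for cut in (0.5, 0.6, 0.7):
        print("cut:", cut, "phi(lambda):",
              round(phi_lambda(VAR_P, VAR_I, VAR_PI_E, 50, 0.65, cut), 3))
```

Because the squared distance between the mean and the cut enters both numerator and denominator, decisions at cut points far from the group mean come out more dependable than decisions near it.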
Stone, Kathy Kees; And Others – 1983
Looking beyond the overall effectiveness of sensory stimulation, this study aimed to identify specific aspects of infant behavior most responsive to early stimulation. Subjects were 65 premature infants with a birth weight of less than 5 pounds, 8 ounces and a gestational age under 37 weeks. Experimental group members had completed a multimodal…
Descriptors: Comparative Analysis, Discriminant Analysis, Infant Behavior, Premature Infants
Wainer, Howard – 1982
This paper is the transcript of a talk given to those who use test information but who have little technical background in test theory. The concepts of modern test theory are compared with traditional test theory, as well as a probable future test theory. The explanations given are couched within an extended metaphor that allows a full description…
Descriptors: Difficulty Level, Latent Trait Theory, Metaphors, Test Items
Andrich, David – 1984
Both the attenuation paradox of traditional test theory and the assumption of local independence in person-item response theory have caused problems in interpretation. This paper demonstrates that the two are related concepts, and, through this demonstration, both are clarified. It is demonstrated that the breakdown of local independence leads to…
Descriptors: Latent Trait Theory, Test Interpretation, Test Items, Test Reliability
Budescu, David V. – 1979
This paper outlines a technique for differentially weighting options of a multiple choice test in a fashion that maximizes the item predictive validity. The rule can be applied with different numbers of categories, and the "optimal" number of categories can be determined by significance tests and/or through the R² criterion. Our theoretical analysis…
Descriptors: Multiple Choice Tests, Predictive Validity, Scoring Formulas, Test Items
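The Budescu abstract is truncated before the weighting rule itself, so the following is not presented as that rule. It sketches a well-known way to weight options for predictive validity: score each option by the mean criterion value of the examinees who chose it. Any option scoring is a linear function of option indicators, so this scoring attains the largest possible squared correlation with the criterion, equal to the R² from regressing the criterion on the option indicators. The data in the usage example are hypothetical.

```python
from statistics import mean


def option_weights(choices, criterion):
    # Weight each option by the mean criterion score of examinees who chose it.
    groups = {}
    for choice, y in zip(choices, criterion):
        groups.setdefault(choice, []).append(y)
    return {choice: mean(ys) for choice, ys in groups.items()}


def pearson_r(x, y):
    # Plain Pearson correlation, written out to keep the sketch dependency-free.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def weighted_validity(choices, criterion):
    # Score every examinee with the option weights, then correlate with the criterion.
    weights = option_weights(choices, criterion)
    scores = [weights[c] for c in choices]
    return weights, pearson_r(scores, criterion)


if __name__ == "__main__":
    # Hypothetical responses to one item (keyed answer "A") and criterion scores.
    choices = ["A", "B", "A", "C", "B", "A", "D", "C"]
    criterion = [9.0, 6.0, 8.0, 4.0, 7.0, 10.0, 3.0, 5.0]
    weights, r_weighted = weighted_validity(choices, criterion)
    keyed = [1 if c == "A" else 0 for c in choices]
    print("weights:", weights)
    print("R^2 option-weighted:", round(r_weighted ** 2, 3))
    print("R^2 right/wrong:", round(pearson_r(keyed, criterion) ** 2, 3))
```

Collapsing the distractors into a single "wrong" category, as right/wrong scoring does, can only lower this R², which is one way to frame the question of how many scoring categories an item needs.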
Albanese, Mark; Pfohl, Bruce – Evaluation and the Health Professions, 1988 (peer reviewed)
A procedure derived from classical test theory analyzes course grades and reports results to assess third-year clerkships at a midwestern medical school. The procedure is sensitive to a large range of characteristics of the courses and is a promising supplement to student course evaluations in studying curriculum change. (SLD)
Descriptors: Course Content, Course Evaluation, Curriculum Development, Curriculum Evaluation
O'Brien, Michael L. – Studies in Educational Evaluation, 1986 (peer reviewed)
A monograph issue on the development and use of a prescriptive measurement method is introduced. Given such a measurement system, it is possible to investigate both level and pattern of a student's performance, and to diagnose specific gaps in learning. (LMO)
Descriptors: Academic Achievement, Educational Diagnosis, Educational Testing, Elementary Secondary Education
Jarjoura, David – Journal of Educational Statistics, 1985 (peer reviewed)
Issues regarding tolerance and confidence intervals are discussed within the context of educational measurement, and conceptual distinctions are drawn between these two types of intervals. Points are raised about the advantages of tolerance intervals when the focus is on a particular observed score rather than a particular examinee. (Author/BW)
Descriptors: Comparative Analysis, Error of Measurement, Mathematical Models, Test Interpretation
Williams, Richard H.; Zimmerman, Donald W. – Journal of Experimental Education, 1984 (peer reviewed)
This paper provides a list of 10 salient features of the standard error of measurement, contrasting it to the reliability coefficient. It is concluded that the standard error of measurement should be regarded as a primary characteristic of a mental test. (Author/DWH)
Descriptors: Educational Testing, Error of Measurement, Evaluation Methods, Psychological Testing
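The contrast Williams and Zimmerman draw between the standard error of measurement and the reliability coefficient follows from the classical test theory identity SEM = SD_X × sqrt(1 − reliability). The sketch below computes it along with a simple band of ± z SEMs around an observed score. The score scale and reliability values are hypothetical, and this symmetric band is only the elementary approximation, not the tolerance or confidence intervals Jarjoura distinguishes.

```python
import math


def standard_error_of_measurement(sd_observed: float, reliability: float) -> float:
    # Classical test theory: SEM = SD_X * sqrt(1 - reliability).
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd_observed * math.sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, z: float = 1.96):
    # Symmetric band of +/- z SEMs around an observed score (rough approximation).
    return observed - z * sem, observed + z * sem


if __name__ == "__main__":
    # Hypothetical tests with the same reliability but different score spreads.
    for sd in (15.0, 30.0):
        sem = standard_error_of_measurement(sd, 0.91)
        low, high = score_band(100.0, sem)
        print(f"SD={sd}: SEM={sem:.2f}, band=({low:.2f}, {high:.2f})")
```

The two hypothetical tests share a reliability of 0.91, yet their SEMs differ by a factor of two, which illustrates why the SEM carries information about score precision that the reliability coefficient alone does not.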
Rost, Jurgen – Psychometrika, 1985 (peer reviewed)
A latent class model for rating data is presented which provides an alternative to the latent trait approach of analyzing test data. It is the analog of Andrich's binomial Rasch model for Lazarsfeld's latent class analysis (LCA). Response probabilities for rating categories follow a binomial distribution and depend on class-specific item…
Descriptors: Item Analysis, Latent Trait Theory, Mathematical Models, Rating Scales
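The Rost abstract states that, within a latent class, the response to an item with m + 1 ordered rating categories follows a binomial distribution with a class-specific item parameter. Under the local independence that latent class analysis assumes, the probability of a whole response pattern is a mixture over classes. The sketch below parameterizes each class directly by its binomial probabilities; the class sizes and item probabilities are hypothetical, and estimation (for example by EM) is omitted.

```python
from math import comb


def binomial_category_prob(k: int, m: int, p: float) -> float:
    # Probability of rating category k (0..m) when responses follow Binomial(m, p).
    return comb(m, k) * p ** k * (1 - p) ** (m - k)


def pattern_prob(pattern, class_sizes, item_probs, m):
    # Mixture over latent classes; items are independent within a class.
    # Returns the marginal pattern probability and the posterior class probabilities.
    joint = []
    for class_size, probs in zip(class_sizes, item_probs):
        likelihood = class_size
        for k, p in zip(pattern, probs):
            likelihood *= binomial_category_prob(k, m, p)
        joint.append(likelihood)
    total = sum(joint)
    return total, [j / total for j in joint]


if __name__ == "__main__":
    # Hypothetical two-class model, two items, three categories each (m = 2).
    sizes = [0.6, 0.4]
    probs = [[0.2, 0.3], [0.8, 0.7]]
    total, posterior = pattern_prob((2, 2), sizes, probs, 2)
    print("P(pattern) =", round(total, 4), "posterior =", [round(x, 3) for x in posterior])
```

A pattern of top ratings on both items is far more likely in the second hypothetical class, so the posterior assigns the respondent to that class with high probability.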
Spencer, Bruce D. – Journal of Educational Measurement, 1983 (peer reviewed)
Because test scores are ordinal, not cardinal, attributes, the average test score often is a misleading way to summarize the scores of a group of individuals. Similarly, correlation coefficients may be misleading summary measures of association between test scores. Proper, readily interpretable, summary statistics are developed from a theory of…
Descriptors: Correlation, Measurement Techniques, Scores, Statistical Analysis
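Spencer's premise can be made concrete: if only the order of scores is meaningful, any strictly increasing rescaling is equally legitimate, yet it can reverse which group has the higher mean. The sketch uses two hypothetical groups and a log rescaling. It illustrates the premise only and does not implement the summary statistics the paper develops; the median is shown because a monotone rescaling cannot change its ordering here.

```python
import math
from statistics import mean, median

# Hypothetical scores for two groups; only their order is assumed meaningful.
group_a = [1, 2, 9]
group_b = [3, 4, 4]


def compare(stat, a, b):
    # Report which group has the larger value of the summary statistic.
    sa, sb = stat(a), stat(b)
    return "A" if sa > sb else "B" if sa < sb else "tie"


def rescale(scores, f):
    # Apply a strictly increasing transformation to every score.
    return [f(s) for s in scores]


if __name__ == "__main__":
    log_a, log_b = rescale(group_a, math.log), rescale(group_b, math.log)
    print("mean, raw scale:  ", compare(mean, group_a, group_b))
    print("mean, log scale:  ", compare(mean, log_a, log_b))
    print("median, raw scale:", compare(median, group_a, group_b))
    print("median, log scale:", compare(median, log_a, log_b))
```

On the raw scale group A has the higher mean (4.0 versus about 3.67), but after the log rescaling group B does, while group B has the higher median on both scales.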
Rogosa, David – 2000
In the reporting of individual student results from standardized tests in educational assessments, the percentile rank of the individual student is a major numerical indicator. For example, in the 1998 and 1999 California Standardized Testing and Reporting (STAR) program using the Stanford Achievement Test Series, Ninth Edition, Form T (Stanford…
Descriptors: Comparative Analysis, Elementary Secondary Education, Standardized Tests, Tables (Data)
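For readers unfamiliar with the indicator Rogosa examines, one common definition of a percentile rank counts the norm-group scores below a given score plus half of those tied with it, expressed as a percentage. The norm scores below are hypothetical, and the published norms used in programs such as STAR may follow a different convention; the sketch only shows the arithmetic.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    # Midpoint convention: percent of norm scores below, plus half the percent tied.
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    tied = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * tied) / len(ordered)


if __name__ == "__main__":
    norms = [10, 20, 20, 30, 40]  # hypothetical norm-group scores
    for s in (5, 20, 30, 50):
        print(s, percentile_rank(s, norms))
```

Because a percentile rank depends only on position within the norm group, a small change in a raw score near a dense part of the distribution can move it substantially, which is why its accuracy for individual students is worth examining.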
Adler, Nurit; Guttman, Ruth – Educational and Psychological Measurement, 1982 (peer reviewed)
Thirteen ability tests were administered as defined within a mapping sentence containing four content facets: rule type, expression mode, language of communication and dimensionality of portrayed object. Smallest Space Analysis of intercorrelations among test scores showed the radex structure of the two-dimensional space conformed to the…
Descriptors: Content Analysis, Factor Structure, Intelligence Tests, Scores
Jarjoura, David; Brennan, Robert L. – New Directions for Testing and Measurement, 1983
Multivariate generalizability techniques are used to bridge the gap between psychometric constraints and the tables of specifications needed in test development. Techniques are illustrated with results from the American College Testing Assessment Program. (Author/PN)
Descriptors: Data Analysis, Mathematical Models, Multivariate Analysis, Test Construction