
Ng, K. T. – Educational and Psychological Measurement, 1974
This paper demonstrates that Charles Spearman postulated neither a Platonic true-error distinction nor a requirement for constant true scores under repeated measurement. (Author/RC)
Descriptors: Career Development, Correlation, Models, Test Reliability

Kolstad, Rosemarie K.; And Others – Journal of Research and Development in Education, 1985
Multiple-choice questions that could logically provide two or more choices block the expression of judgment, thereby suppressing measurement of learning and failing to provide feedback to students and teachers. This study compares the effects of content-identical multiple-choice and multiple true-false items on students' decisions. (MT)
Descriptors: Evaluation Methods, Higher Education, Knowledge Level, Test Format

Frary, Robert B.; Zimmerman, Donald W. – Educational and Psychological Measurement, 1984
The correlation between bias components of test scores and unbiased observed scores is shown to be an effective predictor of changes in reliability and validity resulting from elimination of bias. Plausible assumptions about the value of the correlation and the size of related variance components indicate that the reduction in reliability and validity is a…
Descriptors: Correlation, Scores, Test Bias, Test Reliability

Mislevy, Robert J.; And Others – 1990
The models of standard test theory, having evolved under a trait-oriented psychology, do not reflect the knowledge structures and the problem-solving strategies now seen as central to understanding performance and learning. In some applications, however, key qualitative distinctions among persons as to structures and strategies can be expressed…
Descriptors: Learning Strategies, Models, Problem Solving, Spatial Ability

Mislevy, Robert J.; Wilson, Mark R.; Ercikan, Kadriye; Chudowsky, Naomi – 2002
In educational assessment, what students say, do, and sometimes make is observed, and assessors attempt to infer what students know, can do, or have accomplished more generally. Some links in the chain of inference depend on statistical models and probability-based reasoning, and it is with these links that terms such as validity, reliability, and…
Descriptors: Data Analysis, Data Collection, Educational Assessment, Inferences

Braun, Henry I.; Mislevy, Robert J. – US Department of Education, 2004
Psychologist Andrea diSessa coined the term "phenomenological primitives", or p-prims, to talk about nonexperts' reasoning about physical situations. P-prims are primitive in the sense that they stand without significant explanatory substructure or explanation. Examples are "Heavy objects fall faster than light objects" and "Continuing force is…
Descriptors: Test Theory, Testing, Evaluation Methods, Scores

Cudeck, Robert – Journal of Educational Measurement, 1980
Methods for evaluating the consistency of responses to test items were compared. When a researcher is unwilling to make the assumptions of classical test theory, has only a small number of items, or is in a tailored testing context, Cliff's dominance indices may be useful. (Author/CTM)
Descriptors: Error Patterns, Item Analysis, Test Items, Test Reliability

Feldt, Leonard S. – Applied Measurement in Education, 1997
It has often been asserted that the reliability of a measure places an upper limit on its validity. This article demonstrates in theory that validity can rise when reliability declines, even when validity evidence is a correlation with an acceptable criterion. Whether empirical examples can actually be found is an open question. (SLD)
Descriptors: Correlation, Criteria, Reliability, Test Construction
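The upper-limit claim the article examines comes from the classical attenuation result, under which a test's criterion validity cannot exceed the square root of its reliability. A minimal simulation sketch under standard classical-test-theory assumptions (independent errors, linear criterion relation; all names and values here are illustrative, not Feldt's counterexample) shows the bound holding in the ordinary case:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# True scores, and a criterion correlated 0.8 with the true score.
t = rng.normal(size=n)
y = 0.8 * t + np.sqrt(1 - 0.8**2) * rng.normal(size=n)

for err_sd in (0.5, 1.0):                 # increasing measurement error
    x = t + err_sd * rng.normal(size=n)   # observed score = true + error
    rel = np.var(t) / np.var(x)           # reliability, about 1 / (1 + err_sd**2)
    val = np.corrcoef(x, y)[0, 1]         # criterion validity
    print(f"reliability={rel:.3f}  validity={val:.3f}  bound={np.sqrt(rel):.3f}")
```

Under these assumptions validity tracks reliability downward; Feldt's contribution is a theoretical case where that coupling breaks.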

Williams, Richard H.; Zimmerman, Donald W. – Applied Psychological Measurement, 1996
Modified equations for the validity and reliability of difference scores that describe applied testing situations are examined. This examination reveals that simple gain scores can be more useful in research than has commonly been believed. Simple gain scores are neither inherently unreliable nor lacking in predictive validity. (SLD)
Descriptors: Achievement Gains, Change, Equations (Mathematics), Prediction
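The standard classical-test-theory formula for the reliability of a difference score D = X − Y makes the article's point concrete: with reliable components, the gain score is not automatically unreliable. A small sketch (the function name is illustrative, and this is the textbook formula rather than the authors' modified equations):

```python
def difference_score_reliability(sx, sy, rxx, ryy, rxy):
    """Classical reliability of D = X - Y, given the component standard
    deviations (sx, sy), component reliabilities (rxx, ryy), and the
    correlation between X and Y (rxy)."""
    num = sx**2 * rxx + sy**2 * ryy - 2 * sx * sy * rxy
    den = sx**2 + sy**2 - 2 * sx * sy * rxy
    return num / den

# Equal variances, reliable components, moderately correlated occasions:
print(difference_score_reliability(1.0, 1.0, 0.9, 0.9, 0.5))  # prints 0.8
```

A reliability of 0.8 for the gain score in this configuration illustrates why the blanket "gain scores are unreliable" claim does not hold in general.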

Siegert, Richard J.; And Others – Multivariate Behavioral Research, 1988
A study concluding that the Wechsler Adult Intelligence Scale (Revised) (WAIS-R) has three clear factors in its structure is critiqued. An alternative factor comparison technique, FACTOREP, is used with identical data. It is demonstrated that the WAIS-R has only two strong factors--verbal comprehension and perceptual organization. (TJH)
Descriptors: Factor Analysis, Factor Structure, Intelligence Tests, Item Analysis

Satorra, Albert – Psychometrika, 1989
Within covariance structural analysis, a unified approach to asymptotic theory of alternative test criteria for testing parametric restrictions is provided. More general statistics for addressing the case where the discrepancy function is not asymptotically optimal, and issues concerning power analysis and the asymptotic theory of testing-related…
Descriptors: Chi Square, Equations (Mathematics), Matrices, Psychometrics

Chapelle, Carol A. – Annual Review of Applied Linguistics, 1999
Provides a history of validation in language testing, discusses current approaches to validation in language testing (hypothesis about testing outcomes, relevant evidence for testing the hypothesis, developing a validity argument), and gives an overview of current challenges in language-test validation (defining the language construct to be…
Descriptors: Language Tests, Second Language Learning, Test Theory, Test Validity

Maraun, Michael D.; Jackson, Jeremy S. H.; Luccock, Craig R.; Belfer, Sharon E.; Chrisjohn, Roland D. – Educational and Psychological Measurement, 1998
Responses of 903 Canadian college students to the Self-Monitoring Scale (M. Snyder, 1974) and artificial data are used to illustrate the importance of pairing test theoretical structure with quantitative characteristics, including conditional association (CA) and strong positive orthant dependence (SPOD). Tests for CA and SPOD are reviewed. (SLD)
Descriptors: College Students, Foreign Countries, Higher Education, Test Construction

Muraki, Eiji; Hombo, Catherine M.; Lee, Yong-Won – Applied Psychological Measurement, 2000
Presents an overview of linking methods applied to performance assessment and discusses major issues and recent developments in linking performance assessments. Compares three common linking designs and two major linking methodologies (classical and item response theory (IRT)). Describes two classical equating methods and several IRT equating…
Descriptors: Equated Scores, Item Response Theory, Performance Based Assessment, Test Theory
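Of the classical methods such overviews compare, linear equating is the simplest to state: a score on form X is mapped onto the form-Y scale by matching the means and standard deviations of the two score distributions. A textbook sketch (function name and data are illustrative, not the article's notation):

```python
import statistics as st

def linear_equate(x, form_x_scores, form_y_scores):
    """Map a form-X score onto the form-Y scale by matching the mean and
    standard deviation of the two observed score distributions."""
    mx, sx = st.mean(form_x_scores), st.pstdev(form_x_scores)
    my, sy = st.mean(form_y_scores), st.pstdev(form_y_scores)
    return my + (sy / sx) * (x - mx)

form_x = [40, 50, 60, 70, 80]   # hypothetical form-X scores
form_y = [45, 52, 59, 66, 73]   # hypothetical scores on an easier form Y
print(linear_equate(60, form_x, form_y))  # the form-X mean maps to the form-Y mean: 59.0
```

IRT equating methods instead link the latent ability scales of the two forms via item parameters, which is what makes them attractive for performance assessments with few, complex tasks.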

Stanton, Harrison C.; Reynolds, Cecil R. – School Psychology Quarterly, 2000
Argues that the apparent lack of empirical support for the practice of profile analysis has stemmed in part from the use of statistical techniques that have neglected to explore the perspective of the clinician. Explores Configural Frequency Analysis, a statistical technique focusing on the relationships among groups of participants as the unit of…
Descriptors: Children, Profiles, Psychological Testing, School Psychology