Publication Date
In 2025 | 3 |
Since 2024 | 12 |
Since 2021 (last 5 years) | 41 |
Since 2016 (last 10 years) | 126 |
Since 2006 (last 20 years) | 395 |
Descriptor
Test Theory | 1161 |
Test Items | 261 |
Test Reliability | 252 |
Test Construction | 245 |
Test Validity | 245 |
Psychometrics | 181 |
Scores | 176 |
Item Response Theory | 165 |
Foreign Countries | 159 |
Item Analysis | 141 |
Statistical Analysis | 134 |
Location
United States | 17 |
United Kingdom (England) | 15 |
Canada | 14 |
Australia | 13 |
Turkey | 12 |
Sweden | 8 |
United Kingdom | 8 |
Netherlands | 7 |
Texas | 7 |
New York | 6 |
Taiwan | 6 |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 4 |
Elementary and Secondary… | 3 |
Individuals with Disabilities… | 3 |

Bieliauskas, Vytautas J.; Farragher, John – Journal of Clinical Psychology, 1983
Administered the House-Tree-Person test to male college students (N=24) to examine the effects of varying the size of the drawing form on the scores. Results suggested that use of the drawing sheet did not have a significant influence upon the quantitative aspects of the drawing. (LLL)
Descriptors: College Students, Higher Education, Intelligence Tests, Males

Masters, Geoffrey N. – Educational and Psychological Measurement, 1984
DICOT, a computer program for the Rasch analysis of classroom tests, is described. Results are presented in a self-explanatory form. Person ability and item difficulty estimates are expressed in a familiar metric. Person and item fit statistics provide a diagnosis of individual children and identification of problematic items. (Author/DWH)
Descriptors: Classroom Techniques, Foreign Countries, Item Analysis, Latent Trait Theory
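As a rough illustration of the Rasch quantities such a program reports, here is a minimal sketch of the dichotomous Rasch response probability and an unweighted (outfit) mean-square fit statistic. Function names and numeric values are illustrative assumptions, not DICOT's actual output or metric.

```python
import numpy as np

def rasch_prob(theta, b):
    """P(correct) under the dichotomous Rasch model, logit metric."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def outfit_mean_square(responses, theta, b):
    """Unweighted mean-square fit: average squared standardized residual."""
    p = rasch_prob(theta, b)
    z = (responses - p) / np.sqrt(p * (1.0 - p))
    return np.mean(z ** 2)

# Illustrative values (not from the article): one item of difficulty 0.5 logits
theta = np.array([-1.0, 0.0, 1.0, 2.0])   # person abilities
x = np.array([0, 0, 1, 1])                # scored responses to that item
print(rasch_prob(theta, 0.5))
print(outfit_mean_square(x, theta, 0.5))  # values near 1 indicate good fit
```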

Stegelmann, Werner – Psychometrika, 1983
The Rasch model is generalized to a multicomponent model, so that observations of component events are not needed to apply the model. It is shown that the generalized model maintains the property of specific objectivity of the Rasch model. An application to a mathematics test is provided. (Author/JKS)
Descriptors: Estimation (Mathematics), Item Analysis, Latent Trait Theory, Mathematical Models
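For readers unfamiliar with "specific objectivity," a brief sketch of the standard dichotomous Rasch model and the property in question (the comparison of two persons is free of the item parameter) may help; the notation here is generic, not Stegelmann's multicomponent formulation.

```latex
% Dichotomous Rasch model: person p, item i
P(X_{pi}=1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}
% Specific objectivity: the log-odds comparison of persons p and q
% on any item i does not involve the item parameter b_i
\log\frac{P_{pi}}{1-P_{pi}} - \log\frac{P_{qi}}{1-P_{qi}} = \theta_p - \theta_q
```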

Williams, Richard H.; Zimmerman, Donald W. – Journal of Experimental Education, 1982
The reliability of simple difference scores is greater than, less than, or equal to that of residualized difference scores, depending on whether the correlation between pretest and posttest scores is greater than, less than, or equal to the ratio of the standard deviations of pretest and posttest scores. (Author)
Descriptors: Achievement Gains, Comparative Analysis, Correlation, Pretests Posttests
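The stated condition can be checked numerically with standard classical-test-theory formulas (uncorrelated errors assumed). The values below are illustrative, not taken from the article; the middle correlation equals the pretest/posttest SD ratio (0.7), where the two reliabilities should coincide.

```python
import numpy as np

def diff_score_reliability(rxx, ryy, rxy, sx, sy):
    """Reliability of the simple difference D = Y - X under classical test theory."""
    num = ryy * sy**2 + rxx * sx**2 - 2 * rxy * sx * sy
    den = sy**2 + sx**2 - 2 * rxy * sx * sy
    return num / den

def residualized_score_reliability(rxx, ryy, rxy):
    """Reliability of the residualized difference Z = Y - b*X, b = rxy*sy/sx."""
    return (ryy + rxy**2 * rxx - 2 * rxy**2) / (1 - rxy**2)

# Illustrative values: pretest SD 7, posttest SD 10, so the SD ratio is 0.7
rxx, ryy, sx, sy = 0.80, 0.80, 7.0, 10.0
for rxy in (0.60, 0.70, 0.78):
    print(rxy,
          round(diff_score_reliability(rxx, ryy, rxy, sx, sy), 3),
          round(residualized_score_reliability(rxx, ryy, rxy), 3))
```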

Gravett, Sarah – South African Journal of Higher Education, 1996
Argues that student assessment plays a crucial role in the academic life of college students and that assessment arrangements embody the purposes of higher education. Reviews research suggesting that learners' perceptions of course testing procedures are the single most important influence on learning. Outlines six guiding principles of test development to…
Descriptors: College Instruction, Educational Objectives, Higher Education, Student Evaluation

Mislevy, Robert J. – Journal of Educational Measurement, 1996
Developments in cognitive and developmental psychology have broadened the inferences researchers want to make about students' learning and the nature and acquisition of knowledge. The principles of inference that led to standard test theory can support inference in the broader context of the cognitive revolution. (SLD)
Descriptors: Cognitive Psychology, Developmental Psychology, Educational Assessment, Educational Research

Klinger, Don A.; Rogers, W. Todd – Alberta Journal of Educational Research, 2003
The estimation accuracy of procedures based on classical test score theory and item response theory (the generalized partial credit model) was compared for examinations consisting of multiple-choice and extended-response items. Analysis of British Columbia Scholarship Examination results found an error rate of about 10 percent for both methods, with…
Descriptors: Academic Achievement, Educational Testing, Foreign Countries, High Stakes Tests

Andrich, David – Psychometrika, 1995
This book discusses adapting pencil-and-paper tests to computerized testing. Mention is made of models for graded responses to items and of possibilities beyond pencil-and-paper tests, but the book is essentially about dichotomously scored test items. Contrasts between item response theory and classical test theory are described. (SLD)
Descriptors: Adaptive Testing, Computer Assisted Testing, Item Response Theory, Scores

Little, Roderick J. A.; Rubin, Donald B. – Journal of Educational and Behavioral Statistics, 1994
Equating a new standard test to an old reference test is considered when samples for equating are not randomly selected from the target population of test takers, and two problems that arise when equating from biased samples are identified. An empirical example with data from the Armed Services Vocational Aptitude Battery illustrates the approach. (SLD)
Descriptors: Equated Scores, Military Personnel, Sampling, Statistical Analysis
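To make "equating" concrete, here is a minimal generic mean-sigma linear equating sketch that maps new-form scores onto the reference-form scale. It is an assumption-laden illustration only: the data are synthetic (not ASVAB) and the method is not the selection-bias correction discussed in the article.

```python
import numpy as np

def mean_sigma_equate(new_scores, ref_scores):
    """Mean-sigma linear equating: match the new form's mean and SD
    to the reference form's mean and SD."""
    mu_n, sd_n = np.mean(new_scores), np.std(new_scores, ddof=1)
    mu_r, sd_r = np.mean(ref_scores), np.std(ref_scores, ddof=1)
    slope = sd_r / sd_n
    return lambda x: mu_r + slope * (np.asarray(x) - mu_n)

# Illustrative synthetic samples from the two forms
rng = np.random.default_rng(0)
new = rng.normal(48, 9, 500)
ref = rng.normal(52, 11, 500)
to_ref_scale = mean_sigma_equate(new, ref)
print(to_ref_scale([40, 48, 60]))   # new-form scores re-expressed on the reference scale
```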

Reuterberg, Sven-Eric; Gustafsson, Jan-Eric – Educational and Psychological Measurement, 1992
The use of confirmatory factor analysis by the LISREL program is demonstrated as an assumption-testing method when computing reliability coefficients under different model assumptions. Results indicate that reliability estimates are robust against departure from the assumption of parallelism of test items. (SLD)
Descriptors: Equations (Mathematics), Estimation (Mathematics), Mathematical Models, Robustness (Statistics)
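A small example of model-based reliability of the kind a factor-analytic approach yields is the omega-type composite reliability computed from loadings and error variances under a one-factor (congeneric) model. The formula is the standard textbook one and the numbers are illustrative; it is not a reproduction of the LISREL analyses described above.

```python
import numpy as np

def composite_reliability(loadings, error_variances):
    """Omega-type reliability of a unit-weighted composite under a one-factor
    model: true variance of the sum over total variance of the sum."""
    loadings = np.asarray(loadings, dtype=float)
    errors = np.asarray(error_variances, dtype=float)
    true_var = loadings.sum() ** 2      # factor variance fixed at 1
    return true_var / (true_var + errors.sum())

# Illustrative loadings and uniquenesses (not from the article)
print(composite_reliability([0.7, 0.6, 0.8, 0.5], [0.51, 0.64, 0.36, 0.75]))
```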

Brown, James Dean – Language Testing, 1999
Explored the relative contributions of various numbers of persons, items, subtests, and languages, and their interactions, to the dependability of Test of English as a Foreign Language (TOEFL) scores. Sampled 15,000 test takers, 1,000 each from 15 different language backgrounds. (Author/VWL)
Descriptors: English (Second Language), Language Tests, Second Language Learning, Student Characteristics
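"Score dependability" here is a generalizability-theory quantity. As a minimal sketch, the index of dependability for a simple persons-by-items design can be computed from variance components and projected for different test lengths in a D-study; the variance components below are illustrative assumptions, not Brown's estimates, and the real design involves more facets.

```python
def dependability_phi(var_p, var_i, var_pi_e, n_items):
    """Generalizability-theory index of dependability (phi) for a p x i design,
    for absolute decisions, with n_items items in the D-study."""
    abs_error = (var_i + var_pi_e) / n_items
    return var_p / (var_p + abs_error)

# Illustrative variance components (persons, items, residual)
for n in (20, 50, 100):
    print(n, round(dependability_phi(var_p=0.25, var_i=0.05, var_pi_e=0.80, n_items=n), 3))
```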

McDermott, Paul A.; Glutting, Joseph J. – School Psychology Review, 1997
Reports on empirical studies that assessed continuing claims for the utility of subtest analysis. Hierarchical regression and discriminant models were used to determine the maximum potential of ability subtests to explain variation in academic achievement, stylistic classroom learning, and test-session behavior. Ipsative subtest scores provide no…
Descriptors: Ability Identification, Academic Ability, Academic Achievement, Classroom Environment
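For readers unfamiliar with the term, "ipsative" subtest scores express each person's subtest scores as deviations from that person's own mean, so the profile reflects relative strengths and weaknesses rather than overall level. A minimal sketch with made-up profiles:

```python
import numpy as np

def ipsatize(subtest_scores):
    """Ipsative scores: deviations of each person's subtest scores
    from that person's own mean across subtests."""
    scores = np.asarray(subtest_scores, dtype=float)
    return scores - scores.mean(axis=1, keepdims=True)

# Illustrative profiles for two examinees across four subtests
print(ipsatize([[12, 10, 14, 8],
                [ 7,  9,  6, 10]]))
```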

Ferrando, Pere J.; Lorenzo, Urbano – Educational and Psychological Measurement, 1998
A program for obtaining ability estimates and their standard errors under a variety of psychometric models is documented. The general models considered are (1) classical test theory; (2) item factor analysis for continuous censored responses; and (3) unidimensional and multidimensional item response theory graded response models. (SLD)
Descriptors: Ability, Error of Measurement, Estimation (Mathematics), Factor Analysis
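As a sketch of the kind of quantities such a program reports, the snippet below computes a maximum-likelihood ability estimate and its standard error from the test information function under a two-parameter logistic model. Item parameters, the response pattern, and function names are illustrative assumptions, not the documented program's interface or models.

```python
import numpy as np

def p2pl(theta, a, b):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def ml_ability(responses, a, b, iters=20):
    """Fisher-scoring ML estimate of ability and its standard error (2PL).
    Note: all-correct or all-incorrect patterns have no finite ML estimate."""
    theta = 0.0
    for _ in range(iters):
        p = p2pl(theta, a, b)
        grad = np.sum(a * (responses - p))     # score function
        info = np.sum(a**2 * p * (1 - p))      # Fisher information
        theta += grad / info
    p = p2pl(theta, a, b)
    return theta, 1.0 / np.sqrt(np.sum(a**2 * p * (1 - p)))

# Illustrative item parameters and response pattern
a = np.array([1.0, 1.2, 0.8, 1.5, 1.1])
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
x = np.array([1, 1, 1, 0, 0])
print(ml_ability(x, a, b))   # (ability estimate, standard error)
```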

Burton, Richard F. – Assessment & Evaluation in Higher Education, 2001
Describes four measures of test unreliability that quantify effects of question selection and guessing, both separately and together--three chosen for immediacy and one for greater mathematical elegance. Quantifies their dependence on test length and number of answer options per question. Concludes that many multiple choice tests are unreliable…
Descriptors: Guessing (Tests), Mathematical Models, Multiple Choice Tests, Objective Tests
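Two textbook ingredients behind such analyses are the score noise introduced by blind guessing on an n-item test with k options and the Spearman-Brown projection of reliability as test length changes. The sketch below illustrates only these generic quantities; it is not a reconstruction of Burton's four measures.

```python
import math

def guessing_sd(n_items, n_options):
    """Standard deviation of a blind guesser's number-right score
    (binomial with success probability 1/k)."""
    p = 1.0 / n_options
    return math.sqrt(n_items * p * (1 - p))

def spearman_brown(rel, length_factor):
    """Projected reliability when the test is lengthened by length_factor."""
    return length_factor * rel / (1 + (length_factor - 1) * rel)

# Illustrative: a 50-item, 4-option multiple-choice test
print(guessing_sd(50, 4))          # chance-score noise in raw points
print(spearman_brown(0.75, 2.0))   # reliability if the test were doubled
```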

Burry-Stock, Judith A.; And Others – Educational and Psychological Measurement, 1996
It is argued that interrater agreement is a psychometric property that is theoretically different from classical reliability. Formulas are presented to illustrate a set of algebraically equivalent rater agreement indices intended to provide educational and psychological researchers with a practical way to establish a measure of rater…
Descriptors: Algebra, Educational Research, Interrater Reliability, Measures (Individuals)
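The simplest member of this family of indices is the proportion of exact agreement between two raters, sketched below with made-up ratings; this is a generic illustration and not necessarily one of the article's algebraically equivalent formulas.

```python
import numpy as np

def exact_agreement(ratings_a, ratings_b):
    """Proportion of cases on which two raters assign identical scores."""
    a = np.asarray(ratings_a)
    b = np.asarray(ratings_b)
    return np.mean(a == b)

# Illustrative ratings of ten essays on a 1-4 rubric
rater1 = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
rater2 = [3, 2, 3, 3, 1, 2, 4, 4, 2, 3]
print(exact_agreement(rater1, rater2))   # 0.8
```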