| Publication Date | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 8 |
| Since 2022 (last 5 years) | 36 |
| Since 2017 (last 10 years) | 115 |
| Since 2007 (last 20 years) | 378 |

| Descriptor | Records |
| --- | --- |
| Test Theory | 1166 |
| Test Items | 262 |
| Test Reliability | 252 |
| Test Construction | 246 |
| Test Validity | 245 |
| Psychometrics | 183 |
| Scores | 176 |
| Item Response Theory | 168 |
| Foreign Countries | 160 |
| Item Analysis | 141 |
| Statistical Analysis | 134 |

| Location | Records |
| --- | --- |
| United States | 17 |
| United Kingdom (England) | 15 |
| Canada | 14 |
| Australia | 13 |
| Turkey | 12 |
| Sweden | 8 |
| United Kingdom | 8 |
| Netherlands | 7 |
| Texas | 7 |
| New York | 6 |
| Taiwan | 6 |

| Laws, Policies, & Programs | Records |
| --- | --- |
| No Child Left Behind Act 2001 | 4 |
| Elementary and Secondary… | 3 |
| Individuals with Disabilities… | 3 |

Alliger, George M.; Williams, Kevin J. – Educational and Psychological Measurement, 1989 (peer reviewed)
The interrelationships among halo and leniency rating errors were examined using simulated rating data. As leniency increased, halo decreased when measured by dimension intercorrelations but increased when measured by standard deviations across dimensions. Implications of these results for the use of the various measures are discussed. (SLD)
Descriptors: Cognitive Measurement, Estimation (Mathematics), Evaluation Criteria, Performance
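
The two halo indices contrasted above are easy to demonstrate with a short simulation. The sketch below is ours, not the authors' code: the rating model, the 1-7 scale, and the idea that leniency interacts with the scale ceiling are all illustrative assumptions. Pushing ratings against the ceiling lowers both the dimension intercorrelations (less halo by the first index) and the within-ratee standard deviations (more halo by the second), matching the divergence the abstract reports.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(leniency, n_ratees=2000, n_dims=5):
    """Hypothetical ratings on a 1-7 scale: a shared impression (the halo
    source), dimension-specific noise, a leniency shift, and ceiling clipping."""
    common = rng.normal(0, 1, (n_ratees, 1))
    specific = rng.normal(0, 1, (n_ratees, n_dims))
    ratings = 4 + common + specific + leniency
    return np.clip(ratings, 1, 7)  # scale endpoints truncate lenient ratings

def halo_indices(ratings):
    n_dims = ratings.shape[1]
    r = np.corrcoef(ratings, rowvar=False)
    mean_intercorr = r[np.triu_indices(n_dims, k=1)].mean()  # higher => more halo
    mean_within_sd = ratings.std(axis=1, ddof=1).mean()      # lower  => more halo
    return mean_intercorr, mean_within_sd

for leniency in (0.0, 1.0, 2.0):
    ic, sd = halo_indices(simulate(leniency))
    print(f"leniency={leniency:.1f}  intercorrelation={ic:.3f}  within-ratee SD={sd:.3f}")
```
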
Kamil, Michael S.; Tierney, Robert J. – Illinois Schools Journal, 1988
In conjunction with testing mandates, some states have developed new measures intended to reflect changes in thinking about reading. Discusses, in dialogue form, whether these new measures support or limit educational improvement. (BJV)
Descriptors: Educational Assessment, Educational Improvement, Reading Tests, Scores
Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989 (peer reviewed)
A taxonomy of 43 rules for writing multiple-choice test items is presented, based on a consensus of 46 textbooks. These guidelines are presented as complete and authoritative, with solid consensus apparent for 33 of the rules. Four rules lack consensus, and five were cited fewer than 10 times. (SLD)
Descriptors: Classification, Interrater Reliability, Multiple Choice Tests, Objective Tests
Segall, Daniel O. – Psychometrika, 1994 (peer reviewed)
An asymptotic expression for the reliability of a linearly equated test is developed using normal theory. Reliability is expressed as the product of test reliability before equating and an adjustment term that is a function of the sample sizes used to estimate the linear equating transformation. The approach is illustrated. (SLD)
Descriptors: Equated Scores, Error of Measurement, Estimation (Mathematics), Sample Size
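
In symbols, the reported result has the following shape (our notation only; Segall's actual normal-theory adjustment term is not reproduced here). Because estimation error in the equating transformation can only add noise, the adjustment plausibly satisfies the bounds shown:

```latex
% rho_{XX'}: reliability before equating; n_1, n_2: equating sample sizes.
\rho_{\mathrm{eq}} \;=\; \rho_{XX'} \, A(n_1, n_2),
\qquad A(n_1, n_2) \le 1,
\qquad A(n_1, n_2) \to 1 \ \text{as } n_1, n_2 \to \infty .
```
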
Mislevy, Robert J. – Educational and Psychological Measurement, 1993 (peer reviewed)
Relationships between Bayesian ability estimates and the parameters of a normal population distribution are derived in the context of classical test theory. Formulas are presented for practical work with Bayesian ability estimates, and a numerical illustration is provided. (SLD)
Descriptors: Ability, Bayesian Statistics, Equations (Mathematics), Estimation (Mathematics)
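
For orientation, the best-known relationship of this kind is Kelley's regressed estimate, a standard textbook result shown here as context rather than as Mislevy's specific formulas. Under the classical model with normal true scores and errors, the posterior distribution of the true score given an observed score is normal:

```latex
% X = T + E, with T ~ N(mu, sigma_T^2), E ~ N(0, sigma_E^2), and
% reliability rho = sigma_T^2 / (sigma_T^2 + sigma_E^2).
E[T \mid X = x] = \rho x + (1 - \rho)\mu,
\qquad
\operatorname{Var}[T \mid X = x] = (1 - \rho)\,\sigma_T^2 .
```
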
Shohamy, Elana – Annual Review of Applied Linguistics, 1995 (peer reviewed)
Reviews recent trends in performance testing, focusing on different definitions of performance testing; the extent to which performance tests have drawn upon the theoretical discussions of competence and performance; research on performance tests; and future developmental and research questions. (66 references) (MDM)
Descriptors: Definitions, Evaluation Methods, Language Proficiency, Language Tests
Stone, Clement A. – Educational Measurement: Issues and Practice, 1992 (peer reviewed)
TESTAT is a supplementary module for the popular SYSTAT statistical package for the personal computer. The program performs test analyses based on classical test theory and item response theory. Limitations and advantages are discussed. (SLD)
Descriptors: Computer Assisted Testing, Computer Software Evaluation, Error of Measurement, Item Response Theory
Armstrong, Ronald D.; Jones, Douglas H. – Applied Psychological Measurement, 1992 (peer reviewed)
Polynomial algorithms for solving selected problems in test theory are presented, along with computational results from sample problems with several hundred decision variables that demonstrate the benefits of these algorithms. The algorithms are based on optimization theory in networks (graphs). (SLD)
Descriptors: Algorithms, Decision Making, Equations (Mathematics), Mathematical Models
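
The network formulations themselves are beyond a snippet, but the flavor of polynomial-time test assembly can be conveyed with a toy stand-in (ours, not the authors' algorithms): with a single one-dimensional item statistic, sorting the pool and assigning adjacent items to alternate forms builds two difficulty-matched parallel forms in O(n log n) time.

```python
import random

random.seed(1)

# Hypothetical item pool: (item id, classical difficulty p-value).
pool = [(f"item{i:02d}", round(random.uniform(0.2, 0.9), 2)) for i in range(20)]

ranked = sorted(pool, key=lambda item: item[1])         # order by difficulty
form_a = [ranked[i] for i in range(0, len(ranked), 2)]  # one item of each
form_b = [ranked[i] for i in range(1, len(ranked), 2)]  # adjacent pair per form

def mean_difficulty(form):
    return sum(p for _, p in form) / len(form)

print(f"form A mean difficulty: {mean_difficulty(form_a):.3f}")
print(f"form B mean difficulty: {mean_difficulty(form_b):.3f}")
```
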
Davidson, Fred – System, 2000 (peer reviewed)
Statistical analysis tools in language testing are described, chiefly those of classical test theory and item response theory. Computer software for statistical analysis is briefly reviewed and divided into three tiers: commonly available programs, statistical packages, and specialty software. (Author/VWL)
Descriptors: Computer Software, Language Tests, Second Language Learning, Statistical Analysis
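
As a concrete illustration of the classical-test-theory side of such tools, the core item statistics every such package reports can be computed in a few lines. The sketch below is a minimal illustration, not Davidson's examples; the response matrix is simulated from an assumed one-parameter logistic model.

```python
import numpy as np

# Simulated 0/1 responses: 300 examinees by 10 items (1PL generating model).
rng = np.random.default_rng(42)
ability = rng.normal(0, 1, 300)
difficulty = np.linspace(-1.5, 1.5, 10)
prob = 1 / (1 + np.exp(-(ability[:, None] - difficulty)))
responses = (rng.random((300, 10)) < prob).astype(int)
total = responses.sum(axis=1)

# Item difficulty: proportion correct (the classical p-value).
p_values = responses.mean(axis=0)

# Discrimination: item-total correlation (uncorrected for item overlap).
item_total_r = np.array([np.corrcoef(responses[:, j], total)[0, 1]
                         for j in range(responses.shape[1])])

# Internal consistency: Cronbach's alpha.
k = responses.shape[1]
alpha = k / (k - 1) * (1 - responses.var(axis=0, ddof=1).sum() / total.var(ddof=1))

print("p-values:     ", np.round(p_values, 2))
print("item-total r: ", np.round(item_total_r, 2))
print(f"alpha: {alpha:.3f}")
```
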
Brown, Roger – International Journal of Computer Algebra in Mathematics Education, 2001 (peer reviewed)
Reviews what is happening in examination systems that have begun to allow the use of Computer Algebra Systems (CAS) in externally set 'high stakes' assessment regimes. Discusses possible options for the future with the intention of developing a dialogue on how assessment with a CAS can help develop the mathematical literacy of students. (Author/MM)
Descriptors: Calculators, Computer Uses in Education, Evaluation, High Stakes Tests
Liu, Kimy; Sundstrom-Hebert, Krystal; Ketterlin-Geller, Leanne R.; Tindal, Gerald – Behavioral Research and Teaching, 2008
The purpose of this study was to document the instrument development of maze measures for grades 3-8. Each maze passage contained twelve omitted words that students filled in by choosing the best-fit word from among the provided options. In this technical report, we describe the process of creating, reviewing, and pilot testing the maze measures.…
Descriptors: Test Construction, Cloze Procedure, Multiple Choice Tests, Reading Tests
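
Scoring such a measure is mechanical once the gaps and options are represented; the sketch below uses a hypothetical data structure of our own devising, not the report's actual instruments.

```python
# Each "___" gap in the passage offers several options, exactly one of which
# is the original (best-fit) word; gaps are listed in passage order.
maze_item = {
    "passage": "The dog ran ___ the yard and ___ over the fence.",
    "gaps": [
        {"options": ["across", "sing", "blue"], "answer": "across"},
        {"options": ["jumped", "purple", "eat"], "answer": "jumped"},
    ],
}

def score_maze(item, selections):
    """Count the gaps where the student chose the original word."""
    return sum(choice == gap["answer"]
               for choice, gap in zip(selections, item["gaps"]))

print(score_maze(maze_item, ["across", "eat"]))  # -> 1
```
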
Sireci, Stephen G. – 1995
The purpose of this paper is to clarify the seemingly discrepant views of test theorists and test developers about terminology related to the evaluation of test content. The origin and evolution of the concept of content validity are traced, and the concept is reformulated in a way that emphasizes the notion that content domain definition,…
Descriptors: Construct Validity, Content Validity, Definitions, Item Analysis
Anderson, Margaret D. – 1996
An effective test and measurement course in psychology should expose students to a variety of available psychological tests, as well as to the mechanics of test construction and evaluation. A test and measurement course at the State University of New York's College at Cortland is divided into two components with an overlaying group…
Descriptors: Cooperative Learning, Group Activities, Higher Education, Psychological Testing
Linacre, John M.; Wright, Benjamin D. – 1987
The Mantel-Haenszel (MH) procedure attempts to identify and quantify differential item performance (item bias). This paper summarizes the MH statistics, and identifies the parameters they estimate. An equivalent procedure based on the Rasch model is described. The theoretical properties of the two approaches are compared and shown to require the…
Descriptors: Algorithms, Estimation (Mathematics), Item Analysis, Measurement Techniques
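
The MH statistics summarized here are standard and can be sketched directly. The code below is our illustration, not the paper's: operational DIF analyses stratify on total test score, whereas this toy stratifies on a simulated ability variable, and the uniform DIF effect is injected by construction.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
group = rng.integers(0, 2, n)   # 0 = reference group, 1 = focal group
ability = rng.normal(0, 1, n)

# Studied item: logistic response with a uniform DIF shift against the focal group.
prob = 1 / (1 + np.exp(-(ability - 0.5 * group)))
item = (rng.random(n) < prob).astype(int)
strata = np.digitize(ability, [-1.0, 0.0, 1.0])   # stand-in for score strata

num = den = 0.0
for k in np.unique(strata):
    m = strata == k
    A = ((group[m] == 0) & (item[m] == 1)).sum()  # reference, right
    B = ((group[m] == 0) & (item[m] == 0)).sum()  # reference, wrong
    C = ((group[m] == 1) & (item[m] == 1)).sum()  # focal, right
    D = ((group[m] == 1) & (item[m] == 0)).sum()  # focal, wrong
    T = m.sum()
    num += A * D / T
    den += B * C / T

alpha_mh = num / den                 # MH common odds ratio
delta_mh = -2.35 * np.log(alpha_mh)  # ETS delta metric; negative => DIF against focal
print(f"alpha_MH = {alpha_mh:.2f}, delta_MH = {delta_mh:.2f}")
```
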
Norris, Stephen P. – 1989
This report describes a methodology for using verbal reports of thinking to develop and validate multiple-choice tests of critical thinking. These verbal reports of individuals' thinking on draft items of multiple-choice critical thinking tests can be used systematically to provide evidence of the thinking processes elicited by such tests, and in…
Descriptors: Critical Thinking, Educational Research, Multiple Choice Tests, Protocol Analysis


