Peer reviewed: Briere, John; And Others – Journal of Interpersonal Violence, 1995
Examines psychometric characteristics of the 100-item Trauma Symptom Inventory (TSI) in a sample of 370 psychiatric inpatients and psychotherapy outpatients. Post hoc multiple regression analyses indicated that client age, sex, inpatient versus outpatient status, childhood sexual and physical abuse, and adult sexual assault were unique predictors…
Descriptors: Cognitive Measurement, Cognitive Psychology, Counseling, Examiners
Peer reviewed: Cooil, Bruce; Rust, Roland T. – Psychometrika, 1994
It is proposed that proportional reduction in loss (PRL) be used as a theoretical basis to derive, justify, and interpret reliability measures to gauge reliability on a zero-to-one scale. This PRL approach simplifies the interpretation of existing measures (e.g., generalizability-theory measures). (SLD)
Descriptors: Data Analysis, Equations (Mathematics), Estimation (Mathematics), Generalizability Theory
Peer reviewed: Millsap, Roger E.; Everson, Howard – Multivariate Behavioral Research, 1991
Use of confirmatory factor analysis (CFA) with nonzero latent means in testing six different measurement models from classical test theory is discussed. Implications of the six models for observed mean and covariance structures are described, and three examples of the use of CFA in testing the models are presented. (SLD)
Descriptors: Comparative Analysis, Equations (Mathematics), Goodness of Fit, Mathematical Models
Peer reviewed: Armstrong, Ronald D.; And Others – Journal of Educational Statistics, 1994
A network-flow model is formulated for constructing parallel tests based on classical test theory while using test reliability as the criterion. Practitioners can specify a test-difficulty distribution for values of item difficulties as well as test-composition requirements. An empirical study illustrates the reliability of generated tests. (SLD)
Descriptors: Algorithms, Computer Assisted Testing, Difficulty Level, Item Banks
Peer reviewed: Glutting, Joseph J.; McDermott, Paul A.; Watkins, Marley M.; Kush, Joseph C.; Konold, Timothy R. – School Psychology Review, 1997
Compares various base-rate procedures with the statistical-significance testing approach used by psychologists. A nonlinear multivariate method is used to determine whether children with a learning disability or emotional disturbance are more likely to show unusual subtest patterns than are children from the normative sample of Wechsler Intelligence…
Descriptors: Emotional Problems, Exceptional Child Research, Learning Disabilities, Multivariate Analysis
Rorvig, Mark – Proceedings of the ASIS Annual Meeting, 2000
Proposes a new technique for the evaluation of question difficulty. Suggests that question dispersion by multidimensional scaling models the question-response pattern required by test theory, but without the population density requirements of the traditional methods. Considers the effect on knowledge management functions, including library…
Descriptors: Difficulty Level, Evaluation Methods, Library Services, Multidimensional Scaling
Peer reviewed: Clapham, Caroline – Annual Review of Applied Linguistics, 2000
Explores the term "applied linguistics" and discusses the role of language testing within this discipline, the relationship between testing and teaching, and the relationship between testing and assessment. (Author/VWL)
Descriptors: Applied Linguistics, Evaluation Methods, Intellectual Disciplines, Language Tests
White, Edward M. – College Composition and Communication, 2005
Although most portfolio evaluation currently uses some adaptation of holistic scoring, the problems with scoring portfolios holistically are many, much more than for essays, and the problems are not readily resolvable. Indeed, many aspects of holistic scoring work against the principles behind portfolio assessment. We have from the start needed a…
Descriptors: Portfolios (Background Materials), Scoring, Holistic Evaluation, Portfolio Assessment
Gump, Steven E. – Educational Research Quarterly, 2007
This review presents an overview of selected articles on the leniency hypothesis: the idea that students give higher evaluations to instructors who grade more leniently. Such articles comprise a small subset of the voluminous research on student evaluations of teaching (SETs). In this diverse literature, research methods and aims have frequently…
Descriptors: Student Evaluation of Teacher Performance, Research Methodology, Meta Analysis, Research Problems
Mislevy, Robert J. – 1994
Test theory encompasses models and methods for drawing inferences about what students know and can do, cast in a framework of ideas from measurement, education, and psychology. The emerging paradigm of cognitive psychology prompts new considerations about collecting and interpreting evidence, suggesting alternative models for the nature,…
Descriptors: Alternative Assessment, Cognitive Psychology, Educational Assessment, Inferences
Thompson, Bruce; Dennings, Bruce – 1993
Q-technique factor analysis identifies clusters or factors of people, rather than of variables, and has proven very popular, especially with regard to testing typology theories. The present study investigated the utility of three different protocols for obtaining data for Q-technique studies. These three protocols were: (1) a conventional ipsative…
Descriptors: Classification, Comparative Analysis, Data Collection, Factor Analysis
Livingston, Samuel A. – 1983
Discussed are nine questions regarding standard setting issues in educational testing: (1) Should normative or content-referenced standards be used? (2) Different standard setting methods yield different results. Does this finding present a problem? (3) Assess the adequacy of the grounding of various methods of standard setting in psychological…
Descriptors: Educational Testing, Evaluation, Evaluation Methods, Measurement Objectives
Zhang, Jinming – ETS Research Report Series, 2004
This paper extends the theory of conditional covariances to polytomous items. It has been mathematically proven that under some mild conditions, commonly assumed in the analysis of response data, the conditional covariance of two items, dichotomously or polytomously scored, is positive if the two items are dimensionally homogeneous and negative…
Descriptors: Test Items, Test Theory, Correlation, National Competency Tests
Wilhite, Stephen C. – 1986
A study examined the effect of headings and adjunct questions embedded in an expository text on the delayed multiple-choice test performance of 88 undergraduate students enrolled in psychology courses. The subject of the passage read by the students was the settling of Anglo America; the subheadings in the passage listed names of major subtopics…
Descriptors: Higher Education, Motivation, Multiple Choice Tests, Questioning Techniques
Harker, Jill K.; Cope, Ronald T. – 1988
Cut scores obtained for licensure tests using different judgmental methods of standard setting (holistic, test blueprint, Angoff, and modified Angoff) were compared. Nineteen educators and practitioners participated in this study as judges. Pre- and post-test feedback (feedback of total- and low-group item p-value) ratings were obtained under the…
Descriptors: Cutting Scores, Feedback, Holistic Evaluation, Interrater Reliability

