Sandbank, Micheal; Yoder, Paul – Topics in Early Childhood Special Education, 2014
Generalizability and decision studies provide a mathematical framework for quantifying the stability of a given number of measurements. This approach is especially relevant to the task of obtaining a representative measure of communicative behavior in young children and supports an alternative to the debate regarding which type of assessment…
Descriptors: Developmental Delays, Toddlers, Intervention, Vocabulary Development
Badjadi, Nour El Imane – Online Submission, 2013
The current paper on writing assessment surveys the literature on the reliability and validity of essay tests. The paper aims to examine the two concepts in relationship with essay testing as well as to provide a snapshot of the current understandings of the reliability and validity of essay tests as drawn in recent research studies. Bearing in…
Descriptors: Essay Tests, Writing Evaluation, Test Validity, Test Reliability
Zimmerman, Donald W. – Journal of Educational and Behavioral Statistics, 2011
Many well-known equations in classical test theory are mathematical identities in populations of individuals but not in random samples from those populations. First, test scores are subject to the same sampling error that is familiar in statistical estimation and hypothesis testing. Second, the assumptions made in derivation of formulas in test…
Descriptors: Test Theory, Equations (Mathematics), Scores, Sampling
Brousselle, Astrid; Champagne, Francois – Evaluation and Program Planning, 2011
Program theory evaluation, which has grown in use over the past 10 years, assesses whether a program is designed in such a way that it can achieve its intended outcomes. This article describes a particular type of program theory evaluation--logic analysis--that allows us to test the plausibility of a program's theory using scientific knowledge.…
Descriptors: Evaluators, Program Evaluation, Logical Thinking, Validity
Lai, Kevin; Cabrera, Julio; Vitale, Jonathan M.; Madhok, Jacquie; Tinker, Robert; Linn, Marcia C. – Journal of Science Education and Technology, 2016
Interpreting and creating graphs play a critical role in scientific practice. The K-12 Next Generation Science Standards call for students to use graphs for scientific modeling, reasoning, and communication. To measure progress on this dimension, we need valid and reliable measures of graph understanding in science. In this research, we designed…
Descriptors: Middle School Students, Secondary School Science, Science Instruction, Graphs
van Ravenzwaaij, Don; van der Maas, Han L. J.; Wagenmakers, Eric-Jan – Psychological Review, 2012
In their influential "Psychological Review" article, Bogacz, Brown, Moehlis, Holmes, and Cohen (2006) discussed optimal decision making as accomplished by the drift diffusion model (DDM). The authors showed that neural inhibition models, such as the leaky competing accumulator model (LCA) and the feedforward inhibition model (FFI), can mimic the…
Descriptors: Intelligent Tutoring Systems, Inhibition, Bayesian Statistics, Decision Making
Maydeu-Olivares, Alberto – Measurement: Interdisciplinary Research and Perspectives, 2013
In this rejoinder, Maydeu-Olivares states that, in item response theory (IRT) measurement applications, the application of goodness-of-fit (GOF) methods informs researchers of the discrepancy between the model and the data being fitted (the room for improvement). By routinely reporting the GOF of IRT models, together with the substantive results…
Descriptors: Goodness of Fit, Models, Evaluation Methods, Item Response Theory
Lambert, Matthew C.; Hurley, Kristin Duppong; Tomlinson, M. Michele Athay; Stevens, Amy L. – Child & Youth Care Forum, 2013
Background: A client's motivation to receive services is significantly related to seeking services, remaining in services, and improved outcomes. The Motivation for Youth Treatment Scale (MYTS) is one of the few brief measures used to assess motivation for mental health treatment. Objective: To investigate if the psychometric properties of the…
Descriptors: Motivation, Mental Health, Health Services, Access to Health Care
Bramley, Tom; Dhawan, Vikas – Research Papers in Education, 2013
This paper discusses the issues involved in calculating indices of composite reliability for "modular" or "unitised" assessments of the kind used in GCSEs, AS and A level examinations in England. The increasingly widespread use of on-screen marking has meant that the item-level data required for calculating indices of…
Descriptors: Foreign Countries, Exit Examinations, Secondary Education, Test Reliability
Grigg, Kaine; Manderson, Lenore – Australian Educational and Developmental Psychologist, 2015
Existing Australian measures of racist attitudes focus on single groups or have not been validated across the lifespan. To redress this, the present research aimed to develop and validate a measure of racial, ethnic, cultural and religious acceptance--the Australian Racism, Acceptance, and Cultural-Ethnocentrism Scale (RACES)--for use with…
Descriptors: Racial Bias, Racial Attitudes, Foreign Countries, Ethnocentrism
Rice, Stephen; Geels, Kasha; Trafimow, David; Hackett, Holly – Online Submission, 2011
Test scores are used to assess one's general knowledge of a specific area. Although strategies to improve test performance have been previously identified, the consistency with which one uses these strategies has not been analyzed in such a way that allows assessment of how much consistency affects overall performance. Participants completed one…
Descriptors: Performance, Test Theory, Reliability, Knowledge Level
Abedalaziz, Nabeel; Leng, Chin Hai – Malaysian Online Journal of Educational Sciences, 2013
Most of the tests and inventories used by counseling psychologists have been developed using classical test theory (CTT); item response theory (IRT) derives from what is called latent trait theory. A number of important differences exist between CTT- versus IRT-based approaches to both test development and evaluation, as well as the process of scoring the response profiles of individual…
Descriptors: Test Theory, Item Response Theory, Difficulty Level, Models
Williamson, Kathryn E.; Willoughby, Shannon; Prather, Edward E. – Astronomy Education Review, 2013
We introduce the Newtonian Gravity Concept Inventory (NGCI), a 26-item multiple-choice instrument to assess introductory general education college astronomy ("Astro 101") student understanding of Newtonian gravity. This paper describes the development of the NGCI through four phases: Planning, Construction, Quantitative Analysis, and…
Descriptors: Science Instruction, Scientific Concepts, Astronomy, College Science
Beauducel, Andre – Applied Psychological Measurement, 2013
The problem of factor score indeterminacy implies that the factor and the error scores cannot be completely disentangled in the factor model. It is therefore proposed to compute Harman's factor score predictor that contains an additive combination of factor and error variance. This additive combination is discussed in the framework of classical…
Descriptors: Factor Analysis, Predictor Variables, Reliability, Error of Measurement
Engelhard, George, Jr.; Wind, Stefanie A. – College Board, 2013
The major purpose of this study is to examine the quality of ratings assigned to CR (constructed-response) questions in large-scale assessments from the perspective of Rasch Measurement Theory. Rasch Measurement Theory provides a framework for the examination of rating scale category structure that can yield useful information for interpreting the…
Descriptors: Measurement Techniques, Rating Scales, Test Theory, Scores

