Fendler, Lynn – Ethics and Education, 2016
In educational research that calls itself empirical, the relationship between validity and reliability is that of trade-off: the stronger the bases for validity, the weaker the bases for reliability (and vice versa). Validity and reliability are widely regarded as basic criteria for evaluating research; however, there are ethical implications of…
Descriptors: Educational Research, Ethics, Test Validity, Test Reliability

Taylor, Annette Kujawski – College Student Journal, 2005
This research examined 2 elements of multiple-choice test construction, balancing the key and optimal number of options. In Experiment 1 the 3 conditions included a balanced key, overrepresentation of a and b responses, and overrepresentation of c and d responses. The results showed that error-patterns were independent of the key, reflecting…
Descriptors: Comparative Analysis, Test Items, Multiple Choice Tests, Test Construction

Linn, Robert – 1978
A series of studies on conceptual and design problems in competency-based measurements are explained. The concept of validity within the context of criterion-referenced measurement is reviewed. The authors believe validation should be viewed as a process rather than an end product. It is the process of marshalling evidence to support…
Descriptors: Criterion Referenced Tests, Item Analysis, Item Sampling, Test Bias

Lin, Miao-Hsiang; Hsiung, Chao A. – Psychometrika, 1992
Four bootstrap methods are identified for constructing confidence intervals for the binomial-error model. The extent to which similar results are obtained and the theoretical foundation of each method and its relevance and ranges of modeling the true score uncertainty are discussed. (SLD)
Descriptors: Bayesian Statistics, Computer Simulation, Equations (Mathematics), Estimation (Mathematics)
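
As a rough illustration of the general idea only: the abstract does not say which four bootstrap methods the authors compare, so the sketch below assumes the simplest one, a parametric percentile bootstrap. It treats an examinee's number-correct score as a binomial draw whose success probability is the true score. The function name, the example numbers, and the defaults are all hypothetical.

```python
import random


def bootstrap_true_score_ci(correct, n_items, reps=2000, alpha=0.05, seed=0):
    """Percentile parametric-bootstrap interval for a proportion-correct true score.

    Assumption (not from the abstract): a plain percentile bootstrap; the
    article's four methods may differ from this.
    """
    rng = random.Random(seed)
    p_hat = correct / n_items  # plug-in estimate of the true score

    # Binomial-error model: each item is an independent Bernoulli trial
    # with success probability equal to the (estimated) true score.
    resampled = sorted(
        sum(rng.random() < p_hat for _ in range(n_items)) / n_items
        for _ in range(reps)
    )

    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the resampled scores.
    lower_idx = max(0, int((alpha / 2) * reps))
    upper_idx = min(reps - 1, int((1 - alpha / 2) * reps) - 1)
    return resampled[lower_idx], resampled[upper_idx]


if __name__ == "__main__":
    low, high = bootstrap_true_score_ci(correct=32, n_items=40)
    print(f"Approximate 95% interval for the true score: [{low:.3f}, {high:.3f}]")
```

The fixed seed only makes the sketch reproducible; a real analysis would vary it or use more replications.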

Eiting, Mindert H. – Applied Psychological Measurement, 1991
A method is proposed for sequential evaluation of reliability of psychometric instruments. Sample size is unfixed; a test statistic is computed after each person is sampled and a decision is made in each stage of the sampling process. Results from a series of Monte-Carlo experiments establish the method's efficiency. (SLD)
Descriptors: Computer Simulation, Equations (Mathematics), Estimation (Mathematics), Mathematical Models

Liang, Xin – Evaluation and Research in Education, 2003
Multiple matrix sampling is a data collection technique that ensures accuracy and efficiency in estimating group performance. It has been widely used in large-scale curriculum evaluation since the 1980s. However, the design does not always fully embrace the dynamics of local evaluation demands. The purpose of this study is to introduce a modified matrix…
Descriptors: Curriculum Evaluation, Item Sampling, Matrices, Statistical Studies
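
For readers unfamiliar with the technique, a minimal sketch of basic multiple matrix sampling follows. It is not the modified design the study introduces, which the truncated abstract does not describe. The item pool is split into non-overlapping forms and each examinee is randomly assigned one form, so no one takes every item but the group as a whole covers the pool. All names and sizes are hypothetical.

```python
import random


def matrix_sample(item_ids, examinee_ids, n_forms, seed=0):
    """Split items into n_forms disjoint forms and assign one form per examinee.

    Assumption: a plain non-overlapping design with roughly equal form and
    group sizes; real designs often balance content or overlap forms.
    """
    rng = random.Random(seed)

    items = list(item_ids)
    rng.shuffle(items)
    people = list(examinee_ids)
    rng.shuffle(people)

    # Deal the shuffled items round-robin into forms.
    forms = [items[i::n_forms] for i in range(n_forms)]

    # Deal the shuffled examinees round-robin onto forms.
    assignment = {person: idx % n_forms for idx, person in enumerate(people)}
    return forms, assignment


if __name__ == "__main__":
    forms, assignment = matrix_sample(range(12), range(30), n_forms=3)
    for form_number, form_items in enumerate(forms):
        takers = sum(1 for f in assignment.values() if f == form_number)
        print(f"Form {form_number}: items {sorted(form_items)}, {takers} examinees")
```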

Meijer, Rob R.; And Others – 1994
Three methods for the estimation of the reliability of single dichotomous items are discussed. All methods are based on the assumptions of nondecreasing and nonintersecting item response functions and the Mokken model of double monotonicity. Based on analytical and Monte Carlo studies, it is concluded that one method is superior to the other two…
Descriptors: Estimation (Mathematics), Foreign Countries, Item Response Theory, Monte Carlo Methods

Skaggs, Gary; Bourque, Mary Lyn – 1998
Political and legislative pressures have posed a number of measurement issues and challenges to the development of sound, valid voluntary national tests (VNTs). This paper focuses on what appear to be the most difficult technical issues related to the VNT proposed by President Clinton in 1997. Technical issues refer to psychometric issues, as…
Descriptors: Academic Achievement, Achievement Tests, Classification, Difficulty Level

Kane, Thomas J.; Staiger, Douglas O. – Brookings Papers on Education Policy, 2002
By the spring of 2000, forty states had begun using student test scores to rate school performance. Twenty states have gone a step further and are attaching explicit monetary rewards or sanctions to a school's test performance. In this paper, the authors focus on accountability programs in which states measure the effectiveness of individual…
Descriptors: Elementary Schools, Accountability, Scores, Risk

Albanese, Mark A. – 1985
This study reexamines results reported by Angoff and Schrader regarding formula directions and rights directions for standardized tests. In that study, it was concluded that the two scoring directions were essentially equivalent. In this study, methodological concerns are discussed and additional data analyses undertaken. Among various…
Descriptors: College Entrance Examinations, Data Interpretation, Fatigue (Biology), Guessing (Tests)
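
For context, the two sets of directions correspond to two conventional scoring rules, sketched below. This illustrates the standard definitions and is not a reconstruction of the study's analyses. Rights scoring counts correct answers only; formula scoring subtracts a fraction of the wrong answers, W / (k - 1) for k-option items, and omitted items add nothing under either rule. The function names and example numbers are hypothetical.

```python
def rights_score(num_right):
    """Rights scoring: the score is simply the number of correct answers."""
    return num_right


def formula_score(num_right, num_wrong, num_options):
    """Conventional formula scoring: R - W / (k - 1).

    Omitted items are neither right nor wrong, so they do not enter the score.
    """
    return num_right - num_wrong / (num_options - 1)


if __name__ == "__main__":
    # Hypothetical examinee: 30 right, 8 wrong, 2 omitted on five-option items.
    print("Rights score:", rights_score(30))
    print("Formula score:", formula_score(30, 8, 5))
```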

Shavelson, Richard J.; And Others – 1993
In this paper, performance assessments are cast within a sampling framework. A performance assessment score is viewed as a sample of student performance drawn from a complex universe defined by a combination of all possible tasks, occasions, raters, and measurement methods. Using generalizability theory, the authors present evidence bearing on the…
Descriptors: Academic Achievement, Educational Assessment, Error of Measurement, Evaluators

de Jong, John H. A. L. – 1982
The development and validation of a test of listening comprehension for English as a second language at the Dutch National Institute for Educational Measurement (Cito) is described. The test uses two distinct item formats: true-false items and modified cloze items with two options. Both item formats were found to measure foreign language listening…
Descriptors: Cloze Procedure, English (Second Language), Evaluation Criteria, Foreign Countries