Showing all 10 results
Peer reviewed
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
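The design-effect argument in the Phillips (2015) abstract can be illustrated with a short sketch. The cluster size, intraclass correlation, and score standard deviation below are invented for illustration, not values from the article; the sketch only shows how ignoring the design effect understates sampling error.

# Sketch: how a sampling design effect inflates the variance of a mean.
# DEFF = 1 + (m - 1) * rho for clusters of size m with intraclass
# correlation rho (a standard survey-sampling result; values invented).

import math

n = 2000          # total examinees
m = 25            # examinees per sampled school (assumed cluster size)
rho = 0.15        # assumed intraclass correlation within schools
sd = 30.0         # assumed score standard deviation

deff = 1 + (m - 1) * rho            # design effect
se_srs = sd / math.sqrt(n)          # naive SE assuming simple random sampling
se_true = se_srs * math.sqrt(deff)  # SE under the clustered design
n_eff = n / deff                    # effective sample size

print(f"DEFF = {deff:.2f}, naive SE = {se_srs:.2f}, "
      f"design-based SE = {se_true:.2f}, effective n = {n_eff:.0f}")

With these invented values the design effect is 4.6, so the naive standard error understates the design-based one by more than a factor of two.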
Peer reviewed
Whitely, Susan E.; Dawis, Rene V. – Journal of Educational Measurement, 1974
Descriptors: Error of Measurement, Item Analysis, Matrices, Measurement Techniques
Munoz-Colberg, Magda – 1977
The logical foundations of deduction and induction are outlined to form the rules for the construction of a set of tests of reasoning ability. Both deduction and induction involve the derivation of a conclusion from a set of premises. Deductive logic uses syllogisms and is abstract. Inductive logic is both empirical and abstract. Although…
Descriptors: Abstract Reasoning, Cognitive Tests, Deduction, Induction
Peer reviewed
Weber, Margaret B. – Educational and Psychological Measurement, 1977
Bilevel dimensionality of probability was examined via factor analysis, Rasch latent trait analysis, and classical item analysis. Results suggest that when nonstandardized measures are the criteria for achievement, relying solely on estimates of content validity may lead to erroneous interpretation of test score data. (JKS)
Descriptors: Achievement, Achievement Tests, Factor Analysis, Item Analysis
Peer reviewed
Aiken, Lewis R. – Educational and Psychological Measurement, 1985
Three numerical coefficients for analyzing the validity and reliability of ratings are described. Each coefficient is computed as the ratio of an obtained to a maximum sum of differences in ratings. The coefficients are also applicable to the item analysis, agreement analysis, and cluster or factor analysis of rating-scale data. (Author/BW)
Descriptors: Computer Software, Data Analysis, Factor Analysis, Item Analysis
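The "ratio of an obtained to a maximum sum of differences" in the Aiken (1985) abstract matches the general form of Aiken's validity coefficient V. A minimal sketch under that reading, assuming a rating scale running from lo to hi; the ratings are invented, and the paper's other two coefficients are not shown.

def aiken_v(ratings, lo, hi):
    """Aiken-style coefficient: obtained sum of differences from the
    lowest category, divided by the maximum possible sum n * (hi - lo)."""
    n = len(ratings)
    obtained = sum(r - lo for r in ratings)
    maximum = n * (hi - lo)
    return obtained / maximum

# Five raters judge one item's relevance on a 1-4 scale (invented data).
print(aiken_v([4, 3, 4, 4, 3], lo=1, hi=4))  # ~0.87 -> high rated validity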
PDF pending restoration
Kane, Michael T.; Moloney, James M. – 1976
The Answer-Until-Correct (AUC) procedure has been proposed in order to increase the reliability of multiple-choice items. A model for examinees' behavior when they must respond to each item until they answer it correctly is presented. An expression for the reliability of AUC items, as a function of the characteristics of the item and the scoring…
Descriptors: Guessing (Tests), Item Analysis, Mathematical Models, Multiple Choice Tests
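The behavior the Kane and Moloney abstract models can be simulated directly. The scoring rule and parameters below are an illustrative assumption (a common linear partial-credit rule from the answer-until-correct literature), not necessarily the model or expression derived in the paper.

import random

def auc_attempts(knows, k):
    """Number of attempts needed on a k-option AUC item: 1 if the
    answer is known, otherwise random guessing without replacement."""
    if knows:
        return 1
    options = list(range(k))
    random.shuffle(options)
    return options.index(0) + 1  # position of the correct option (option 0)

def auc_score(attempts, k):
    """Linear partial-credit rule: full credit on the first try, zero
    when every option was tried (one plausible AUC scoring rule)."""
    return (k - attempts) / (k - 1)

random.seed(1)
k, p_know = 4, 0.6  # invented item: 4 options, 60% of examinees know it
scores = [auc_score(auc_attempts(random.random() < p_know, k), k)
          for _ in range(10000)]
print(sum(scores) / len(scores))  # expected item score under this model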
Brennan, Robert L.; Lockwood, Robert E. – 1979
Procedures for determining cutting scores have been proposed by Angoff and by Nedelsky. Nedelsky's approach requires that a rater examine each distractor within a test item to determine the probability of a minimally competent examinee answering correctly, whereas Angoff uses a judgment based on the whole item, rather than each of its components.…
Descriptors: Achievement Tests, Comparative Analysis, Cutting Scores, Guessing (Tests)
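The contrast in the Brennan and Lockwood abstract is easy to see side by side. In the standard formulations of the two procedures: a Nedelsky rater eliminates the distractors a minimally competent examinee would rule out, and the item probability is the reciprocal of the options that remain, while an Angoff rater judges a single probability for the whole item. The judgments below are invented.

# Nedelsky: item probability = 1 / (options remaining after the rater
# eliminates distractors a minimally competent examinee would reject).
def nedelsky_item_p(n_options, n_eliminated):
    return 1.0 / (n_options - n_eliminated)

# Angoff: the rater directly judges, per whole item, the probability
# that a minimally competent examinee answers correctly.
angoff_ps = [0.6, 0.8, 0.5, 0.9, 0.7]   # invented whole-item judgments
nedelsky_ps = [nedelsky_item_p(4, e) for e in [2, 3, 1, 3, 2]]  # invented

# Either way, the cutting score is the sum of the item probabilities.
print(f"Angoff cut:   {sum(angoff_ps):.2f} of 5")
print(f"Nedelsky cut: {sum(nedelsky_ps):.2f} of 5")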
Rentz, R. Robert; Bashaw, W. L. – 1975
In order to determine if Rasch Model procedures have any utility for equating pre-existing tests, this study reanalyzed the data from the equating phase of the Anchor Test Study which used a variety of equipercentile and linear model methods. The tests involved included seven reading test batteries, each having from one to three levels and two…
Descriptors: Comparative Analysis, Elementary Education, Equated Scores, Error of Measurement
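One attraction of Rasch procedures for the equating problem studied above is that, with common items, two forms can be placed on a single scale by one additive shift in difficulty. A minimal mean/mean linking sketch with invented difficulties; this is standard Rasch common-item linking, not necessarily the exact procedure applied to the Anchor Test Study data.

# Mean/mean Rasch linking: the same anchor items calibrated separately
# on Form X and Form Y differ (under the Rasch model) by a constant.
bx = [-1.2, -0.3, 0.4, 1.1]   # anchor difficulties from Form X (invented)
by = [-0.9, 0.1, 0.7, 1.5]    # same items calibrated on Form Y (invented)

shift = sum(by) / len(by) - sum(bx) / len(bx)  # Y-scale mean minus X-scale mean

# Any Form X parameter moves onto the Form Y scale by adding the shift.
theta_x = 0.25                # an ability estimated on the Form X scale
print(f"shift = {shift:.3f}, theta on Form Y scale = {theta_x + shift:.3f}")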
Rentz, R. Robert; Bashaw, W. L. – 1975
This volume contains tables of item analysis results obtained by following procedures associated with the Rasch Model for those reading tests used in the Anchor Test Study. Appendix I gives the test names and their corresponding analysis code numbers. Section I (Basic Item Analyses) presents data for the item analysis of each test in a two part…
Descriptors: Comparative Analysis, Elementary Education, Equated Scores, Error of Measurement
Levine, Michael V.; Rubin, Donald B. – 1976
Appropriateness indexes (statistical formulas) for detecting suspiciously high or low scores on aptitude tests were presented, based on a simulation of the Scholastic Aptitude Test (SAT) with 3,000 simulated scores--2,800 normal and 200 suspicious. The traditional index--marginal probability--uses a model for the normal examinee's test-taking…
Descriptors: Academic Ability, Aptitude Tests, College Entrance Examinations, High Schools
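The marginal-probability index referenced in the Levine and Rubin abstract is likelihood-based: a response pattern with unusually low probability under the model for normal examinees is flagged. A minimal sketch of a generic log-likelihood person-fit index under a Rasch model; the difficulties, abilities, and response patterns are invented, and this is not the specific SAT simulation or formulas from the report.

import math

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(responses, theta, difficulties):
    """Log-likelihood of a 0/1 response pattern at ability theta; a value
    far below those of other examinees flags a suspicious pattern."""
    ll = 0.0
    for u, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        ll += u * math.log(p) + (1 - u) * math.log(1 - p)
    return ll

b = [-1.0, -0.5, 0.0, 0.5, 1.0]   # invented item difficulties, easy to hard
normal = [1, 1, 1, 0, 0]          # misses only the hard items
suspicious = [0, 0, 1, 1, 1]      # misses only the easy items
print(log_likelihood(normal, 0.0, b), log_likelihood(suspicious, 0.0, b))

The suspicious pattern scores a much lower log-likelihood (about -5.3 versus -2.3) even though both examinees answered three of five items correctly.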