Showing 316 to 330 of 582 results
Peer reviewed
Collet, Leverne S. – Journal of Educational Measurement, 1971
The purpose of this paper was to provide an empirical test of the hypothesis that elimination scores are more reliable and valid than classical corrected-for-guessing scores or weighted-choice scores. The evidence presented supports the hypothesized superiority of elimination scoring. (Author)
Descriptors: Evaluation, Guessing (Tests), Multiple Choice Tests, Scoring Formulas
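The "classical corrected-for-guessing score" against which elimination scoring is compared is conventionally rights minus a fraction of wrongs; a minimal sketch of that standard correction (the helper name is ours, not the paper's):

```python
# Classical correction for guessing: S = R - W / (k - 1),
# where R = rights, W = wrongs (omits are excluded) and
# k = number of options per item.
def corrected_score(rights: int, wrongs: int, options: int) -> float:
    if options < 2:
        raise ValueError("need at least two options per item")
    return rights - wrongs / (options - 1)

# A pure guesser on 4-option items expects to gain nothing:
# out of 40 blind guesses, roughly 10 right and 30 wrong.
print(corrected_score(10, 30, 4))  # 0.0
```

The correction removes the expected gain from blind guessing but, as the entries in this listing discuss, does nothing about partial knowledge, which is what elimination and weighted-choice scoring try to capture.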
van den Brink, Wulfert – Evaluation in Education: International Progress, 1982
Binomial models for domain-referenced testing are compared, emphasizing the assumptions underlying the beta-binomial model. Advantages and disadvantages are discussed. A proposed item sampling model is presented which takes the effect of guessing into account. (Author/CM)
Descriptors: Comparative Analysis, Criterion Referenced Tests, Item Sampling, Measurement Techniques
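For readers unfamiliar with the beta-binomial model compared here: the observed-score distribution arises by mixing a binomial over a Beta(a, b) distribution of true proportion-correct. A standard-library sketch (the parameter values are illustrative, not from the paper):

```python
import math

def beta_binomial_pmf(x: int, n: int, a: float, b: float) -> float:
    """P(X = x) for a binomial whose success probability is Beta(a, b):
    C(n, x) * B(x + a, n - x + b) / B(a, b), computed via log-gammas."""
    log_p = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
             + math.lgamma(x + a) + math.lgamma(n - x + b)
             - math.lgamma(n + a + b))
    return math.comb(n, x) * math.exp(log_p)

# Probabilities over all possible scores on a 10-item domain sample sum to 1.
total = sum(beta_binomial_pmf(x, 10, a=2.0, b=3.0) for x in range(11))
print(round(total, 6))  # 1.0
```

The assumption under scrutiny in the comparison is precisely this mixing step: that examinee true scores follow a beta distribution and items behave exchangeably, which guessing can violate.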
Peer reviewed
Spencer, Ernest – Scottish Educational Review, 1981
Using data from the SCRE Criterion Test composition papers, the author tests the hypothesis that the bulk of inter-marker unreliability is caused by inter-marker inconsistency--which is not correctable statistically. He suggests that a shift to "consensus" standards will realize greater improvements than statistical standardizing alone.…
Descriptors: Achievement Tests, English Instruction, Essay Tests, Reliability
Atkinson, George F.; Doadt, Edward – Assessment in Higher Education, 1980
Some perceived difficulties with conventional multiple choice tests are mentioned, and a modified form of examination is proposed. It uses a computer program to award partial marks for partially correct answers, award full marks for correct answers, and check for widespread misunderstanding of an item or subject. (MSE)
Descriptors: Achievement Tests, Computer Assisted Testing, Higher Education, Multiple Choice Tests
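The abstract does not give the authors' actual program; a hypothetical partial-credit scorer of the kind described, for items where an examinee selects a subset of options, might look like this:

```python
# Hypothetical partial-credit scorer (not the authors' program):
# a fraction of full marks for each correct selection, an equal
# deduction for each wrong one, floored at zero.
def partial_credit(selected: set, key: set) -> float:
    correct = len(selected & key)
    wrong = len(selected - key)
    return max(0.0, (correct - wrong) / len(key))

# Two of three key answers chosen, none wrong -> 2/3 credit.
print(partial_credit({"a", "b"}, {"a", "b", "c"}))
# One right, one wrong -> they cancel to zero.
print(partial_credit({"a", "d"}, {"a", "b", "c"}))  # 0.0
```

Flagging "widespread misunderstanding" would then amount to inspecting, per item, how many examinees converge on the same wrong selections.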
Peer reviewed
McGaw, Barry; Glass, Gene V. – American Educational Research Journal, 1980
There are difficulties in expressing effect sizes on a common metric when some studies use transformed scales to express group differences, or use factorial designs or covariance adjustments to obtain a reduced error term. A common metric on which effect sizes may be standardized is described. (Author/RL)
Descriptors: Control Groups, Error of Measurement, Mathematical Models, Research Problems
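The common metric at issue is the standardized mean difference: the raw group difference divided by a standard deviation, with the choice of that standardizer (control-group SD versus pooled SD) being exactly what varies across studies. A minimal sketch, with illustrative variable names:

```python
import math

def glass_delta(mean_treat: float, mean_ctrl: float, sd_ctrl: float) -> float:
    """Standardize the mean difference on the control-group SD."""
    return (mean_treat - mean_ctrl) / sd_ctrl

def pooled_d(mean1, mean2, sd1, sd2, n1, n2) -> float:
    """Standardize on a pooled within-group SD (Cohen's d form)."""
    pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                       / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled

print(glass_delta(105.0, 100.0, 10.0))  # 0.5
```

The difficulty the article addresses is that covariance adjustments and factorial designs shrink the error term, so an SD recovered from such an analysis is not on the same scale as a simple control-group SD.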
Peer reviewed
Kleven, Thor Arnfinn – Scandinavian Journal of Educational Research, 1979
Supposing different values of the standard measurement error, the relation of scale coarseness to the total amount of error is studied on the basis of probability distribution of error. The analyses are performed within two models of error and with two criteria of amount of error. (Editor/SJL)
Descriptors: Cutting Scores, Error of Measurement, Goodness of Fit, Grading
Berger, Peter N. – Teaching and Learning Literature with Children and Young Adults, 1997
Discusses problems with scoring reliability of the Vermont Education Department's writing portfolio test, particularly the difficulties teachers face in agreeing upon scoring criteria. (PA)
Descriptors: Elementary Secondary Education, Interrater Reliability, Portfolio Assessment, Portfolios (Background Materials)
Frary, Robert B.; And Others – 1985
Students in an introductory college course (n=275) responded to equivalent 20-item halves of a test under number-right and formula-scoring instructions. Formula scores of those who omitted items averaged about one point lower than their comparable (formula adjusted) scores on the test half administered under number-right instructions. In contrast,…
Descriptors: Guessing (Tests), Higher Education, Multiple Choice Tests, Questionnaires
Kazelskis, Richard; And Others – 1987
Numerous techniques are available for determining cutoff scores for distinguishing between proficient and non-proficient examinees. One of the more commonly cited techniques for standard setting is the Nedelsky Method. In response to criticism of this method, Gross (1985) presented a revised Nedelsky technique. However, no research beyond that…
Descriptors: Competence, Cutting Scores, Measurement Techniques, Scoring Formulas
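In the classic (unrevised) Nedelsky method referenced here, a judge marks, for each multiple-choice item, the distractors a minimally competent examinee could eliminate; the item's minimum pass level is the reciprocal of the options remaining, and the cutoff score is the sum over items. A sketch of that baseline procedure:

```python
# Classic Nedelsky standard setting: each item contributes
# 1 / (number of options the minimally competent examinee
# cannot eliminate) to the test cutoff score.
def nedelsky_cutoff(remaining_options: list[int]) -> float:
    return sum(1.0 / r for r in remaining_options)

# Five 4-option items; judges eliminated 3, 2, 2, 1, and 0
# distractors respectively, leaving these option counts:
print(nedelsky_cutoff([1, 2, 2, 3, 4]))
```

Gross's (1985) revision modifies how these judge ratings are aggregated; the entry notes that the revision itself had seen little empirical follow-up.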
Peer reviewed
Brannigan, Gary G. – Psychology in the Schools, 1975
Several studies concerning scoring difficulties on the Wechsler intelligence scales were reviewed. Since scoring of responses on the comprehension, similarities and vocabulary subtests of the Wechsler scales demands judgements by the examiner, the possibility of poor interscorer reliability increases. More thorough scoring standards and revision…
Descriptors: Intelligence Differences, Intelligence Tests, Measurement Techniques, Psychological Testing
Peer reviewed
Baskin, David – Journal of Educational Measurement, 1975
Traditional test scoring does not allow the examination of differences among subjects obtaining identical raw scores on the same test. A configuration scoring paradigm for identical raw scores, which provides for such comparisons, is developed and illustrated. (Author)
Descriptors: Elementary Secondary Education, Individual Differences, Mathematical Models, Multiple Choice Tests
Moy, Raymond H. – 1981
The problem of standard setting on language proficiency tests is often approached by the use of norms derived from the group being tested, a process commonly known as "grading on the curve." One particular problem with this ad hoc method of standard setting is that it will usually result in a fluctuating standard dependent on the particular group…
Descriptors: Cutting Scores, Higher Education, Language Proficiency, Norm Referenced Tests
Yen, Wendy M. – 1982
Test scores that are not perfectly reliable cannot be strictly equated unless they are strictly parallel. This fact implies that tau equivalence can be lost if an equipercentile equating is applied to observed scores that are not strictly parallel. Thirty-six simulated data sets are produced to simulate equating tests with different difficulties…
Descriptors: Difficulty Level, Equated Scores, Latent Trait Theory, Methods
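Equipercentile equating, the method whose behavior Yen simulates, maps a score on one form to the score on the other form holding the same percentile rank. A toy sketch on discrete score distributions (the data are invented for illustration):

```python
def percentile_rank(scores, x):
    """Midpercentile rank: fraction below x plus half the fraction at x."""
    below = sum(s < x for s in scores)
    equal = sum(s == x for s in scores)
    return (below + 0.5 * equal) / len(scores)

def equipercentile_equate(x, form_x, form_y):
    """Form-Y score whose percentile rank best matches x's rank on form X."""
    target = percentile_rank(form_x, x)
    return min(set(form_y),
               key=lambda y: abs(percentile_rank(form_y, y) - target))

form_x = [1, 2, 3, 4, 5]
form_y = [2, 4, 6, 8, 10]
print(equipercentile_equate(3, form_x, form_y))  # 6
```

The point of the simulation is that applying this transformation to unreliable, non-parallel observed scores can distort the relationship between true scores even though the observed distributions are matched.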
Modu, Christopher C. – 1981
The effects of applying different methods of determining different sets of subscore weights on the composite score ranking of examinees were investigated. Four sets of subscore weights were applied to each of three examination results. The scores were from Advanced Placement (AP) Examinations in History of Art, Spanish Language, and Chemistry. One…
Descriptors: Advanced Placement Programs, Correlation, Equated Scores, Higher Education
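A composite score of the kind Modu studies is a weighted sum of subscores, and the study's question is how much examinee rankings shift under different weight sets. A toy demonstration (the weights and scores are invented, not from the AP data):

```python
def composite_rank(subscores, weights):
    """Rank examinees (best first) by weighted composite score."""
    composites = [sum(w * s for w, s in zip(weights, row))
                  for row in subscores]
    return sorted(range(len(composites)), key=lambda i: -composites[i])

# Two examinees, two subscores (say, multiple-choice and free-response).
scores = [(80, 40), (60, 70)]
print(composite_rank(scores, (0.75, 0.25)))  # [0, 1]
print(composite_rank(scores, (0.25, 0.75)))  # [1, 0]
```

As the example shows, shifting weight between subscores can reverse the ranking whenever examinees' subscore profiles differ, which is why the choice of weighting method matters.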
Boldt, Robert F. – 1974
One formulation of confidence scoring requires the examinee to indicate as a number his personal probability of the correctness of each alternative in a multiple-choice test. For this formulation, the expected value of a linear transformation of the logarithm of the probability assigned to the correct response is maximized if the examinee accurately reports his personal probability. To equate…
Descriptors: Confidence Testing, Guessing (Tests), Multiple Choice Tests, Probability
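The logarithmic rule Boldt describes is a proper scoring rule: an examinee's expected score under a + b·log(p reported for the true answer) is maximized by reporting the probability actually believed. A numerical sketch for a two-option item (the constants a and b are illustrative):

```python
import math

def expected_log_score(belief: float, report: float,
                       a: float = 0.0, b: float = 1.0) -> float:
    """Expected a + b*log(p) score on a two-option item when the
    examinee believes the true answer has probability `belief`
    but reports `report` for it."""
    return (belief * (a + b * math.log(report))
            + (1 - belief) * (a + b * math.log(1 - report)))

belief = 0.7
honest = expected_log_score(belief, 0.7)   # truthful report
hedged = expected_log_score(belief, 0.5)   # feigned uncertainty
print(honest > hedged)  # True
```

Overclaiming is penalized the same way: reporting 0.9 on a 0.7 belief also lowers the expected score, which is the property that makes truthful probability reporting the examinee's best strategy.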