Showing 316 to 330 of 582 results
Peer reviewed
McGaw, Barry; Glass, Gene V. – American Educational Research Journal, 1980
There are difficulties in expressing effect sizes on a common metric when some studies use transformed scales to express group differences, or use factorial designs or covariance adjustments to obtain a reduced error term. A common metric on which effect sizes may be standardized is described. (Author/RL)
Descriptors: Control Groups, Error of Measurement, Mathematical Models, Research Problems
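The standardization the abstract describes can be sketched in a few lines. This is an illustrative reconstruction, not the authors' procedure: the function names and the choice between a control-group and a pooled standard deviation are assumptions for the example.

```python
import math

def glass_delta(mean_exp, mean_ctrl, sd_ctrl):
    """Group difference standardized by the control-group SD (Glass's delta)."""
    return (mean_exp - mean_ctrl) / sd_ctrl

def pooled_effect_size(mean_exp, mean_ctrl, sd_exp, sd_ctrl, n_exp, n_ctrl):
    """Group difference standardized by the pooled SD (Cohen's d).

    Pooling weights each group's variance by its degrees of freedom, so
    studies reporting different group sizes land on one common metric.
    """
    pooled_var = ((n_exp - 1) * sd_exp**2 + (n_ctrl - 1) * sd_ctrl**2) \
                 / (n_exp + n_ctrl - 2)
    return (mean_exp - mean_ctrl) / math.sqrt(pooled_var)
```

With equal group SDs the two conventions coincide; they diverge exactly in the situations the abstract flags, where covariance adjustments or factorial designs shrink the error term used for standardization.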
Peer reviewed
Kleven, Thor Arnfinn – Scandinavian Journal of Educational Research, 1979
Assuming different values of the standard error of measurement, the relation of scale coarseness to the total amount of error is studied on the basis of the probability distribution of error. The analyses are performed within two models of error and with two criteria of amount of error. (Editor/SJL)
Descriptors: Cutting Scores, Error of Measurement, Goodness of Fit, Grading
Berger, Peter N. – Teaching and Learning Literature with Children and Young Adults, 1997
Discusses problems with scoring reliability of the Vermont Education Department's writing portfolio test, particularly the difficulties teachers face in agreeing upon scoring criteria. (PA)
Descriptors: Elementary Secondary Education, Interrater Reliability, Portfolio Assessment, Portfolios (Background Materials)
Peer reviewed
Attali, Yigal – ETS Research Report Series, 2007
Because there is no commonly accepted view of what makes for good writing, automated essay scoring (AES) ideally should be able to accommodate different theoretical positions, certainly at the level of state standards but also perhaps among teachers at the classroom level. This paper presents a practical approach and an interactive computer…
Descriptors: Computer Assisted Testing, Automation, Essay Tests, Scoring
Frary, Robert B.; And Others – 1985
Students in an introductory college course (n=275) responded to equivalent 20-item halves of a test under number-right and formula-scoring instructions. Formula scores of those who omitted items averaged about one point lower than their comparable (formula-adjusted) scores on the test half administered under number-right instructions. In contrast,…
Descriptors: Guessing (Tests), Higher Education, Multiple Choice Tests, Questionnaires
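The two scoring instructions the abstract contrasts can be sketched as follows. This is a generic illustration of conventional formula scoring, not the study's exact computation; the four-option example is an assumption.

```python
def number_right_score(n_correct):
    """Number-right scoring: wrong answers and omits both count zero."""
    return n_correct

def formula_score(n_correct, n_wrong, n_choices):
    """Formula scoring R - W/(k-1): subtracts the expected gain from blind
    guessing on k-choice items, so omitting and guessing at random have the
    same expected score. Omitted items contribute nothing."""
    return n_correct - n_wrong / (n_choices - 1)
```

For example, 12 right and 4 wrong on 4-option items yields a formula score of 12 - 4/3, about 10.67, versus a number-right score of 12.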
Kazelskis, Richard; And Others – 1987
Numerous techniques are available for determining cutoff scores for distinguishing between proficient and non-proficient examinees. One of the more commonly cited techniques for standard setting is the Nedelsky Method. In response to criticism of this method, Gross (1985) presented a revised Nedelsky technique. However, no research beyond that…
Descriptors: Competence, Cutting Scores, Measurement Techniques, Scoring Formulas
Peer reviewed
Brannigan, Gary G. – Psychology in the Schools, 1975
Several studies concerning scoring difficulties on the Wechsler intelligence scales were reviewed. Since scoring of responses on the comprehension, similarities and vocabulary subtests of the Wechsler scales demands judgements by the examiner, the possibility of poor interscorer reliability increases. More thorough scoring standards and revision…
Descriptors: Intelligence Differences, Intelligence Tests, Measurement Techniques, Psychological Testing
Peer reviewed
Baskin, David – Journal of Educational Measurement, 1975
Traditional test scoring does not allow the examination of differences among subjects obtaining identical raw scores on the same test. A configuration scoring paradigm for identical raw scores, which provides for such comparisons, is developed and illustrated. (Author)
Descriptors: Elementary Secondary Education, Individual Differences, Mathematical Models, Multiple Choice Tests
Moy, Raymond H. – 1981
The problem of standard setting on language proficiency tests is often approached by the use of norms derived from the group being tested, a process commonly known as "grading on the curve." One particular problem with this ad hoc method of standard setting is that it will usually result in a fluctuating standard dependent on the particular group…
Descriptors: Cutting Scores, Higher Education, Language Proficiency, Norm Referenced Tests
Yen, Wendy M. – 1982
Test scores that are not perfectly reliable cannot be strictly equated unless they are strictly parallel. This fact implies that tau equivalence can be lost if an equipercentile equating is applied to observed scores that are not strictly parallel. Thirty-six simulated data sets are produced to simulate equating tests with different difficulties…
Descriptors: Difficulty Level, Equated Scores, Latent Trait Theory, Methods
Modu, Christopher C. – 1981
The effects of applying different methods of determining different sets of subscore weights on the composite score ranking of examinees were investigated. Four sets of subscore weights were applied to each of three examination results. The scores were from Advanced Placement (AP) Examinations in History of Art, Spanish Language, and Chemistry. One…
Descriptors: Advanced Placement Programs, Correlation, Equated Scores, Higher Education
Boldt, Robert F. – 1974
One formulation of confidence scoring requires the examinee to report, as a number, his personal probability that each alternative of a multiple-choice item is correct. Under this formulation, a linear transformation of the logarithm of the probability assigned to the correct response is maximized in expectation when the examinee accurately reports his personal probability. To equate…
Descriptors: Confidence Testing, Guessing (Tests), Multiple Choice Tests, Probability
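The property the abstract describes, that honest probability reporting maximizes the expected logarithmic score, can be sketched directly. The function names are illustrative, and the linear transformation is omitted since it does not change which report is optimal.

```python
import math

def log_confidence_score(reported_probs, correct_index):
    """Logarithmic confidence score: log of the probability the examinee
    assigned to the alternative that turned out to be correct."""
    return math.log(reported_probs[correct_index])

def expected_score(true_probs, reported_probs):
    """Expected log score under the examinee's own beliefs: the average of
    log(reported p) weighted by the examinee's true personal probabilities."""
    return sum(p * math.log(r) for p, r in zip(true_probs, reported_probs))
```

By Gibbs' inequality, `expected_score(p, p) >= expected_score(p, q)` for any other report `q`, which is what makes the logarithmic rule "proper": misreporting one's personal probabilities can only lower the expected score.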
Jacobs, Stanley S. – 1974
Investigated were the effects of two levels of penalty for incorrect responses on two dependent variables (a measure of risk-taking or confidence, based on nonsense items, and the number of response-attempts to legitimate items) for three treatment groups in a 2x3, multi-response repeated measures, multivariate ANOVA (Analysis of Variance) design.…
Descriptors: Confidence Testing, Criterion Referenced Tests, Guessing (Tests), Multiple Choice Tests
Felsenthal, Norman A.; Felsenthal, Helen – 1972
A computer program called TEXAN (Textual Analysis of Language Samples) was developed for use in calculating frequency of characters, words, punctuation units, and stylistic variables. Its usefulness in determining readability levels was examined in an analysis of language samples from 20 elementary tradebooks used as supplementary reading…
Descriptors: Automatic Indexing, Comparative Analysis, Computational Linguistics, Information Processing
Peer reviewed
Kane, Michael; Moloney, James – Applied Psychological Measurement, 1978
The answer-until-correct (AUC) procedure requires that examinees respond to a multiple-choice item until they answer it correctly. Using a modified version of Horst's model for examinee behavior, this paper compares the effect of guessing on item reliability for the AUC procedure and the zero-one scoring procedure. (Author/CTM)
Descriptors: Guessing (Tests), Item Analysis, Mathematical Models, Multiple Choice Tests
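The two scoring procedures the abstract compares can be sketched as follows. The linearly declining AUC credit is one common illustrative scheme, not necessarily the one analyzed in the paper.

```python
def zero_one_score(n_attempts):
    """Conventional zero-one scoring: full credit only if the first
    response is correct."""
    return 1 if n_attempts == 1 else 0

def auc_score(n_attempts, n_choices):
    """Answer-until-correct scoring with credit declining linearly in the
    number of attempts: 1 for a first-try success, 0 if every distractor
    was tried first (one illustrative scheme)."""
    return (n_choices - n_attempts) / (n_choices - 1)
```

On a 4-option item, a first-try success scores 1.0 under both procedures, while a second-try success scores 2/3 under AUC but 0 under zero-one scoring; this partial credit is what changes how guessing feeds into item reliability.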