Taskinen, Päivi H.; Steimel, Jochen; Gräfe, Linda; Engell, Sebastian; Frey, Andreas – Peabody Journal of Education, 2015
This study examined students' competencies in engineering education at the university level. First, we developed a competency model in one specific field of engineering: process dynamics and control. Then, the theoretical model was used as a frame to construct test items to measure students' competencies comprehensively. In the empirical…
Descriptors: Models, Engineering Education, Test Items, Outcome Measures
Stewart, Jeffrey; White, David A. – TESOL Quarterly: A Journal for Teachers of English to Speakers of Other Languages and of Standard English as a Second Dialect, 2011
Multiple-choice tests such as the Vocabulary Levels Test (VLT) are often viewed as a preferable estimator of vocabulary knowledge when compared to yes/no checklists, because self-reporting tests introduce the possibility of students overreporting or underreporting scores. However, multiple-choice tests have their own unique disadvantages. It has…
Descriptors: Guessing (Tests), Scoring Formulas, Multiple Choice Tests, Test Reliability

Zimmerman, Donald W. – Educational and Psychological Measurement, 1972
Although a great deal of attention has been devoted over a period of years to the estimation of reliability from item statistics, there are still gaps in the mathematical derivation of the Kuder-Richardson results. The main purpose of this paper is to fill some of these gaps, using language consistent with modern probability theory. (Author)
Descriptors: Mathematical Applications, Probability, Scoring Formulas, Statistical Analysis
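For readers unfamiliar with the Kuder-Richardson results mentioned in the record above, the following is a minimal sketch of the standard KR-20 coefficient for dichotomously scored items. It is not taken from the paper; the function name `kr20` and the sample data are illustrative assumptions, and the total-score variance is computed with the population (divide-by-N) convention.

```python
def kr20(responses):
    """KR-20 reliability for a matrix of 0/1 item scores.

    responses: list of rows, one row per examinee, one 0/1 entry per item.
    Uses the population variance of total scores (divide by N).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")

    # Total score per examinee and its population variance.
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    # Sum of item variances p*(1-p), where p is the proportion correct.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: 4 examinees, 3 items.
sample = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(sample))
```

For the hypothetical data shown, the item variances sum to 0.625 and the total-score variance is 1.25, so the coefficient comes out at 0.75.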
Berk, Ronald A. – 1980
Seventeen statistics for measuring the reliability of criterion-referenced tests were critically reviewed. The review was organized into two sections: (1) a discussion of preliminary considerations to provide a foundation for choosing the appropriate category of "reliability" (threshold loss function, squared-error loss-function, or…
Descriptors: Criterion Referenced Tests, Cutting Scores, Scoring Formulas, Statistical Analysis
Kane, Michael T.; Moloney, James M. – 1974
Gilman and Ferry have shown that when the student's score on a multiple choice test is the total number of responses necessary to get all items correct, substantial increases in reliability can occur. In contrast, similar procedures giving partial credit on multiple choice items have resulted in relatively small gains in reliability. The analysis…
Descriptors: Feedback, Guessing (Tests), Multiple Choice Tests, Response Style (Tests)

Marks, Edmond; Martin, Charles G. – American Educational Research Journal, 1973
The purpose of this study was to examine the effects of the true change-true initial score correlation on one aspect of the true simple change estimate, namely its error variance. (Authors/CB)
Descriptors: Analysis of Variance, Mathematical Applications, Measurement Techniques, Scoring Formulas

Frederiksen, Norman; Ward, William C. – Applied Psychological Measurement, 1978
A set of Tests of Scientific Thinking were developed for possible use as criterion measures in research on creativity. Scores on the tests describe both quality and quantity of ideas produced in formulating hypotheses, evaluating proposals, solving methodological problems, and devising methods for measuring constructs. (Author/CTM)
Descriptors: Creativity Tests, Higher Education, Item Sampling, Predictive Validity
Bruno, James E. – Journal of Computer-Based Instruction, 1987
Reports preliminary findings of a study which used a modified Admissible Probability Measurement (APM) test scoring system in the design of computer based instructional management systems. The use of APM for curriculum analysis is discussed, as well as its value in enhancing individualized learning. (Author/LRW)
Descriptors: Computer Assisted Testing, Computer Managed Instruction, Curriculum Evaluation, Design
Kobrin, Jennifer L.; Kimmel, Ernest W. – College Board, 2006
Based on statistics from the first few administrations of the SAT writing section, the test is performing as expected. The reliability of the writing section is very similar to that of other writing assessments. Based on preliminary validity research, the writing section is expected to add modestly to the prediction of college performance when…
Descriptors: Test Construction, Writing Tests, Cognitive Tests, College Entrance Examinations
Cross, Lawrence H. – 1975
A novel scoring procedure was investigated in order to obtain scores from a conventional multiple-choice test that would be free of the guessing component or contain a known guessing component even though examinees were permitted to guess at will. Scores computed with the experimental procedure are based not only on the number of items answered…
Descriptors: Algebra, Comparative Analysis, Guessing (Tests), High Schools
Bayuk, Robert J. – 1973
An investigation was conducted to determine the effects of response-category weighting and item weighting on reliability and predictive validity. Response-category weighting refers to scoring in which, for each category (including omit and "not read"), a weight is assigned that is proportional to the mean criterion score of examinees selecting…
Descriptors: Aptitude Tests, Correlation, Predictive Validity, Research Reports
Donlon, Thomas F. – 1975
This study empirically determined the optimizing weight to be applied to the Wrongs Total Score in scoring rubrics of the general form S = R - kW, where S is the Score, R the Rights Total, k the weight and W the Wrongs Total, if reliability is to be maximized. As is well known, the traditional formula score rests on a theoretical framework which is…
Descriptors: Achievement Tests, Comparative Analysis, Guessing (Tests), Multiple Choice Tests
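The scoring rule named in the record above, S = R - kW, can be sketched in a few lines. The function names are illustrative assumptions. The weight k = 1/(n - 1) for n-option items is the conventional correction-for-guessing value, included here only as a reference point; the empirically optimized weight the study reports is not given in the abstract and is not reproduced.

```python
def formula_score(rights: int, wrongs: int, k: float) -> float:
    """Return S = R - k*W for a given wrongs weight k."""
    return rights - k * wrongs


def traditional_weight(n_options: int) -> float:
    """Conventional correction-for-guessing weight, k = 1 / (n_options - 1)."""
    if n_options < 2:
        raise ValueError("items need at least two options")
    return 1.0 / (n_options - 1)


# Hypothetical examinee: 40 right, 12 wrong, five-option items.
k = traditional_weight(5)          # 0.25
score = formula_score(40, 12, k)   # 40 - 0.25 * 12
print(score)
```

With these hypothetical numbers the score is 37.0. Omitted items carry no penalty under this rule, since only wrong answers enter W.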
Shuford, Emir H., Jr.; Brown, Thomas A. – 1974
A student's choice of an answer to a test question is a coarse measure of his knowledge about the subject matter of the question. Much finer measurement might be achieved if the student were asked to estimate, for each possible answer, the probability that it is the correct one. Such a procedure could yield two classes of benefits: (a) students…
Descriptors: Bias, Computer Programs, Confidence Testing, Decision Making
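As an illustration of how probability-estimate responses like those described in the record above might be scored, here is a minimal sketch using the logarithmic scoring rule, one well-known proper scoring rule. The abstract does not state which rule the authors used, so the choice of rule and the name `log_score` are assumptions.

```python
import math


def log_score(probs, correct_index):
    """Logarithmic score: log of the probability assigned to the correct answer.

    probs: probabilities over the answer options, summing to 1.
    Returns negative infinity if the correct option was given probability 0.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    p = probs[correct_index]
    if p <= 0:
        return float("-inf")
    return math.log(p)


# Hypothetical four-option item where option 0 is correct.
uniform = log_score([0.25, 0.25, 0.25, 0.25], 0)   # no knowledge expressed
confident = log_score([0.7, 0.1, 0.1, 0.1], 0)     # leans toward the right answer
print(uniform, confident)
```

Under a proper scoring rule, an examinee maximizes the expected score by reporting the probabilities they actually believe, which is what makes the finer measurement meaningful.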
Rippey, Robert M. – 1971
Technical improvements, which may be made in the reliability and validity of tests through confidence scores, are discussed. However, studies indicate that subjects do not handle their confidence uniformly. (MS)
Descriptors: Computer Programs, Confidence Testing, Correlation, Difficulty Level
Haladyna, Thomas – 1975
A central problem for the user of domain-referenced tests in instruction is deciding who has passed and who has failed. Two procedures were presented and discussed. The first, employing classical test theory, was found to be more useful for larger domains and where the passing standard is 70 percent or less. The sampling procedure suggested by…
Descriptors: Academic Achievement, Academic Standards, Criterion Referenced Tests, Decision Making Skills