Peer reviewed. Koch, William R.; Dodd, Barbara G. – Applied Measurement in Education, 1989
Various aspects of the computerized adaptive testing (CAT) procedure for partial credit scoring were manipulated, focusing on the effects of the manipulations on operational characteristics of the CAT. The effects of item-pool size, item-pool information, and stepsizes used along the trait continuum were assessed. (TJH)
Descriptors: Adaptive Testing, Computer Assisted Testing, Item Banks, Maximum Likelihood Statistics
Peer reviewed. Oltman, Phillip K.; Stricker, Lawrence J. – Language Testing, 1990
A recent multidimensional scaling analysis of the Test of English-as-a-Foreign-Language (TOEFL) item response data identified clusters of items in the test sections that, being more homogeneous than their parent sections, might be better for diagnostic use. The analysis was repeated using different scoring techniques. Results diverged only for…
Descriptors: English (Second Language), Item Analysis, Language Tests, Scaling
Peer reviewed. Achenbach, Thomas M.; McConaughy, Stephanie H. – School Psychology Review, 1996
Presents similarities and differences between the DSM-IV and empirically based approaches to behavioral and emotional problems. A case example illustrates the applications of the two approaches to school-based assessment. (Author/JDM)
Descriptors: Behavior Problems, Case Studies, Elementary Secondary Education, Emotional Problems
Love, Gayle A. – 1987
In a review of relevant literature, it is argued that correction for guessing formulas should not be used. It is contended that such formulas correct for guessing that does not really exist in a noticeable amount, penalize those students who have low self-esteem and self-confidence, correct for errors that are not necessarily errors, benefit risk…
Descriptors: Guessing (Tests), Scoring Formulas, Self Esteem, Teacher Made Tests
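For readers unfamiliar with the formulas this review argues against, the conventional correction for guessing subtracts a fraction of the wrong answers from the number right. The sketch below is illustrative only and is not taken from the reviewed paper; the function and parameter names are assumptions, and it presumes every item has the same number of options and that omitted items are not counted as wrong.

```python
def formula_score(rights: int, wrongs: int, options_per_item: int) -> float:
    """Conventional correction-for-guessing score: R - W / (k - 1).

    rights           -- number of items answered correctly (R)
    wrongs           -- number of items answered incorrectly (W); omits excluded
    options_per_item -- number of answer options per item (k), assumed constant
    """
    if options_per_item < 2:
        raise ValueError("an item needs at least two options")
    return rights - wrongs / (options_per_item - 1)


# Hypothetical examinee: 30 right, 8 wrong, five-option items.
corrected = formula_score(30, 8, 5)
```

The penalty grows with the number of wrong answers, which is the behavior the review criticizes for examinees who guess less readily.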
Peer reviewed. Olejnik, Stephen; Porter, Andrew C. – Educational and Psychological Measurement, 1975
The four scoring strategies compared were: lambda coefficients, chi-square weights, and two applications of multiple discriminant analysis. No significant differences were found when applied to the Kuder Occupational Interest Survey. (RC)
Descriptors: Analysis of Variance, Comparative Analysis, Discriminant Analysis, Interest Inventories
Livingston, Samuel A.; Kastrinos, William – 1982
Leo Nedelsky developed a method for determining absolute grading standards for multiple choice tests. His method required a group of judges to examine each test question and eliminate those responses which the lowest D- student should be able to reject as incorrect. The correct answer probabilities remaining were used in computing an expected test…
Descriptors: Cutting Scores, Judges, Multiple Choice Tests, Real Estate
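The Nedelsky procedure summarized in this record can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the sample judgments are hypothetical, each item is assumed to have exactly one correct answer, and the judges' results are combined by a simple mean.

```python
from statistics import mean


def judge_expected_score(remaining_options: list[int]) -> float:
    """Expected score of the borderline examinee according to one judge.

    remaining_options[i] is the number of options left on item i
    (including the correct answer) after the judge eliminates the
    options the borderline examinee should reject. The chance of a
    correct answer on that item is then 1 / remaining_options[i].
    """
    if any(n < 1 for n in remaining_options):
        raise ValueError("each item must keep at least the correct answer")
    return sum(1.0 / n for n in remaining_options)


def nedelsky_standard(judgments: list[list[int]]) -> float:
    """Average the judges' expected scores (an assumed aggregation rule)."""
    return mean(judge_expected_score(j) for j in judgments)


# Two hypothetical judges rating the same four items.
standard = nedelsky_standard([[1, 2, 4, 2], [2, 2, 4, 4]])
```

Other aggregation rules, such as adjusting the mean by a multiple of the judges' standard deviation, would change only `nedelsky_standard`.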
Budescu, David V. – 1979
This paper outlines a technique for differentially weighting options of a multiple choice test in a fashion that maximizes the item predictive validity. The rule can be applied with different numbers of categories, and the "optimal" number of categories can be determined by significance tests and/or through the R² criterion. Our theoretical analysis…
Descriptors: Multiple Choice Tests, Predictive Validity, Scoring Formulas, Test Items
Berk, Ronald A. – 1980
Seventeen statistics for measuring the reliability of criterion-referenced tests were critically reviewed. The review was organized into two sections: (1) a discussion of preliminary considerations to provide a foundation for choosing the appropriate category of "reliability" (threshold loss function, squared-error loss function, or…
Descriptors: Criterion Referenced Tests, Cutting Scores, Scoring Formulas, Statistical Analysis
Marco, Gary L. – 1975
A method of interpolation has been derived that should be superior to linear interpolation in computing the percentile ranks of test scores for unimodal score distributions. The superiority of the logistic interpolation over the linear interpolation is most noticeable for distributions consisting of only a small number of score intervals (say…
Descriptors: Comparative Analysis, Intervals, Mathematical Models, Percentage
Arneklev, Bruce; And Others – 1976
One of the most important contentions of the Rasch model of item analysis is that two tests of the same trait, having some items in common, can be linked together using a "linking constant" derived from the common items. This would be accomplished by administering both tests to a sample of testees, calibrating the items of the tests…
Descriptors: Elementary School Mathematics, Goodness of Fit, Item Analysis, Measurement Techniques
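One common way to compute a linking constant of the kind mentioned in this record is the mean difference between the two calibrations' difficulty estimates for the common items. The sketch below assumes that approach; it may differ from the procedure the paper evaluated, and the item identifiers and difficulty values are hypothetical.

```python
def linking_constant(diff_a: dict[str, float], diff_b: dict[str, float]) -> float:
    """Mean of (difficulty on test A - difficulty on test B) over common items."""
    common = diff_a.keys() & diff_b.keys()
    if not common:
        raise ValueError("the two tests share no items")
    return sum(diff_a[i] - diff_b[i] for i in common) / len(common)


def rescale_to_a(diff_b: dict[str, float], constant: float) -> dict[str, float]:
    """Shift test B difficulties onto test A's scale by adding the constant."""
    return {item: b + constant for item, b in diff_b.items()}


# Hypothetical Rasch difficulty estimates (logits) from separate calibrations.
test_a = {"q1": 0.5, "q2": -0.25, "q3": 1.0}
test_b = {"q2": -0.75, "q3": 0.5, "q4": 0.0}

k = linking_constant(test_a, test_b)   # common items: q2 and q3
test_b_on_a = rescale_to_a(test_b, k)
```

If the Rasch model fits, the per-item differences for the common items should be close to one another, so a large spread among them is a sign that the link is unreliable.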
Peer reviewed. Sher, Lawrence – Two-Year College Mathematics Journal, 1977
A formula for converting raw test scores to refined, more meaningful scores is presented. Formula scores are easily computed. (SD)
Descriptors: College Mathematics, Educational Testing, Higher Education, Mathematics Education
Peer reviewed. Aamodt, Michael G.; Pierce, Walter L., Jr. – Educational and Psychological Measurement, 1987
Data from five separate samples were weighted using the vertical percent method (England) and the rare response method (Telenson, Alexander, and Barrett) to investigate their relative effectiveness for scoring biographical information blanks. Vertical percent scoring yielded significant validity coefficients for all samples, while rare response…
Descriptors: Biographical Inventories, Employees, Job Performance, Predictive Validity
Peer reviewed. Waters, Brian K. – Journal of Educational Research, 1976
This pilot study compared two empirically-derived, option-weighting methods and the resultant effect on the reliability and validity of multiple choice test scores as compared with conventional rights-only scoring. (MM)
Descriptors: Guessing (Tests), Measurement, Multiple Choice Tests, Scoring
Peer reviewed. Feuerman, Martin; Weiss, Harvey – Management Science, 1973
A model is presented for test construction and scoring that utilizes the knapsack model of mathematical programing. The method applies to examinations of the type in which a choice exists in the number of questions the examinee is required to answer. The method has been utilized with respect to a mathematics examination, and computer-generated…
Descriptors: Computer Oriented Programs, Mathematics, Models, Scoring
Peer reviewed. Gordon, Leonard V. – Educational and Psychological Measurement, 1971
Results indicate that extremeness response sets at the two ends of the continuum differentially contribute to scale validity. (MS)
Descriptors: Attitude Measures, Rating Scales, Response Style (Tests), Scoring Formulas


