Showing 5,161 to 5,175 of 9,547 results
Sykes, Robert C.; Ito, Kyoko – 1998
A common procedure for obtaining multiple readings (ratings) for a constructed response item, especially in high-stakes tests, is to have two readers read the papers independently, with a third reading if the results differ by more than one point. This necessitates a scoring rule that specifies how the ratings will be aggregated into a single item…
Descriptors: Ability, Constructed Response, High Stakes Tests, Judges
Lee, Guemin; Frisbie, David A. – 1997
Previous studies have indicated that the reliability of test scores composed of testlets might be overestimated by conventional item-based reliability estimation methods (R. Thorndike, 1953; A. Anastasi, 1988; S. Sireci, D. Thissen, and H. Wainer, 1991; H. Wainer and D. Thissen, 1996). This study used generalizability theory to investigate the…
Descriptors: Estimation (Mathematics), Generalizability Theory, Reliability, Scores
Martinez, Michael E.; Simpson, R. Scott – 1999
Item-level statistics from ability and achievement tests have been underutilized as sources of data for building models of cognitive development. How item data can be used to build a cognitive-developmental map of proportional reasoning is demonstrated. The product of the analysis is a cognitive hierarchy with levels corresponding to categories of…
Descriptors: Ability, Achievement Tests, Cognitive Development, Cognitive Tests
Mazor, Kathleen M.; And Others – 1991
The Mantel-Haenszel (MH) procedure has become one of the most popular procedures for detecting differential item functioning. Valid results with relatively small numbers of examinees represent one of the advantages typically attributed to this procedure. In this study, examinee item responses were simulated to contain differentially functioning…
Descriptors: Difficulty Level, Item Bias, Item Response Theory, Sample Size
Peer reviewed
Attali, Yigal – ETS Research Report Series, 2004
Contrary to common belief, reliability estimates of number-right multiple-choice tests are not inflated by speededness. Because examinees guess on questions when they run out of time, the responses to these questions show less consistency with the responses of other questions, and the reliability of the test will be decreased. The surprising…
Descriptors: Multiple Choice Tests, Timed Tests, Test Reliability, Guessing (Tests)
Peer reviewed
Haberman, Shelby J. – ETS Research Report Series, 2004
The usefulness of joint and conditional maximum-likelihood is considered for the Rasch model under realistic testing conditions in which the number of examinees is very large and the number of items is relatively large. Conditions for consistency and asymptotic normality are explored, effects of model error are investigated, measures of prediction…
Descriptors: Maximum Likelihood Statistics, Computation, Item Response Theory, Testing
Whitney, Douglas R.; And Others – 1985
This preview of the Tests of General Educational Development (GED) to be introduced in 1988 begins with a brief background of the review process that will result in the GED Test. An overview of committee recommendations then highlights five themes of Test Specifications Committee panel reports: the tests should (1) demand more highly developed…
Descriptors: Adult Education, High School Equivalency Programs, Test Format, Test Items
Larsen, Gary Y. – 1984
The paper describes the reasons for developing a new instrument to measure adaptive behavior of mentally retarded residents at Glenwood State Hospital-School and recounts the processes involved in constructing the new scale. Among complaints about the American Association on Mental Deficiency Adaptive Behavior Scale (ABS) are its inappropriateness…
Descriptors: Adaptive Behavior (of Disabled), Factor Analysis, Mental Retardation, Test Construction
Scheuneman, Janice Dowd – 1982
The connection between item bias and test scores was investigated using a simulation approach. Two samples of hypothetical examinees were simulated using an item response theory model. The two samples were identical, except that the mean theta value of one sample was 5 less than that of the other. The simulated tests consisted of 50 items with characteristics…
Descriptors: Latent Trait Theory, Research Methodology, Research Problems, Simulation
Peer reviewed
Washington, William N.; Godfrey, R. Richard – Journal of Educational Measurement, 1974
Item statistics for illustrated and written items drawn from the same content areas were compared using F ratios. The results indicated that illustrated items performed slightly better than matched written items, and that the best-performing category of illustrated items was tables. (Author/BB)
Descriptors: Achievement Tests, Illustrations, Test Construction, Test Items
New York State Education Dept., Albany. Bureau of Home Economics Education. – 1981
In the last of a series of three publications for improving assessment of student achievement in home economics courses, modules of learning objectives and related test items are provided to help teachers develop more valid and reliable measurement materials and assessment procedures. The samples are geared to state-approved courses of study. The…
Descriptors: Achievement Tests, Family Life Education, Home Economics, Occupational Home Economics
Huntley, Renee M.; Loyd, Brenda H. – 1982
The study investigated the effect of "item density" on item difficulty in passage-related language tests. Item density refers to the number and frequency of items in relation to clear text. The format used was a passage with underlinings to signal the language situations out of which the items were constructed. American College Testing…
Descriptors: Difficulty Level, Language Tests, Secondary Education, Test Construction
Willson, Victor L. – 1977
A major deficiency in classical test theory is the reliance on Pearson product-moment (PPM) correlation concepts in the definition of reliability. PPM measures are totally insensitive to first-moment differences in tests, which leads to the dubious assumption of essential tau-equivalence. Robinson proposed a measure of agreement that is sensitive…
Descriptors: Comparative Analysis, Correlation, Difficulty Level, Mathematical Formulas
Peer reviewed
Huck, Schuyler W. – Educational and Psychological Measurement, 1978
Hoyt's analysis of variance procedure for estimating reliability assumes that the residual mean square estimates error variability. If, however, an individual's true score varies across items, it is argued that residual mean square estimates two components--error and interaction--and hence Winer's modification of Hoyt's formula underestimates the…
Descriptors: Analysis of Variance, Item Analysis, Psychometrics, Test Interpretation
Peer reviewed
Greitzer, Samuel L. – Mathematics Teacher, 1978
This discussion of the results of the mathematical Olympiad contains the average scores on each of the five problems, the winner of the Olympiad, and the Olympiad problems. (MP)
Descriptors: Mathematical Enrichment, Secondary Education, Secondary School Mathematics, Test Items