Showing 6,256 to 6,270 of 9,552 results
Peer reviewed
Sciarone, A. G.; Schoorl, J. J. – Language Learning, 1989
Presents findings from an experiment that sought to determine the minimal number of blanks required to ensure parallelism between cloze tests that differ only in the point at which deletion starts. Results showed the required minimum depended on the scoring method used, with exact-word tests requiring about 100 blanks and acceptable-word tests…
Descriptors: Cloze Procedure, Dutch, Indonesian, Reading Tests
Peer reviewed
Liou, Michelle – Applied Psychological Measurement, 1988
In applying I. I. Bejar's method for detecting the dimensionality of achievement tests, researchers should be cautious in interpreting the slope of the principal axis. Other information from the data is needed in conjunction with Bejar's method when addressing item dimensionality. (SLD)
Descriptors: Achievement Tests, Computer Simulation, Difficulty Level, Equated Scores
Peer reviewed
Baker, Frank B. – Applied Psychological Measurement, 1988
The form of the item log-likelihood surface was investigated under two-parameter and three-parameter logistic models. Results confirm that the LOGIST program procedures used to locate the maximum of the likelihood functions are consistent with the form of the item log-likelihood surface. (SLD)
Descriptors: Estimation (Mathematics), Factor Analysis, Graphs, Latent Trait Theory
Peer reviewed
Wilcox, Rand R.; And Others – Journal of Educational Measurement, 1988
The second-response conditional probability model of decision-making strategies used by examinees answering multiple-choice test items was revised. Increasing the number of distractors or giving examinees (N=106) the option to follow the model improved results and yielded a good fit to the data for 29 of 30 items. (SLD)
Descriptors: Cognitive Tests, Decision Making, Mathematical Models, Multiple Choice Tests
Peer reviewed
Lin, Miao-Hsiang; Hsiung, Chao A. – Psychometrika, 1994
Two simple empirical approximate Bayes estimators are introduced for estimating domain scores under the binomial and hypergeometric distributions, respectively. Criteria are established regarding use of these functions over their maximum likelihood estimation counterparts. (SLD)
Descriptors: Adaptive Testing, Bayesian Statistics, Computation, Equations (Mathematics)
Peer reviewed
Hancock, Gregory R.; And Others – Educational and Psychological Measurement, 1993
Two-option multiple-choice vocabulary test items are compared with comparably written true-false test items. Results from a study with 111 high school students suggest that multiple-choice items provide a significantly more reliable measure than the true-false format. (SLD)
Descriptors: Ability, High School Students, High Schools, Objective Tests
Peer reviewed
Hamp-Lyons, Liz; Mathias, Sheila Prochnow – Journal of Second Language Writing, 1994
Expert judgments of prompt difficulty in essay tests were examined to discover whether they could be used at the item-writing stage of test development. Findings show that "expert judges" agree considerably about prompt difficulty and prompt task type, but they cannot predict which prompts will result in high or low scores for…
Descriptors: Cues, English (Second Language), Essay Tests, Language Tests
Peer reviewed
Millman, Jason – Educational Measurement: Issues and Practice, 1994
The unfulfilled promise of criterion-referenced measurement is that it would permit valid inferences about what a student could and could not do. To come closest to achieving all that criterion-referenced testing originally promised, tests of higher item density, with more items per unit of the content domain, are required. (SLD)
Descriptors: Criterion Referenced Tests, Educational History, Inferences, Norm Referenced Tests
Peer reviewed
Meijer, Rob R.; And Others – Applied Psychological Measurement, 1994
The power of the nonparametric person-fit statistic, U3, is investigated through simulations as a function of item characteristics, test characteristics, person characteristics, and the group to which examinees belong. Results suggest conditions under which relatively short tests can be used for person-fit analysis. (SLD)
Descriptors: Difficulty Level, Group Membership, Item Response Theory, Nonparametric Statistics
Peer reviewed
Otter, Martha E.; And Others – Journal of Educational Measurement, 1995
The ability of two components, interpretation of a question and memory, to forecast the test-retest correlation coefficients of reading test items was studied with initial samples of 916 elementary and 949 secondary school students. For both populations, both components forecast the relative sizes of the test-retest correlation coefficients. (SLD)
Descriptors: Cognitive Processes, Comprehension, Correlation, Elementary School Students
Peer reviewed
Hetter, Rebecca D.; And Others – Applied Psychological Measurement, 1994
Effects on computerized adaptive test scores of using a paper-and-pencil (P&P) calibration to select items and estimate scores were compared with effects of using a computer calibration. Results with 2,999 Navy recruits support the use of item parameters calibrated from either P&P or computer administrations. (SLD)
Descriptors: Adaptive Testing, Comparative Analysis, Computer Assisted Testing, Estimation (Mathematics)
Peer reviewed
Van der Ven, Ad H. G. S. – Educational and Psychological Measurement, 1992
The dichotomous Rasch model was applied to verbal subtest scores on the Intelligence Structure Test Battery for 905 12- to 15-year-old secondary school students in the Netherlands. Results suggest that, if any factor is used to increase difficulty of items, that factor should be used on all items. (SLD)
Descriptors: Difficulty Level, Foreign Countries, Intelligence Tests, Secondary Education
Vance, Booney; Sabatino, David – Diagnostique, 1991
The issues of construct validity, predictive validity, and item content bias on the Wechsler Intelligence Scale for Children-Revised (WISC-R) are examined. The review concludes that most objective data have not supported claims of bias in the WISC-R when it is used with children of different ethnic backgrounds. (JDD)
Descriptors: Construct Validity, Content Validity, Elementary Secondary Education, Ethnic Groups
Peer reviewed
Sireci, Stephen G.; Geisinger, Kurt F. – Applied Psychological Measurement, 1992
A new method for evaluating the content representation of a test is illustrated. Item similarity ratings were obtained from three content domain experts to assess whether ratings corresponded to item groupings specified in the test blueprint. Multidimensional scaling and cluster analysis provided substantial information about the test's content…
Descriptors: Cluster Analysis, Content Analysis, Multidimensional Scaling, Multiple Choice Tests
Peer reviewed
Weir, C. J.; And Others – Reading in a Foreign Language, 1990
Presents a critical analysis of an earlier article and argues that, although the validity of the High/Low distinction is questionable, it is possible for practical testing purposes to obtain reliable judgments from properly selected and trained judges. (seven references) (GLR)
Descriptors: Evaluation Methods, Reading Comprehension, Reading Tests, Second Language Learning