NotesFAQContact Us
Collection
Advanced
Search Tips
Showing all 13 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Isbell, Dan; Winke, Paula – Language Testing, 2019
The American Council on the Teaching of Foreign Languages (ACTFL) oral proficiency interview -- computer (OPIc) testing system represents an ambitious effort in language assessment: Assessing oral proficiency in over a dozen languages, on the same scale, from virtually anywhere at any time. Especially for users in contexts where multiple foreign…
Descriptors: Oral Language, Language Tests, Language Proficiency, Second Language Learning
Peer reviewed Peer reviewed
Direct linkDirect link
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format
Peer reviewed Peer reviewed
Qualls, Audrey L. – Applied Measurement in Education, 1995
Classically parallel, tau-equivalently parallel, and congenerically parallel models representing various degrees of part-test parallelism and their appropriateness for tests composed of multiple item formats are discussed. An appropriate reliability estimate for a test with multiple item formats is presented and illustrated. (SLD)
Descriptors: Achievement Tests, Estimation (Mathematics), Measurement Techniques, Test Format
Schulz, E. Matthew; Wang, Lin – 2001
In this study, items were drawn from a full-length test of 30 items in order to construct shorter tests for the purpose of making accurate pass/fail classifications with regard to a specific criterion point on the latent ability metric. A three-item parameter Item Response Theory (IRT) framework was used. The criterion point on the latent ability…
Descriptors: Ability, Classification, Item Response Theory, Pass Fail Grading
Peer reviewed Peer reviewed
Axelrod, Bradley N.; And Others – Psychological Assessment, 1996
The calculations of D. Schretlen, R. H. B. Benedict, and J. H. Bobholz for the reliabilities of a short form of the Wechsler Adult Intelligence Scale--Revised (WAIS-R) (1994) consistently overestimated the values. More accurate values are provided for the WAIS--R and a seven-subtest short form. (SLD)
Descriptors: Error Correction, Error of Measurement, Estimation (Mathematics), Intelligence Tests
Peer reviewed Peer reviewed
Smith, Renee L.; And Others – Psychological Assessment, 1995
The clinical utility of using fewer than 12 trials of the Selective Reminding Test, a task to assess verbal memory, was studied with 100 cardiac patients and 100 brain injury patients. Results suggest that as few as 6 trials might be adequate, providing information consistent with that from 12 trials. (SLD)
Descriptors: Clinical Diagnosis, Diagnostic Tests, Head Injuries, Memory
Stocking, Martha L. – 1994
As adaptive testing moves toward operational implementation in large scale testing programs, where it is important that adaptive tests be as parallel as possible to existing linear tests, a number of practical issues arise. This paper concerns three such issues. First, optimum item pool size is difficult to determine in advance of pool…
Descriptors: Adaptive Testing, Computer Assisted Testing, Item Banks, Standards
Wainer, Howard; And Others – 1991
A series of computer simulations was run to measure the relationship between testlet validity and the factors of item pool size and testlet length for both adaptive and linearly constructed testlets. Results confirmed the generality of earlier empirical findings of H. Wainer and others (1991) that making a testlet adaptive yields only marginal…
Descriptors: Adaptive Testing, Computer Assisted Testing, Computer Simulation, Item Banks
Mislevy, Robert J.; Wu, Pao-Kuei – 1988
The basic equations of item response theory provide a foundation for inferring examinees' abilities and items' operating characteristics from observed responses. In practice, though, examinees will usually not have provided a response to every available item--for reasons that may or may not have been intended by the test administrator, and that…
Descriptors: Ability, Adaptive Testing, Equations (Mathematics), Estimation (Mathematics)
Peer reviewed Peer reviewed
Arthur, Winfred, Jr.; Day, David V. – Educational and Psychological Measurement, 1994
The development of a short form of the Raven Advanced Progressive Matrices Test is reported. Results from 3 studies with 663 college students indicate that the short form demonstrates psychometric properties similar to the long form yet requires a substantially shorter administration time. (SLD)
Descriptors: Cognitive Ability, College Students, Educational Research, Higher Education
Peer reviewed Peer reviewed
Deville, Craig; O'Neill, Thomas; Wright, Benjamin D.; Woodcock, Richard W.; Munoz-Sandoval, Ana; Gershon, Richard C.; Bergstrom, Betty – Popular Measurement, 1998
Articles in this special section consider (1) flow in test taking (Craig Deville); (2) testwiseness (Thomas O'Neill); (3) test length (Benjamin Wright); (4) cross-language test equating (Richard W. Woodcock and Ana Munoz-Sandoval); (5) computer-assisted testing and testwiseness (Richard Gershon and Betty Bergstrom); and (6) Web-enhanced testing…
Descriptors: Computer Assisted Testing, Educational Testing, Equated Scores, Measurement Techniques
Wainer, Howard; Thissen, David – 1994
When an examination consists in whole or part of constructed response test items, it is common practice to allow the examinee to choose a subset of the constructed response questions from a larger pool. It is sometimes argued that, if choice were not allowed, the limitations on domain coverage forced by the small number of items might unfairly…
Descriptors: Constructed Response, Difficulty Level, Educational Testing, Equated Scores
Wainer, Howard; And Others – 1990
The initial development of a testlet-based algebra test was previously reported (Wainer and Lewis, 1990). This account provides the details of this excursion into the use of hierarchical testlets and validity-based scoring. A pretest of two 15-item hierarchical testlets was carried out in which examinees' performance on a 4-item subset of each…
Descriptors: Adaptive Testing, Algebra, Comparative Analysis, Computer Assisted Testing