Showing all 11 results
Peer reviewed
Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2010
Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…
Descriptors: Measures (Individuals), Item Response Theory, Robustness (Statistics), Item Analysis
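The contrast between dichotomously and polytomously scored items can be made concrete. The sketch below is a minimal illustration, not taken from the article: it evaluates a two-parameter logistic model (the Rasch model when a = 1) for a dichotomous item and a generalized partial credit model for a polytomous item, with hypothetical parameter values.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic model: probability of a correct response
    to a dichotomously scored item (reduces to the Rasch model when a = 1)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_gpcm(theta, a, steps):
    """Generalized partial credit model: probabilities of each score
    category 0..m for a polytomously scored item.
    `steps` holds the step difficulties b_1..b_m (hypothetical values)."""
    numerators = [1.0]          # category 0 has an empty sum, exp(0) = 1
    running = 0.0
    for b_j in steps:
        running += a * (theta - b_j)
        numerators.append(math.exp(running))
    total = sum(numerators)
    return [n / total for n in numerators]

# Hypothetical item parameters, for illustration only
print(round(p_2pl(theta=0.5, a=1.2, b=0.0), 3))                   # dichotomous item
print([round(p, 3) for p in p_gpcm(0.5, 1.0, [-0.5, 0.4, 1.1])])  # 4-category item
```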
Peer reviewed
Bolt, Sara E.; Ysseldyke, James E. – Applied Measurement in Education, 2006
Although testing accommodations are commonly provided to students with disabilities within large-scale testing programs, research findings on how well accommodations allow for comparable measurement of student knowledge and skill remain inconclusive. The purpose of this study was to examine the extent to which 1 commonly held belief about testing…
Descriptors: Oral Reading, Testing Accommodations, Disabilities, Special Needs Students
Peer reviewed
Hambleton, Ronald K.; Rogers, H. Jane – Applied Measurement in Education, 1989
Item Response Theory and Mantel-Haenszel approaches for investigating differential item performance were compared to assess the level of agreement of the approaches in identifying potentially biased items. Subjects were 2,000 White and 2,000 Native American high school students. The Mantel-Haenszel method provides an acceptable approximation of…
Descriptors: American Indians, Comparative Testing, High School Students, High Schools
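For readers unfamiliar with the Mantel-Haenszel approach compared here, the sketch below shows the standard common odds ratio across matched score levels and its ETS delta-scale transformation (MH D-DIF). It is a generic illustration with hypothetical counts, not the article's analysis.

```python
import math

def mantel_haenszel_dif(tables):
    """Mantel-Haenszel common odds ratio across matched total-score levels.
    Each table is (ref_correct, ref_wrong, focal_correct, focal_wrong).
    Returns (alpha_MH, MH D-DIF on the ETS delta scale)."""
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        t = a + b + c + d
        num += a * d / t      # reference-right, focal-wrong
        den += b * c / t      # reference-wrong, focal-right
    alpha = num / den
    mh_d_dif = -2.35 * math.log(alpha)   # delta-scale transformation
    return alpha, mh_d_dif

# Hypothetical counts at three matched total-score levels
tables = [(40, 10, 30, 20), (55, 15, 45, 25), (60, 5, 50, 15)]
print(mantel_haenszel_dif(tables))
```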
Peer reviewed
Schaefer, Lyn; And Others – Applied Measurement in Education, 1992
Studied methods for structuring a performance domain for a certification test in emergency nursing based on task frequency ratings from 659 emergency nurses or task similarity ratings from 21 experts. A 125-item job analysis survey was used. Similarity judgment results were more easily interpreted and were adequately modeled by multivariate analysis. (SLD)
Descriptors: Certification, Comparative Testing, Job Analysis, Licensing Examinations (Professions)
Peer reviewed
Frary, Robert B. – Applied Measurement in Education, 1991
The use of the "none-of-the-above" option (NOTA) in 20 college-level multiple-choice tests was evaluated for classes with 100 or more students. Eight academic disciplines were represented, and 295 NOTA and 724 regular test items were used. It appears that the NOTA can be compatible with good classroom measurement. (TJH)
Descriptors: College Students, Comparative Testing, Difficulty Level, Discriminant Analysis
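The NOTA comparison rests on classical item statistics. As a minimal sketch with hypothetical data (not Frary's), the code below computes an item's difficulty (proportion correct) and its point-biserial discrimination against total score.

```python
from statistics import mean, pstdev

def item_statistics(item_scores, total_scores):
    """Classical item statistics: difficulty (proportion correct) and
    point-biserial discrimination, i.e. the Pearson correlation of the
    0/1 item score with the total test score."""
    p = mean(item_scores)                      # item difficulty
    mt = mean(total_scores)
    cov = mean((x - p) * (y - mt) for x, y in zip(item_scores, total_scores))
    r_pb = cov / (pstdev(item_scores) * pstdev(total_scores))
    return p, r_pb

# Hypothetical data: one item's 0/1 scores and the examinees' total scores
item = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
totals = [18, 9, 15, 20, 11, 17, 16, 8, 19, 14]
print(item_statistics(item, totals))
```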
Peer reviewed
Stone, Clement A.; Lane, Suzanne – Applied Measurement in Education, 1991
A model-testing approach for evaluating the stability of item response theory item parameter estimates (IPEs) in a pretest-posttest design is illustrated. Nineteen items from the Head Start Measures Battery were used. A moderately high degree of stability in the IPEs for 5,510 children assessed on 2 occasions was found. (TJH)
Descriptors: Comparative Testing, Compensatory Education, Computer Assisted Testing, Early Childhood Education
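Stability of item parameter estimates across occasions is often summarized more simply than by the model-testing approach described here: rescale the second set of difficulty estimates onto the first metric (mean-sigma linking) and examine agreement. The sketch below uses hypothetical values and is not the Head Start Measures Battery analysis.

```python
from statistics import mean, pstdev

def mean_sigma_link(b_time1, b_time2):
    """Mean-sigma linking: rescale time-2 difficulty estimates onto the
    time-1 metric, then summarize agreement with a correlation and RMSD."""
    A = pstdev(b_time1) / pstdev(b_time2)
    B = mean(b_time1) - A * mean(b_time2)
    b2_linked = [A * b + B for b in b_time2]

    m1, m2 = mean(b_time1), mean(b2_linked)
    cov = mean((x - m1) * (y - m2) for x, y in zip(b_time1, b2_linked))
    r = cov / (pstdev(b_time1) * pstdev(b2_linked))
    rmsd = mean((x - y) ** 2 for x, y in zip(b_time1, b2_linked)) ** 0.5
    return r, rmsd

# Hypothetical pretest/posttest difficulty estimates for a few items
b1 = [-1.2, -0.4, 0.1, 0.7, 1.3]
b2 = [-1.0, -0.3, 0.2, 0.9, 1.5]
print(mean_sigma_link(b1, b2))
```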
Peer reviewed
Haladyna, Thomas A. – Applied Measurement in Education, 1992
Several multiple-choice item formats are examined in the current climate of test reform. The reform movement is discussed as it affects use of the following formats: (1) complex multiple-choice; (2) alternate choice; (3) true-false; (4) multiple true-false; and (5) the context dependent item set. (SLD)
Descriptors: Cognitive Psychology, Comparative Testing, Context Effect, Educational Change
Peer reviewed
Royer, James M.; Carlo, Maria S. – Applied Measurement in Education, 1991
Measures of linguistic competence for limited-English-proficient students are discussed. The results for 134 students in grades 3 through 6 from a study of the reliability and validity of the Sentence Verification Technique tests as measures of listening and reading comprehension performance in native languages and English are reported. (TJH)
Descriptors: Bilingual Education, Comparative Testing, Elementary Education, Elementary School Students
Peer reviewed
Rogers, W. Todd; Bateson, David J. – Applied Measurement in Education, 1991
The influence of test wiseness on the performance of 736 high school seniors in British Columbia on provincial school leaving examinations in English, algebra, geography, history, biology, and chemistry was studied. The performance of many students on the multiple-choice sections was spuriously enhanced by test wiseness. (TJH)
Descriptors: Comparative Testing, Foreign Countries, Grade 12, Graduation Requirements
Peer reviewed
Forsyth, Robert A.; And Others – Applied Measurement in Education, 1992
Eighth grade teachers in three local school districts helped customize two standardized norm-referenced tests for ninth graders to investigate effects of deleting some items and adding locally constructed items. Results indicate that percentile ranks for the customized tests could be very different from those for the complete test. (SLD)
Descriptors: Adaptive Testing, Comparative Testing, Elementary Secondary Education, Grade 9
Peer reviewed
Davey, Beth; Macready, George B. – Applied Measurement in Education, 1990
The usefulness of latent class modeling in addressing several measurement issues is demonstrated via a study of 74 good and 74 poor readers in grades 5 and 6. Procedures were particularly useful for assessing the hierarchical relation among skills and for exploring issues related to item domains. (SLD)
Descriptors: Comparative Testing, Elementary School Students, Grade 5, Grade 6
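As background for the latent class modeling mentioned above, the sketch below evaluates an unconstrained two-class model for dichotomous items and the posterior class membership for a response pattern; the class weights and item probabilities are hypothetical, not the study's estimates.

```python
def pattern_probability(x, class_weights, item_probs):
    """Unconstrained latent class model for dichotomous items:
    P(x) = sum_c pi_c * prod_j p_cj^x_j * (1 - p_cj)^(1 - x_j).
    Returns the pattern probability and the posterior class probabilities."""
    joint = []
    for pi_c, p_c in zip(class_weights, item_probs):
        lik = pi_c
        for x_j, p_cj in zip(x, p_c):
            lik *= p_cj if x_j == 1 else (1.0 - p_cj)
        joint.append(lik)
    total = sum(joint)
    posteriors = [j / total for j in joint]     # P(class | x)
    return total, posteriors

# Hypothetical two-class solution for a four-item skill cluster:
# a "master" class with high item-success probabilities and a "non-master" class
weights = [0.6, 0.4]
probs = [[0.9, 0.85, 0.8, 0.75],   # masters
         [0.2, 0.25, 0.3, 0.15]]   # non-masters
print(pattern_probability([1, 1, 0, 1], weights, probs))
```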