Showing all 9 results
Peer reviewed
Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests
Peer reviewed
Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST have exhibited both strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better because different…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach
Peer reviewed
Dwyer, Andrew C. – Journal of Educational Measurement, 2016
This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common-item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common-item equating methodology to standard setting ratings to account for…
Descriptors: Cutting Scores, Equivalency Tests, Test Format, Academic Standards
Peer reviewed
Simon, Alan J.; Joiner, Lee M. – Journal of Educational Measurement, 1976
The purpose of this study was to determine whether a Mexican version of the Peabody Picture Vocabulary Test could be improved by directly translating both forms of the American test, then using decision procedures to select the better item of each pair. The reliability of the simple translations suffered. (Author/BW)
Descriptors: Early Childhood Education, Spanish, Test Construction, Test Format
Peer reviewed
Ward, William C.; And Others – Journal of Educational Measurement, 1980
Free response and machine-scorable versions of a test called Formulating Hypotheses were compared with respect to construct validity. Results indicate that the different forms involve different cognitive processes and measure different qualities. (Author/JKS)
Descriptors: Cognitive Processes, Cognitive Tests, Higher Education, Personality Traits
Peer reviewed
Jaeger, Richard M.; Wolf, Marian B. – Journal of Educational Measurement, 1982
The effectiveness of a Likert-scale format and three paired-choice presentation formats in discriminating among parents' preferences for curriculum elements was compared. Paired-choice formats gave more reliable discriminations, which increased with stimulus specificity. Similarities and differences in preference orderings are discussed. (Author/CM)
Descriptors: Comparative Analysis, Elementary Education, Parent Attitudes, Parent School Relationship
Peer reviewed
Frary, Robert B. – Journal of Educational Measurement, 1985
Responses to a sample test were simulated for examinees under free-response and multiple-choice formats. Test score sets were correlated with randomly generated sets of unit-normal measures. The extent of superiority of free response tests was sufficiently small so that other considerations might justifiably dictate format choice. (Author/DWH)
Descriptors: Comparative Analysis, Computer Simulation, Essay Tests, Guessing (Tests)
Peer reviewed
Benson, Jeri; Hocevar, Dennis – Journal of Educational Measurement, 1985
Three rating scales--with all positive wording, all negative wording, or a mixture of both--were administered to 522 children in grades four through six. The results indicated that it was difficult for students to indicate agreement by disagreeing with a negative statement. This affected test validity. (Author/GDC)
Descriptors: Attitude Measures, Elementary School Students, Intermediate Grades, Item Analysis
Peer reviewed
Stricker, Lawrence J. – Journal of Educational Measurement, 1991
To study whether different forms of the Scholastic Aptitude Test (SAT) used since the mid-1970s varied in their correlations with academic performance criteria, 1975 and 1985 forms were administered to 1,554 and 1,753 high school juniors, respectively. The 1975 form did not have greater validity than the 1985 form. (SLD)
Descriptors: Class Rank, College Entrance Examinations, Comparative Testing, Correlation