Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests
Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST exhibit strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better because different…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach
Dwyer, Andrew C. – Journal of Educational Measurement, 2016
This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common-item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common-item equating methodology to standard setting ratings to account for…
Descriptors: Cutting Scores, Equivalency Tests, Test Format, Academic Standards

Simon, Alan J.; Joiner, Lee M. – Journal of Educational Measurement, 1976
The purpose of this study was to determine whether a Mexican version of the Peabody Picture Vocabulary Test could be improved by directly translating both forms of the American test, then using decision procedures to select the better item of each pair. The reliability of the simple translations suffered. (Author/BW)
Descriptors: Early Childhood Education, Spanish, Test Construction, Test Format

Ward, William C.; And Others – Journal of Educational Measurement, 1980
Free response and machine-scorable versions of a test called Formulating Hypotheses were compared with respect to construct validity. Results indicate that the different forms involve different cognitive processes and measure different qualities. (Author/JKS)
Descriptors: Cognitive Processes, Cognitive Tests, Higher Education, Personality Traits

Jaeger, Richard M.; Wolf, Marian B. – Journal of Educational Measurement, 1982
The effectiveness of a Likert-scale and three paired-choice presentation formats in discriminating among parents' preferences for curriculum elements was compared. Paired-choice formats gave more reliable discriminations which increased with stimulus specificity. Similarities and differences in preference orderings are discussed. (Author/CM)
Descriptors: Comparative Analysis, Elementary Education, Parent Attitudes, Parent School Relationship

Frary, Robert B. – Journal of Educational Measurement, 1985
Responses to a sample test were simulated for examinees under free-response and multiple-choice formats. Test score sets were correlated with randomly generated sets of unit-normal measures. The extent of superiority of free response tests was sufficiently small so that other considerations might justifiably dictate format choice. (Author/DWH)
Descriptors: Comparative Analysis, Computer Simulation, Essay Tests, Guessing (Tests)

Benson, Jeri; Hocevar, Dennis – Journal of Educational Measurement, 1985
Three rating scales--with all positive or all negative wording, or a mixture of both--were administered to 522 children in grades four through six. The results indicated that it was difficult for students to indicate agreement by disagreeing with a negative statement. This affected test validity. (Author/GDC)
Descriptors: Attitude Measures, Elementary School Students, Intermediate Grades, Item Analysis

Stricker, Lawrence J. – Journal of Educational Measurement, 1991
To study whether different forms of the Scholastic Aptitude Test (SAT) used since the mid-1970s varied in their correlations with academic performance criteria, 1975 and 1985 forms were administered to 1,554 and 1,753 high school juniors, respectively. The 1975 form did not have greater validity than the 1985 form. (SLD)
Descriptors: Class Rank, College Entrance Examinations, Comparative Testing, Correlation