Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 1
Since 2016 (last 10 years): 2
Since 2006 (last 20 years): 3

Descriptor
Comparative Analysis: 17
Test Validity: 17
Multiple Choice Tests: 5
Test Reliability: 5
Item Analysis: 4
Test Items: 4
Achievement Tests: 3
Confidence Testing: 3
Guessing (Tests): 3
Scores: 3
Scoring Formulas: 3
Source
Journal of Educational Measurement: 17

Author
Hakstian, A. Ralph: 2
Kansup, Wanlop: 2
Baldwin, Peter: 1
Bucak, Deniz: 1
Clauser, Brian E.: 1
Cohen, Allan S.: 1
Crehan, Kevin D.: 1
Ebel, Robert L.: 1
Farr, Roger: 1
Frary, Robert B.: 1
Grabovsky, Irina: 1
Publication Type
Journal Articles: 9
Reports - Research: 8
Speeches/Meeting Papers: 2
Reports - Evaluative: 1

Education Level
Secondary Education: 1
Assessments and Surveys
College and University…: 1
National Teacher Examinations: 1
Program for International Student Assessment: 1
Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests

Harik, Polina; Clauser, Brian E.; Grabovsky, Irina; Baldwin, Peter; Margolis, Melissa J.; Bucak, Deniz; Jodoin, Michael; Walsh, William; Haist, Steven – Journal of Educational Measurement, 2018
Test administrators are appropriately concerned about the potential for time constraints to impact the validity of score interpretations; psychometric efforts to evaluate the impact of speededness date back more than half a century. The widespread move to computerized test delivery has led to the development of new approaches to evaluating how…
Descriptors: Comparative Analysis, Observation, Medical Education, Licensing Examinations (Professions)

Tendeiro, Jorge N.; Meijer, Rob R. – Journal of Educational Measurement, 2014
Recent guidelines for fair educational testing advise checking the validity of individual test scores with person-fit statistics, but the existing literature leaves practitioners unclear about which statistic to use. An overview of relatively simple existing nonparametric approaches to identify atypical response…
Descriptors: Educational Assessment, Test Validity, Scores, Statistical Analysis
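
The nonparametric person-fit statistics surveyed in this article are not reproduced in the snippet above, but the simplest of the family, the count of Guttman errors, is easy to illustrate: a response pattern is atypical to the extent that harder items are answered correctly while easier items are missed. A minimal Python sketch, assuming dichotomous 0/1 item scores and items ordered easiest to hardest by sample proportion-correct (the data are invented):

import numpy as np

def guttman_errors(responses, difficulty_order):
    # Count pairs where a harder item is correct (1) while an
    # easier item is incorrect (0).
    r = np.asarray(responses)[difficulty_order]  # easiest -> hardest
    errors = 0
    for easy in range(len(r)):
        for hard in range(easy + 1, len(r)):
            if r[easy] == 0 and r[hard] == 1:
                errors += 1
    return errors

data = np.array([[1, 1, 1, 0, 0],   # Guttman-consistent pattern
                 [0, 0, 1, 1, 1]])  # highly atypical pattern
order = np.argsort(-data.mean(axis=0))  # easiest item first
for row in data:
    print(guttman_errors(row, order))  # prints 0, then 4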

Crehan, Kevin D. – Journal of Educational Measurement, 1974
Various item selection techniques are compared on criterion-referenced reliability and validity: three nominal criterion-referenced methods, traditional point-biserial selection, teacher selection, and random selection. (Author)
Descriptors: Comparative Analysis, Criterion Referenced Tests, Item Analysis, Item Banks
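
Of the methods listed, only traditional point-biserial selection has a standard closed form: the Pearson correlation between the dichotomous item score and the total test score, with items retained when the correlation is high. A minimal sketch (the rest-score correction is a common refinement, not a detail taken from this article; the data are invented):

import numpy as np

def point_biserial(item, total):
    # Pearson correlation between a 0/1 item score and a total score.
    return np.corrcoef(item, total)[0, 1]

# Rows are examinees, columns are items (0/1 scores).
X = np.array([[1, 1, 0, 1],
              [1, 0, 0, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 1],
              [1, 1, 0, 0]])

for j in range(X.shape[1]):
    rest = X.sum(axis=1) - X[:, j]  # total score excluding item j
    print(f"item {j}: r_pb = {point_biserial(X[:, j], rest):.2f}")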

Ebel, Robert L. – Journal of Educational Measurement, 1975
Descriptors: Comparative Analysis, Multiple Choice Tests, Objective Tests, Teachers

Hartnett, Rodney T. – Journal of Educational Measurement, 1971
Alternative scoring methods yield essentially the same information, including scale intercorrelations and validity. Reasons for preferring the traditional psychometric scoring technique are offered. (Author/AG)
Descriptors: College Environment, Comparative Analysis, Correlation, Item Analysis

Wardrop, James L.; And Others – Journal of Educational Measurement, 1982
A structure for describing different approaches to testing is generated by identifying five dimensions along which tests differ: test uses, item generation, item revision, assessment of precision, and validation. These dimensions are used to profile tests of reading comprehension. Only norm-referenced achievement tests had an inference system…
Descriptors: Achievement Tests, Comparative Analysis, Educational Testing, Models

Medley, Donald M.; Quirk, Thomas J. – Journal of Educational Measurement, 1974
Descriptors: Blacks, Comparative Analysis, Culture Fair Tests, Item Analysis

Farr, Roger; Roelke, Patricia – Journal of Educational Measurement, 1971
Descriptors: Classroom Observation Techniques, Comparative Analysis, Measurement Techniques, Rating Scales

Koehler, Roger A. – Journal of Educational Measurement, 1971
Descriptors: Achievement Tests, Comparative Analysis, Confidence Testing, Grade 11

Jaeger, Richard M.; Wolf, Marian B. – Journal of Educational Measurement, 1982
The effectiveness of a Likert-scale format and three paired-choice presentation formats in discriminating among parents' preferences for curriculum elements was compared. Paired-choice formats gave more reliable discriminations, which increased with stimulus specificity. Similarities and differences in preference orderings are discussed. (Author/CM)
Descriptors: Comparative Analysis, Elementary Education, Parent Attitudes, Parent School Relationship
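
The mechanics of the paired-choice formats can be shown compactly: each parent picks one element from each pair, and a preference ordering falls out of the win counts. A toy Python sketch with invented curriculum elements and choices (a simplified stand-in, not the article's analysis):

from collections import Counter

elements = ["reading", "math", "science", "arts"]
# Each tuple records (chosen element, rejected element) for one pair.
choices = [("reading", "math"), ("reading", "science"),
           ("math", "arts"), ("science", "arts"),
           ("reading", "arts"), ("math", "science")]

wins = Counter(winner for winner, _ in choices)
ordering = sorted(elements, key=lambda e: wins[e], reverse=True)
print(ordering)  # ['reading', 'math', 'science', 'arts']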

Frary, Robert B. – Journal of Educational Measurement, 1985
Responses to a sample test were simulated for examinees under free-response and multiple-choice formats. Test score sets were correlated with randomly generated sets of unit-normal measures. The superiority of free-response tests was small enough that other considerations might justifiably dictate format choice. (Author/DWH)
Descriptors: Comparative Analysis, Computer Simulation, Essay Tests, Guessing (Tests)
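
The simulation's general idea can be sketched in a few lines: ability is a unit-normal variable, a free-response item is answered correctly only when the examinee "knows" it, and a multiple-choice item additionally allows a 1-in-k lucky guess. All parameter values below are illustrative, not those used in the article:

import numpy as np

rng = np.random.default_rng(0)
n_examinees, n_items, k = 2000, 40, 4  # k = options per MC item

theta = rng.standard_normal(n_examinees)  # unit-normal ability
b = rng.normal(0, 1, n_items)             # item difficulties
p_know = 1 / (1 + np.exp(-(theta[:, None] - b)))

knows = rng.random((n_examinees, n_items)) < p_know
free_response = knows
# Multiple choice: correct if known, else a 1/k chance of guessing right.
multiple_choice = knows | (rng.random((n_examinees, n_items)) < 1 / k)

for name, scores in [("free response", free_response),
                     ("multiple choice", multiple_choice)]:
    r = np.corrcoef(scores.sum(axis=1), theta)[0, 1]
    print(f"{name}: score-ability correlation = {r:.3f}")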

Kim, Seock-Ho; Cohen, Allan S. – Journal of Educational Measurement, 1992
Effects of the following methods for linking metrics on detection of differential item functioning (DIF) were compared: (1) test characteristic curve method (TCC); (2) weighted mean and sigma method; and (3) minimum chi-square method. With large samples, results were essentially the same. With small samples, TCC was most accurate. (SLD)
Descriptors: Chi Square, Comparative Analysis, Equations (Mathematics), Estimation (Mathematics)
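
Of the three linking methods compared, the mean and sigma method is the simplest to write down: the slope is the ratio of the two metrics' standard deviations of common-item difficulties, and the intercept aligns their means. A minimal sketch with invented b-parameter estimates (the weighted variant and the TCC and minimum chi-square methods are not shown):

import numpy as np

# Difficulty estimates for the same items from two separate calibrations.
b_source = np.array([-1.2, -0.4, 0.3, 0.9, 1.6])
b_target = np.array([-0.9, -0.1, 0.5, 1.2, 1.8])

# Mean-sigma linking: b_target ~ A * b_source + B
A = b_target.std() / b_source.std()
B = b_target.mean() - A * b_source.mean()

print(f"A = {A:.3f}, B = {B:.3f}")
print("linked:", np.round(A * b_source + B, 3))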

Moss, Pamela A.; And Others – Journal of Educational Measurement, 1982
Scores on a multiple-choice language test involving recognition of language errors were related to those on writing samples, scored atomistically for the same language errors and holistically for communicative effectiveness and correctness. Results suggest the need for clear limits in generalizing from one assessment to others. (Author/GK)
Descriptors: Comparative Analysis, Elementary Secondary Education, Evaluation Methods, Grade 10

Kansup, Wanlop; Hakstian, A. Ralph – Journal of Educational Measurement, 1975
Effects on reliability and validity of logically weighting incorrect item options in conventional tests, and of different scoring functions in confidence tests, were examined. Ninth graders took conventionally administered Verbal and Mathematical Reasoning tests, scored conventionally and by a procedure assigning degree-of-correctness weights to…
Descriptors: Comparative Analysis, Confidence Testing, Junior High School Students, Multiple Choice Tests
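
Degree-of-correctness weighting can be illustrated with a toy scoring function: conventional scoring awards 1 for the keyed option and 0 otherwise, while option weighting gives each distractor partial credit for being nearly right. The weights below are invented for illustration; the article's logically derived weights are not reproduced in the snippet:

# One multiple-choice item with keyed option "C".
conventional = {"A": 0, "B": 0, "C": 1, "D": 0}
weighted = {"A": 0.0, "B": 0.5, "C": 1.0, "D": 0.25}  # illustrative weights

def score(chosen_options, weight_table):
    # Sum the weight of each option the examinee chose.
    return sum(weight_table[c] for c in chosen_options)

answers = ["C", "B", "D"]  # one examinee, three items (same table reused)
print(score(answers, conventional))  # 1
print(score(answers, weighted))      # 1.75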