Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 1
Since 2016 (last 10 years): 2
Since 2006 (last 20 years): 3

Descriptor
Comparative Analysis: 17
Test Validity: 17
Multiple Choice Tests: 5
Test Reliability: 5
Item Analysis: 4
Test Items: 4
Achievement Tests: 3
Confidence Testing: 3
Guessing (Tests): 3
Scores: 3
Scoring Formulas: 3
Source
Journal of Educational Measurement: 17

Author
Hakstian, A. Ralph: 2
Kansup, Wanlop: 2
Baldwin, Peter: 1
Bucak, Deniz: 1
Clauser, Brian E.: 1
Cohen, Allan S.: 1
Crehan, Kevin D.: 1
Ebel, Robert L.: 1
Farr, Roger: 1
Frary, Robert B.: 1
Grabovsky, Irina: 1
Publication Type
Journal Articles: 9
Reports - Research: 8
Speeches/Meeting Papers: 2
Reports - Evaluative: 1

Education Level
Secondary Education: 1
Assessments and Surveys
College and University…: 1
National Teacher Examinations: 1
Program for International Student Assessment: 1
Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests

Harik, Polina; Clauser, Brian E.; Grabovsky, Irina; Baldwin, Peter; Margolis, Melissa J.; Bucak, Deniz; Jodoin, Michael; Walsh, William; Haist, Steven – Journal of Educational Measurement, 2018
Test administrators are appropriately concerned about the potential for time constraints to impact the validity of score interpretations; psychometric efforts to evaluate the impact of speededness date back more than half a century. The widespread move to computerized test delivery has led to the development of new approaches to evaluating how…
Descriptors: Comparative Analysis, Observation, Medical Education, Licensing Examinations (Professions)

Tendeiro, Jorge N.; Meijer, Rob R. – Journal of Educational Measurement, 2014
Recent guidelines for fair educational testing advise checking the validity of individual test scores with person-fit statistics, but the existing literature leaves practitioners unclear about which statistic to use. An overview of relatively simple existing nonparametric approaches to identify atypical response…
Descriptors: Educational Assessment, Test Validity, Scores, Statistical Analysis
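
The nonparametric person-fit statistics surveyed in this article are not reproduced in the snippet above, but the simplest of the family, the count of Guttman errors, is easy to illustrate: a response pattern is atypical to the extent that harder items are answered correctly while easier items are missed. A minimal Python sketch, assuming dichotomous 0/1 item scores and items ordered easiest to hardest by sample proportion-correct (the data are invented):

import numpy as np

def guttman_errors(responses, difficulty_order):
    # Count pairs where a harder item is correct (1) while an
    # easier item is incorrect (0).
    r = np.asarray(responses)[difficulty_order]  # easiest -> hardest
    errors = 0
    for easy in range(len(r)):
        for hard in range(easy + 1, len(r)):
            if r[easy] == 0 and r[hard] == 1:
                errors += 1
    return errors

data = np.array([[1, 1, 1, 0, 0],   # Guttman-consistent pattern
                 [0, 0, 1, 1, 1]])  # highly atypical pattern
order = np.argsort(-data.mean(axis=0))  # easiest item first
for row in data:
    print(guttman_errors(row, order))  # prints 0, then 4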

Crehan, Kevin D. – Journal of Educational Measurement, 1974
Various item selection techniques are compared on criterion-referenced reliability and validity: three nominal criterion-referenced methods, traditional point-biserial selection, teacher selection, and random selection. (Author)
Descriptors: Comparative Analysis, Criterion Referenced Tests, Item Analysis, Item Banks
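
Of the methods listed, only traditional point-biserial selection has a standard closed form: the Pearson correlation between the dichotomous item score and the total test score, with items retained when the correlation is high. A minimal sketch (the rest-score correction is a common refinement, not a detail taken from this article; the data are invented):

import numpy as np

def point_biserial(item, total):
    # Pearson correlation between a 0/1 item score and a total score.
    return np.corrcoef(item, total)[0, 1]

# Rows are examinees, columns are items (0/1 scores).
X = np.array([[1, 1, 0, 1],
              [1, 0, 0, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 1],
              [1, 1, 0, 0]])

for j in range(X.shape[1]):
    rest = X.sum(axis=1) - X[:, j]  # total score excluding item j
    print(f"item {j}: r_pb = {point_biserial(X[:, j], rest):.2f}")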

Ebel, Robert L. – Journal of Educational Measurement, 1975
Descriptors: Comparative Analysis, Multiple Choice Tests, Objective Tests, Teachers

Hartnett, Rodney T. – Journal of Educational Measurement, 1971
Alternative scoring methods yield essentially the same information, including scale intercorrelations and validity. Reasons for preferring the traditional psychometric scoring technique are offered. (Author/AG)
Descriptors: College Environment, Comparative Analysis, Correlation, Item Analysis

Wardrop, James L.; And Others – Journal of Educational Measurement, 1982
A structure for describing different approaches to testing is generated by identifying five dimensions along which tests differ: test uses, item generation, item revision, assessment of precision, and validation. These dimensions are used to profile tests of reading comprehension. Only norm-referenced achievement tests had an inference system…
Descriptors: Achievement Tests, Comparative Analysis, Educational Testing, Models

Medley, Donald M.; Quirk, Thomas J. – Journal of Educational Measurement, 1974
Descriptors: Blacks, Comparative Analysis, Culture Fair Tests, Item Analysis

Farr, Roger; Roelke, Patricia – Journal of Educational Measurement, 1971
Descriptors: Classroom Observation Techniques, Comparative Analysis, Measurement Techniques, Rating Scales

Koehler, Roger A. – Journal of Educational Measurement, 1971
Descriptors: Achievement Tests, Comparative Analysis, Confidence Testing, Grade 11

Jaeger, Richard M.; Wolf, Marian B. – Journal of Educational Measurement, 1982
The effectiveness of a Likert-scale format and three paired-choice presentation formats in discriminating among parents' preferences for curriculum elements was compared. Paired-choice formats gave more reliable discriminations, which increased with stimulus specificity. Similarities and differences in preference orderings are discussed. (Author/CM)
Descriptors: Comparative Analysis, Elementary Education, Parent Attitudes, Parent School Relationship
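
The mechanics of the paired-choice formats can be shown compactly: each parent picks one element from each pair, and a preference ordering falls out of the win counts. A toy Python sketch with invented curriculum elements and choices (a simplified stand-in, not the article's analysis):

from collections import Counter

elements = ["reading", "math", "science", "arts"]
# Each tuple records (chosen element, rejected element) for one pair.
choices = [("reading", "math"), ("reading", "science"),
           ("math", "arts"), ("science", "arts"),
           ("reading", "arts"), ("math", "science")]

wins = Counter(winner for winner, _ in choices)
ordering = sorted(elements, key=lambda e: wins[e], reverse=True)
print(ordering)  # ['reading', 'math', 'science', 'arts']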

Frary, Robert B. – Journal of Educational Measurement, 1985
Responses to a sample test were simulated for examinees under free-response and multiple-choice formats. Test score sets were correlated with randomly generated sets of unit-normal measures. The superiority of free-response tests was small enough that other considerations might justifiably dictate format choice. (Author/DWH)
Descriptors: Comparative Analysis, Computer Simulation, Essay Tests, Guessing (Tests)
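
The simulation's general idea can be sketched in a few lines: ability is a unit-normal variable, a free-response item is answered correctly only when the examinee "knows" it, and a multiple-choice item additionally allows a 1-in-k lucky guess. All parameter values below are illustrative, not those used in the article:

import numpy as np

rng = np.random.default_rng(0)
n_examinees, n_items, k = 2000, 40, 4  # k = options per MC item

theta = rng.standard_normal(n_examinees)  # unit-normal ability
b = rng.normal(0, 1, n_items)             # item difficulties
p_know = 1 / (1 + np.exp(-(theta[:, None] - b)))

knows = rng.random((n_examinees, n_items)) < p_know
free_response = knows
# Multiple choice: correct if known, else a 1/k chance of guessing right.
multiple_choice = knows | (rng.random((n_examinees, n_items)) < 1 / k)

for name, scores in [("free response", free_response),
                     ("multiple choice", multiple_choice)]:
    r = np.corrcoef(scores.sum(axis=1), theta)[0, 1]
    print(f"{name}: score-ability correlation = {r:.3f}")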

Kim, Seock-Ho; Cohen, Allan S. – Journal of Educational Measurement, 1992
Effects of the following methods for linking metrics on detection of differential item functioning (DIF) were compared: (1) test characteristic curve method (TCC); (2) weighted mean and sigma method; and (3) minimum chi-square method. With large samples, results were essentially the same. With small samples, TCC was most accurate. (SLD)
Descriptors: Chi Square, Comparative Analysis, Equations (Mathematics), Estimation (Mathematics)
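
Of the three linking methods compared, the mean and sigma method is the simplest to write down: the slope is the ratio of the two metrics' standard deviations of common-item difficulties, and the intercept aligns their means. A minimal sketch with invented b-parameter estimates (the weighted variant and the TCC and minimum chi-square methods are not shown):

import numpy as np

# Difficulty estimates for the same items from two separate calibrations.
b_source = np.array([-1.2, -0.4, 0.3, 0.9, 1.6])
b_target = np.array([-0.9, -0.1, 0.5, 1.2, 1.8])

# Mean-sigma linking: b_target ~ A * b_source + B
A = b_target.std() / b_source.std()
B = b_target.mean() - A * b_source.mean()

print(f"A = {A:.3f}, B = {B:.3f}")
print("linked:", np.round(A * b_source + B, 3))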

Moss, Pamela A.; And Others – Journal of Educational Measurement, 1982
Scores on a multiple-choice language test involving recognition of language errors were related to those on writing samples, scored atomistically for the same language errors and holistically for communicative effectiveness and correctness. Results suggest the need for clear limits in generalizing from one assessment to others. (Author/GK)
Descriptors: Comparative Analysis, Elementary Secondary Education, Evaluation Methods, Grade 10

Kansup, Wanlop; Hakstian, A. Ralph – Journal of Educational Measurement, 1975
Effects on reliability and validity of logically weighting incorrect item options in conventional tests, and of different scoring functions in confidence tests, were examined. Ninth graders took conventionally administered Verbal and Mathematical Reasoning tests, scored conventionally and by a procedure assigning degree-of-correctness weights to…
Descriptors: Comparative Analysis, Confidence Testing, Junior High School Students, Multiple Choice Tests
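
Degree-of-correctness weighting can be illustrated with a toy scoring function: conventional scoring awards 1 for the keyed option and 0 otherwise, while option weighting gives each distractor partial credit for being nearly right. The weights below are invented for illustration; the article's logically derived weights are not reproduced in the snippet:

# One multiple-choice item with keyed option "C".
conventional = {"A": 0, "B": 0, "C": 1, "D": 0}
weighted = {"A": 0.0, "B": 0.5, "C": 1.0, "D": 0.25}  # illustrative weights

def score(chosen_options, weight_table):
    # Sum the weight of each option the examinee chose.
    return sum(weight_table[c] for c in chosen_options)

answers = ["C", "B", "D"]  # one examinee, three items (same table reused)
print(score(answers, conventional))  # 1
print(score(answers, weighted))      # 1.75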