Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 0
Since 2016 (last 10 years): 2
Since 2006 (last 20 years): 7
Descriptor
Item Analysis | 53 |
Scoring Formulas | 53 |
Test Reliability | 22 |
Multiple Choice Tests | 19 |
Test Items | 19 |
Guessing (Tests) | 13 |
Scoring | 13 |
Test Construction | 13 |
Test Validity | 13 |
Difficulty Level | 11 |
Weighted Scores | 11 |
Author
Weiss, David J.: 3
Bejar, Isaac I.: 2
Dorans, Neil J.: 2
Echternacht, Gary: 2
Vale, C. David: 2
Anderson, Frances E.: 1
Arneklev, Bruce: 1
Atkins, Warren J.: 1
Austin, Joe Dan: 1
Bauer, Daniel: 1
Education Level
Elementary Education: 2
Elementary Secondary Education: 2
Adult Education: 1
Grade 8: 1
Higher Education: 1
Junior High Schools: 1
Middle Schools: 1
Secondary Education: 1
Audience
Researchers: 2
Location
Denmark: 1
India: 1
Mississippi: 1
Lahner, Felicitas-Maria; Lörwald, Andrea Carolin; Bauer, Daniel; Nouns, Zineb Miriam; Krebs, René; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören – Advances in Health Sciences Education, 2018
Multiple true-false (MTF) items are a widely used supplement to the commonly used single-best answer (Type A) multiple choice format. However, an optimal scoring algorithm for MTF items has not yet been established, as existing studies yielded conflicting results. Therefore, this study analyzes two questions: What is the optimal scoring algorithm…
Descriptors: Scoring Formulas, Scoring Rubrics, Objective Tests, Multiple Choice Tests
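
The competing scoring algorithms are not spelled out in this excerpt, so the Python sketch below is only a hedged illustration of three rules commonly discussed for multiple true-false items: dichotomous scoring (credit only for a fully correct item), partial credit (one point per correct judgment), and a half-credit threshold variant. The rule names, item key, and responses are assumptions for illustration, not the study's materials.

    # Illustrative comparison of three common MTF scoring rules.
    # Key and responses are invented examples, not study data.
    def score_mtf(responses, key, rule="partial"):
        """Score one multiple true-false item under the chosen rule."""
        n = len(key)
        correct = sum(r == k for r, k in zip(responses, key))
        if rule == "dichotomous":   # credit only if every judgment is right
            return 1.0 if correct == n else 0.0
        if rule == "partial":       # one point per correct judgment
            return correct / n
        if rule == "half_credit":   # full credit if perfect, half if one wrong
            return 1.0 if correct == n else (0.5 if correct == n - 1 else 0.0)
        raise ValueError(f"unknown rule: {rule}")

    key = [True, False, False, True]        # hypothetical four-statement item
    responses = [True, False, True, True]   # examinee judges 3 of 4 correctly
    for rule in ("dichotomous", "partial", "half_credit"):
        print(rule, score_mtf(responses, key, rule))  # 0.0, 0.75, 0.5

Each rule maps the same response pattern to a different item score, which is why the choice of algorithm matters for reliability.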
Gierl, Mark J.; Bulut, Okan; Guo, Qi; Zhang, Xinxin – Review of Educational Research, 2017
Multiple-choice testing is considered one of the most effective and enduring forms of educational assessment in use today. This study presents a comprehensive review of the literature on multiple-choice testing in education, focused specifically on the development, analysis, and use of the incorrect options, which are also…
Descriptors: Multiple Choice Tests, Difficulty Level, Accuracy, Error Patterns
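
Because the review centers on developing and analyzing incorrect options, a classical distractor analysis is sketched below: for each option, compare the proportion choosing it among low and high scorers; a functioning distractor should draw low scorers more than high scorers. The data and the top/bottom-third grouping are illustrative assumptions.

    # Classical distractor analysis: option choice rates in the bottom and
    # top thirds of total score (invented data for illustration).
    from collections import Counter

    def distractor_table(choices, totals, options="ABCD"):
        ranked = sorted(range(len(totals)), key=lambda i: totals[i])
        third = len(ranked) // 3
        low, high = ranked[:third], ranked[-third:]
        low_n = Counter(choices[i] for i in low)
        high_n = Counter(choices[i] for i in high)
        return {opt: (low_n[opt] / len(low), high_n[opt] / len(high))
                for opt in options}

    choices = ["B", "A", "B", "C", "B", "D", "A", "B", "C", "B", "A", "B"]
    totals  = [30, 12, 28, 15, 25, 10, 14, 27, 18, 22, 11, 29]
    for opt, (p_low, p_high) in distractor_table(choices, totals).items():
        print(f"option {opt}: low group {p_low:.2f}, high group {p_high:.2f}")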
Gafoor, K. Abdul; Naseer, A. R. – Online Submission, 2015
With a view to supporting instruction and formative and summative assessment, and to providing model handwriting performance with which students can compare their own, a Malayalam handwriting scale is developed. Data from 2640 school students belonging to the Malappuram, Palakkad, and Kozhikode districts, sampled at 240 students per grade…
Descriptors: Formative Evaluation, Summative Evaluation, Handwriting, Performance Based Assessment
Kreiner, Svend – Applied Psychological Measurement, 2011
To rule out the need for a two-parameter item response theory (IRT) model during item analysis by Rasch models, it is important to check the Rasch model's assumption that all items have the same item discrimination. Biserial and polyserial correlation coefficients measuring the association between items and restscores are often used in an informal…
Descriptors: Item Analysis, Correlation, Item Response Theory, Models
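
As a hedged sketch of the informal check the abstract refers to, the code below computes each item's correlation with its rest score (total score minus the item itself); under the Rasch assumption of equal discrimination these values should be roughly uniform across items. It substitutes point-biserial correlations for the biserial/polyserial coefficients named in the abstract, and the responses are simulated, not Kreiner's data.

    # Item-restscore correlations as an informal check of equal
    # item discrimination (simulated Rasch data for illustration).
    import numpy as np

    rng = np.random.default_rng(0)
    theta = rng.normal(size=(200, 1))             # latent abilities
    b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])     # item difficulties
    p = 1 / (1 + np.exp(-(theta - b)))            # Rasch success probabilities
    X = (rng.random(p.shape) < p).astype(float)   # simulated 0/1 responses

    total = X.sum(axis=1)
    for j in range(X.shape[1]):
        rest = total - X[:, j]                    # rest score excludes item j
        r = np.corrcoef(X[:, j], rest)[0, 1]      # point-biserial correlation
        print(f"item {j}: item-restscore r = {r:.3f}")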
Dorans, Neil J.; Liang, Longjuan; Puhan, Gautam – Educational Testing Service, 2010
Scores are the most visible and widely used products of a testing program. The choice of score scale has implications for test specifications, equating, and test reliability and validity, as well as for test interpretation. At the same time, the score scale should be viewed as infrastructure likely to require repair at some point. In this report…
Descriptors: Testing Programs, Standard Setting (Scoring), Test Interpretation, Certification

Claudy, John G. – Applied Psychological Measurement, 1978
Option weighting is an alternative to increasing test length as a means of improving the reliability of a test. The effects on test reliability of option weighting procedures were compared in two empirical studies using four independent sets of items. Biserial weights were found to be superior. (Author/CTM)
Descriptors: Higher Education, Item Analysis, Scoring Formulas, Test Items
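
The weighting procedure itself is not reproduced in the abstract; as a rough analogue of biserial option weighting, each option can be weighted by the correlation between choosing it and an external criterion (here, total score), with an examinee's item score equal to the weight of the option chosen. Point-biserial correlations stand in for biserials below, and all data are invented.

    # Sketch of empirical option weighting: weight each option by the
    # correlation between choosing it and total score (invented data).
    import numpy as np

    choices = np.array(["B", "A", "B", "C", "B", "D", "A", "B", "C", "B"])
    totals  = np.array([30.0, 12, 28, 15, 25, 10, 14, 27, 18, 22])

    weights = {}
    for opt in "ABCD":
        picked = (choices == opt).astype(float)      # 1 if option chosen
        weights[opt] = np.corrcoef(picked, totals)[0, 1]

    print(weights)                                   # per-option weights
    print([round(weights[c], 2) for c in choices])   # examinees' item scores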

Bejar, Isaac I.; Weiss, David J. – Educational and Psychological Measurement, 1977
The reliabilities yielded by several differential option weighting scoring procedures were compared among themselves as well as against conventional testing. It was found that increases in reliability due to differential option weighting were a function of inter-item correlations. Suggestions for the implementation of differential option weighting…
Descriptors: Correlation, Forced Choice Technique, Item Analysis, Scoring Formulas

Oltman, Phillip K.; Stricker, Lawrence J. – Language Testing, 1990
A recent multidimensional scaling analysis of Test of English as a Foreign Language (TOEFL) item response data identified clusters of items in the test sections that, being more homogeneous than their parent sections, might be better for diagnostic use. The analysis was repeated using different scoring techniques. Results diverged only for…
Descriptors: English (Second Language), Item Analysis, Language Tests, Scaling
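
For orientation, a minimal multidimensional scaling pass over item response data might look like the sketch below: inter-item dissimilarities are derived from response correlations and embedded in two dimensions so that homogeneous item clusters sit close together. The pipeline (scikit-learn's MDS, correlation-based distances, simulated responses) is an assumption for illustration, not the study's procedure.

    # Minimal MDS of item response data: embed items in 2-D from
    # correlation-based dissimilarities (illustrative, not the TOEFL study).
    import numpy as np
    from sklearn.manifold import MDS

    rng = np.random.default_rng(1)
    X = (rng.random((300, 8)) < 0.5).astype(float)   # fake 0/1 responses

    R = np.corrcoef(X, rowvar=False)                 # item intercorrelations
    D = 1.0 - R                                      # dissimilarities

    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(D)                    # 2-D item configuration
    for j, (x, y) in enumerate(coords):
        print(f"item {j}: ({x:+.2f}, {y:+.2f})")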
Arneklev, Bruce; And Others – 1976
One of the most important contentions of the Rasch model of item analysis is that two tests of the same trait, having some items in common, can be linked together using a "linking constant" derived from the common items. This would be accomplished by administering both tests to a sample of testees, calibrating the items of the tests…
Descriptors: Elementary School Mathematics, Goodness of Fit, Item Analysis, Measurement Techniques
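
The "linking constant" rests on the Rasch property that two calibrations of the same trait differ only by a shift of origin. A standard estimate (a general Rasch-equating identity, not necessarily this paper's exact procedure) is the mean difficulty difference over the m common items,

    \hat{c} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{b}_i^{(2)} - \hat{b}_i^{(1)} \right),

where \hat{b}_i^{(t)} is the estimated difficulty of common item i in the calibration of test t; subtracting \hat{c} from every test-2 difficulty places both tests on a common scale.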

Wilcox, Rand R. – Educational and Psychological Measurement, 1980
Technical problems in achievement testing that arise when latent structure models are used to estimate the probability of examinees guessing correct responses are studied, along with the absence of such problems when Wilcox's formula score is used. Maximum likelihood estimates are derived that may be applied when items are hierarchically related…
Descriptors: Guessing (Tests), Item Analysis, Mathematical Models, Maximum Likelihood Statistics
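
Wilcox's formula score is not given in this excerpt; for orientation, the classical correction-for-guessing formula score that the term usually builds on is

    FS = R - \frac{W}{k - 1},

where R is the number of right answers, W the number of wrong answers, and k the number of options per item. Under blind guessing among all k options, a wrong answer occurs (k - 1) times as often as a lucky right one, so the expected contribution of guessing to FS is zero.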

Serlin, Ronald C.; Kaiser, Henry F. – Educational and Psychological Measurement, 1978
When multiple-choice tests are scored in the usual manner, giving each correct answer one point, information concerning response patterns is lost. A method for utilizing this information is suggested. An example is presented and compared with two conventional methods of scoring. (Author/JKS)
Descriptors: Correlation, Factor Analysis, Item Analysis, Multiple Choice Tests

Austin, Joe Dan – Psychometrika, 1981
On distractor-identification tests, students mark as many distractors as possible on each test item. A grading scale is developed for this type of testing. The score is optimal in that it yields an unbiased estimate of the student's score as if no guessing had occurred. (Author/JKS)
Descriptors: Guessing (Tests), Item Analysis, Measurement Techniques, Scoring Formulas
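
Austin's grading scale is not reproduced here; a hedged analogue of the unbiasedness property it claims is Coombs-style elimination scoring, which awards one point per distractor marked but -(k-1) if the keyed answer is marked. If a student blindly marks m of the k options, the key is marked with probability m/k, so

    E[S] = m \cdot \frac{k-m}{k} + (m-k) \cdot \frac{m}{k} = \frac{mk - m^2 + m^2 - mk}{k} = 0,

and random marking neither helps nor hurts, while every genuinely identified distractor adds a point.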
Ben-Simon, Anat; Bennett, Randy Elliott – Journal of Technology, Learning, and Assessment, 2007
This study evaluated a "substantively driven" method for scoring NAEP writing assessments automatically. The study used variations of an existing commercial program, e-rater®, to compare the performance of three approaches to automated essay scoring: a "brute-empirical" approach in which variables are selected and weighted solely according to…
Descriptors: Writing Evaluation, Writing Tests, Scoring, Essays
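
As a sketch of what a "brute-empirical" approach amounts to, the code below chooses feature weights purely by regressing human essay scores on automatically extracted features; the feature set and data are invented, and this is a schematic analogue, not e-rater's implementation.

    # Schematic "brute-empirical" weighting: ordinary least squares of
    # human essay scores on automatic features (invented data).
    import numpy as np

    rng = np.random.default_rng(2)
    n = 500
    features = np.column_stack([          # hypothetical essay features
        rng.normal(300, 80, n),           # length in words
        rng.normal(0, 1, n),              # grammar-error rate (standardized)
        rng.normal(0, 1, n),              # vocabulary sophistication
    ])
    human = 0.01 * features[:, 0] - 0.5 * features[:, 1] + rng.normal(0, 0.5, n)

    X = np.column_stack([np.ones(n), features])    # add intercept column
    w, *_ = np.linalg.lstsq(X, human, rcond=None)  # weights chosen purely by fit
    print("intercept and weights:", np.round(w, 3))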

Dolliver, Robert H.; And Others – Journal of Vocational Behavior, 1975
The 1966 SVIB scoring keys were investigated and found to contain an average of 214 fewer items per occupational scale than the 1938 scoring keys. The shorter scales are less reliable than the longer scales on the 1966 SVIB. (Author)
Descriptors: Career Counseling, Interest Inventories, Item Analysis, Occupational Tests
Brinzer, Raymond J. – 1979
The problem engendered by the Matching Familiar Figures (MFF) Test is one of instrument integrity (II), which is delimited by the validity, reliability, and utility of the MFF as a measure of the reflective-impulsive construct. Validity, reliability, and utility of construct assessment may be improved by utilizing: (1) a prototypic scoring model that will…
Descriptors: Conceptual Tempo, Difficulty Level, Item Analysis, Research Methodology