Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system: multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Wind, Stefanie A. – Journal of Educational Measurement, 2019
Numerous researchers have proposed methods for evaluating the quality of rater-mediated assessments using nonparametric methods (e.g., kappa coefficients) and parametric methods (e.g., the many-facet Rasch model). Generally speaking, popular nonparametric methods for evaluating rating quality are not based on a particular measurement theory. On…
Descriptors: Nonparametric Statistics, Test Validity, Test Reliability, Item Response Theory
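The nonparametric rating-quality indices mentioned in the abstract above include kappa coefficients. As a generic point of reference (not the article's own analysis), Cohen's kappa for two raters can be computed from observed and chance-expected agreement:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    who assign each object to one of several categories."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed proportion of exact agreements
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    return (observed - expected) / (1 - expected)

# Two raters scoring ten essays on a 1-3 scale (illustrative data)
a = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]
b = [1, 2, 3, 3, 1, 2, 3, 2, 1, 2]
```

With these ratings, observed agreement is 0.8 and chance agreement 0.34, giving a kappa of about 0.70.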
Liu, Bowen; Kennedy, Patrick C.; Seipel, Ben; Carlson, Sarah E.; Biancarosa, Gina; Davison, Mark L. – Journal of Educational Measurement, 2019
This article describes an ongoing project to develop a formative, inferential reading comprehension assessment of causal story comprehension. It has three features to enhance classroom use: equated scale scores for progress monitoring within and across grades, a scale score to distinguish among low-scoring students based on patterns of mistakes,…
Descriptors: Formative Evaluation, Reading Comprehension, Story Reading, Test Construction
Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST exhibit strength and weakness in recent large-scale implementations, there is no simple answer to the question of which design is better because different…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach
Dwyer, Andrew C. – Journal of Educational Measurement, 2016
This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common-item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common-item equating methodology to standard setting ratings to account for…
Descriptors: Cutting Scores, Equivalency Tests, Test Format, Academic Standards
Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models
Incremental Reliability and Validity of Multiple-Choice Tests with an Answer-Until-Correct Procedure

Hanna, Gerald S. – Journal of Educational Measurement, 1975
An alternative to the conventional right-wrong scoring method used on multiple-choice tests was presented. In the experiment, the examinee continued to respond to a multiple-choice item until feedback signified a correct answer. Findings showed that experimental scores were more reliable but less valid than inferred conventional scores.…
Descriptors: Feedback, Higher Education, Multiple Choice Tests, Scoring
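The answer-until-correct procedure described above can be sketched with a simple partial-credit rule: the fewer attempts an examinee needs before the feedback signals a correct answer, the more points the item awards. The specific credit scheme below is an illustrative assumption, not necessarily the one used in Hanna's study:

```python
def auc_item_score(num_options, attempts_used):
    """Score one answer-until-correct item: full credit for a first-try
    success, one point less for each additional attempt (illustrative rule)."""
    if not 1 <= attempts_used <= num_options:
        raise ValueError("attempts must lie between 1 and the number of options")
    return num_options - attempts_used

def auc_test_score(items):
    """Total score over a test; items is a list of
    (num_options, attempts_used) pairs, one per item."""
    return sum(auc_item_score(k, t) for k, t in items)
```

Under this rule a four-option item earns 3 points when answered on the first try, 2 on the second, and 0 when every option must be tried.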

Raffeld, Paul – Journal of Educational Measurement, 1975
Results support the contention that a Guttman-weighted objective test can have psychometric properties that are superior to those of its unweighted counterpart, as long as omissions do not exist or are assigned a value equal to the mean of the k item alternative weights. (Author/BJG)
Descriptors: Multiple Choice Tests, Predictive Validity, Test Reliability, Test Validity
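The omission condition stated in the abstract (an omitted item is assigned the mean of its k alternative weights) can be sketched directly. The weights and responses here are made up for illustration:

```python
def guttman_weighted_score(responses, weights):
    """Score a test with per-alternative (Guttman-style) weights.

    responses: for each item, the index of the chosen alternative,
               or None for an omission.
    weights:   for each item, the list of its k alternative weights.
    An omission is assigned the mean of the item's k alternative weights,
    per the condition described in the abstract above.
    """
    total = 0.0
    for choice, w in zip(responses, weights):
        if choice is None:
            total += sum(w) / len(w)  # omission: mean of the k weights
        else:
            total += w[choice]
    return total

# Two four-alternative items: one answered (alternative 3), one omitted
example_weights = [[0, 1, 2, 3], [0, 2, 1, 3]]
example_responses = [3, None]
```

The omitted second item contributes (0 + 2 + 1 + 3) / 4 = 1.5 points, so the example total is 4.5.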
The Relationship Between Number of Response Categories and Reliability of Likert-Type Questionnaires

Masters, James R. – Journal of Educational Measurement, 1974
Descriptors: Attitudes, Questionnaires, Rating Scales, Response Style (Tests)

Woodson, M. I. Chas. E. – Journal of Educational Measurement, 1974
Descriptors: Criterion Referenced Tests, Item Analysis, Test Construction, Test Reliability

Koehler, Roger A. – Journal of Educational Measurement, 1974
The purposes of the study were to develop a measure of overconfidence on probabilistic tests, to assess the measurement characteristics of such a measure, and to investigate the relationship of overconfidence on tests to knowledge and to risk-taking propensity. (Author/BB)
Descriptors: Confidence Testing, Measurement Techniques, Multiple Choice Tests, Risk

Grier, J. Brown – Journal of Educational Measurement, 1975
The expected reliability of a multiple choice test is maximized by the use of three-alternative items. (Author)
Descriptors: Achievement Tests, Multiple Choice Tests, Test Construction, Test Reliability

Algina, James; Noe, Michael J. – Journal of Educational Measurement, 1978
A computer simulation study was conducted to investigate Subkoviak's index of reliability for criterion-referenced tests, called the coefficient of agreement. Results indicate that the index can be adequately estimated. (JKS)
Descriptors: Criterion Referenced Tests, Mastery Tests, Measurement, Test Reliability
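The coefficient of agreement that Subkoviak's index estimates is the proportion of examinees who receive the same mastery classification on two parallel administrations of a criterion-referenced test. The sketch below computes that quantity directly from two sets of scores; it illustrates the target quantity, not Subkoviak's single-administration estimator of it:

```python
def coefficient_of_agreement(scores_form1, scores_form2, cutoff):
    """Proportion of examinees classified consistently (both at/above or
    both below the mastery cutoff) across two parallel test forms."""
    assert len(scores_form1) == len(scores_form2)
    consistent = sum(
        (x >= cutoff) == (y >= cutoff)
        for x, y in zip(scores_form1, scores_form2)
    )
    return consistent / len(scores_form1)

# Four examinees, two parallel forms, mastery cutoff of 6 (illustrative)
form1 = [8, 5, 9, 3]
form2 = [7, 6, 9, 2]
```

Here the second examinee is classified nonmaster on form 1 but master on form 2, so three of the four classifications agree and the coefficient is 0.75.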