Publication Date
In 2025: 3
Since 2024: 5
Since 2021 (last 5 years): 5
Since 2016 (last 10 years): 6
Since 2006 (last 20 years): 8
Descriptor
Test Reliability: 40
Test Validity: 19
Testing: 15
Testing Problems: 13
Test Construction: 12
Multiple Choice Tests: 9
Test Interpretation: 8
Test Items: 6
Error of Measurement: 5
Higher Education: 5
Response Style (Tests): 5
Source
Journal of Educational…: 40
Author
Fitzpatrick, Anne R.: 2
Hakstian, A. Ralph: 2
Kansup, Wanlop: 2
Subkoviak, Michael J.: 2
Angoff, William H.: 1
Askegaard, Lewis D.: 1
Bashaw, W. L.: 1
Breland, Hunter M.: 1
Budescu, David: 1
Mutak, Augustin: 1
Wu, Amery D.: 1
Publication Type
Journal Articles: 24
Reports - Research: 14
Reports - Evaluative: 4
Information Analyses: 2
Book/Product Reviews: 1
Guides - Non-Classroom: 1
Opinion Papers: 1
Speeches/Meeting Papers: 1
Tests/Questionnaires: 1
Education Level
Higher Education: 2
Postsecondary Education: 2
Secondary Education: 1
Audience
Practitioners: 2
Researchers: 1
Assessments and Surveys
Program for International…: 1
System of Multicultural…: 1
Test of Standard Written…: 1

Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system: multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
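
The authors' pipeline is not described beyond the snippet above, but the core idea of embedding-based essay scoring can be sketched. A minimal, hedged illustration follows, assuming the publicly available LaBSE model from the sentence-transformers package and a simple ridge regressor onto human scores; the model choice, features, and data are assumptions, not the study's implementation.

# Minimal sketch of embedding-based essay scoring (not the authors' system).
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge

essays = ["Ein kurzer Aufsatz ...", "Un breve saggio ...", "Kratka esej ..."]  # toy multilingual examples
human_scores = [3.0, 4.0, 2.0]                                                # toy holistic scores

encoder = SentenceTransformer("sentence-transformers/LaBSE")  # or a multilingual BERT variant
X = encoder.encode(essays)                                    # one fixed-length embedding per essay

model = Ridge(alpha=1.0).fit(X, human_scores)                 # simple regressor onto human scores
print(model.predict(encoder.encode(["Neuer Aufsatz ..."])))   # predicted score for a new essay
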
Augustin Mutak; Robert Krause; Esther Ulitzsch; Sören Much; Jochen Ranger; Steffi Pohl – Journal of Educational Measurement, 2024
Understanding the intraindividual relation between an individual's speed and ability in testing scenarios is essential to assure a fair assessment. Different approaches exist for estimating this relationship, which rely either on specific study designs or on specific assumptions. This paper aims to add to the toolbox of approaches for estimating…
Descriptors: Testing, Academic Ability, Time on Task, Correlation
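
The paper's estimator is not given in the snippet above. As a rough, hedged illustration of what an intraindividual speed-accuracy relation means, one naive check is the within-person correlation between item response time and item correctness; the simulated data and the assumed direction of the effect below are toy assumptions only.

# Naive within-person speed-accuracy check (illustration only; not the paper's estimator).
import numpy as np

rng = np.random.default_rng(0)
n_items = 40
log_rt = rng.normal(3.0, 0.5, n_items)           # simulated log response times for one examinee
p_correct = 1 / (1 + np.exp(-(log_rt - 3.0)))    # toy assumption: slower responses, higher accuracy
correct = rng.binomial(1, p_correct)

r = np.corrcoef(log_rt, correct)[0, 1]           # intraindividual speed-accuracy correlation
print(f"within-person speed-accuracy correlation: {r:.2f}")
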
Hwanggyu Lim; Danqi Zhu; Edison M. Choe; Kyung T. Han – Journal of Educational Measurement, 2024
This study presents a generalized version of the residual differential item functioning (RDIF) detection framework in item response theory, named GRDIF, to analyze differential item functioning (DIF) in multiple groups. The GRDIF framework retains the advantages of the original RDIF framework, such as computational efficiency and ease of…
Descriptors: Item Response Theory, Test Bias, Test Reliability, Test Construction
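
The GRDIF statistics themselves are not reproduced in the snippet above. As a familiar stand-in only, the sketch below implements the classical Mantel-Haenszel DIF check for one item and two groups; it is a baseline illustration of DIF detection, not the residual-based framework the study describes.

# Stand-in illustration: Mantel-Haenszel DIF for one item, two groups (NOT the GRDIF statistic).
import numpy as np

def mantel_haenszel_dif(item_correct, total_score, group):
    """item_correct, group: 0/1 numpy arrays; group 0 = reference, 1 = focal."""
    num, den = 0.0, 0.0
    for s in np.unique(total_score):                          # stratify by total test score
        m = total_score == s
        a = np.sum((group[m] == 0) & (item_correct[m] == 1))  # reference, correct
        b = np.sum((group[m] == 0) & (item_correct[m] == 0))  # reference, incorrect
        c = np.sum((group[m] == 1) & (item_correct[m] == 1))  # focal, correct
        d = np.sum((group[m] == 1) & (item_correct[m] == 0))  # focal, incorrect
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    alpha = num / den                                         # common odds ratio estimate
    return -2.35 * np.log(alpha)                              # ETS delta scale; |delta| >= 1.5 is typically flagged

# usage: mantel_haenszel_dif(responses[:, j], responses.sum(axis=1), group_labels)
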
Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025
In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…
Descriptors: Automation, Grading, Computer Assisted Testing, Scoring
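
The specific explainability techniques and consistency measures examined are not named in the snippet above. As a generic, hedged illustration of what a consistency check can look like, the sketch below rank-correlates the token attributions that two runs (or two methods) assign to the same student response; the tokens and attribution values are invented toy data.

# Generic attribution-consistency check (illustration only; not the study's procedure).
from scipy.stats import spearmanr

tokens        = ["the", "mitochondria", "produce", "ATP", "for", "the", "cell"]
attribution_a = [0.01, 0.85, 0.40, 0.90, 0.02, 0.01, 0.35]   # e.g., run 1 / method 1
attribution_b = [0.03, 0.80, 0.55, 0.88, 0.01, 0.02, 0.20]   # e.g., run 2 / method 2

rho, _ = spearmanr(attribution_a, attribution_b)             # rank agreement of the two attribution vectors
print(f"attribution consistency (Spearman rho): {rho:.2f}")
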
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
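
A minimal sketch of the multilabel idea, not the authors' architecture: a single network with one sigmoid output per trait, trained against binary trait indicators. Here scikit-learn's MLPClassifier is used because it accepts multilabel targets; the dimensions and random data are toy assumptions.

# Toy multilabel scoring sketch (illustration only; not the MNN reported above).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n_examinees, n_features, n_traits = 500, 60, 15
X = rng.normal(size=(n_examinees, n_features))                # e.g., response-derived features
Y = (rng.random((n_examinees, n_traits)) < 0.5).astype(int)   # binary indicator per trait

# One network with a sigmoid output per trait, rather than 15 separate models.
mnn = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X, Y)
trait_probs = mnn.predict_proba(X[:3])                        # per-trait probabilities for 3 examinees
print(np.round(trait_probs, 2))
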
Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST exhibit strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better because different…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach
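
For readers unfamiliar with the mechanics being compared, the core of an item-level CAT under a 2PL model is to administer, at each step, the unused item with maximum Fisher information at the current ability estimate. The sketch below is a bare-bones illustration with a simulated item bank and a grid-based posterior update; real CAT and MST designs add exposure control, content constraints, and module assembly, none of which are shown here.

# Bare-bones CAT loop under a 2PL model (illustration only).
import numpy as np

rng = np.random.default_rng(2)
a = rng.uniform(0.8, 2.0, 200)            # discriminations of a toy item bank
b = rng.normal(0.0, 1.0, 200)             # difficulties
grid = np.linspace(-4, 4, 81)             # ability grid for a simple posterior update
post = np.exp(-0.5 * grid**2)             # N(0,1) prior
used = []
true_theta = 0.7

for _ in range(20):                       # 20-item adaptive test
    theta_hat = grid[np.argmax(post)]     # current MAP ability estimate
    p = 1 / (1 + np.exp(-a * (theta_hat - b)))
    info = a**2 * p * (1 - p)             # 2PL Fisher information at theta_hat
    info[used] = -np.inf                  # do not reuse items
    j = int(np.argmax(info))
    used.append(j)
    p_true = 1 / (1 + np.exp(-a[j] * (true_theta - b[j])))
    u = rng.binomial(1, p_true)           # simulated response
    p_grid = 1 / (1 + np.exp(-a[j] * (grid - b[j])))
    post *= p_grid if u == 1 else (1 - p_grid)   # update posterior over the grid

print("estimated ability:", grid[np.argmax(post)])
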
Wang, Wenyi; Song, Lihong; Chen, Ping; Meng, Yaru; Ding, Shuliang – Journal of Educational Measurement, 2015
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern-level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet…
Descriptors: Classification, Reliability, Accuracy, Cognitive Tests
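
For context on these indices, one common simplification at the attribute level works directly from each examinee's posterior probability of mastering the attribute: accuracy is the expected agreement between the assigned classification and the true status, and consistency is the expected agreement between two independent classifications. The formulas and data below are an illustrative approximation in the spirit of Rudner-type indices, not the derivation in the paper.

# Illustrative attribute-level accuracy and consistency from posterior mastery probabilities.
import numpy as np

p = np.array([0.95, 0.80, 0.55, 0.10, 0.30])    # P(mastery) for one attribute, five examinees

# Accuracy: probability the mastery call (cut at 0.5) matches the true status.
accuracy = np.mean(np.maximum(p, 1 - p))

# Consistency: probability two independent administrations yield the same call.
consistency = np.mean(p**2 + (1 - p)**2)

print(f"attribute-level accuracy ~ {accuracy:.2f}, consistency ~ {consistency:.2f}")
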
Jin, Kuan-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2014
Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Descriptors: Student Evaluation, Item Response Theory, Models, Simulation
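
One widely used, simpler proxy for the low-effort behavior described above, and not the IRT model the entry proposes, is response-time-based rapid-guessing detection: responses faster than a time threshold are flagged as non-effortful, and "response-time effort" is the proportion of unflagged responses. A toy sketch with an assumed common threshold:

# Toy rapid-guessing flagging via a response-time threshold (a proxy, not the paper's model).
import numpy as np

rt = np.array([38.0, 41.5, 2.1, 55.0, 1.8, 3.0, 47.2, 60.3])   # seconds per item, one examinee
threshold = 5.0                                                # assumed common threshold (seconds)

rapid = rt < threshold                        # flagged as rapid guesses (likely low effort)
response_time_effort = 1 - rapid.mean()       # proportion of effortful responses
print(rapid.astype(int), f"RTE = {response_time_effort:.2f}")
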

Lennon, Roger T. – Journal of Educational Measurement, 1975
Reviews the 1974 Standards, an update that serves as a guide to test development and publishing and to the training of persons for these endeavors. (DEP)
Descriptors: Educational Testing, Psychological Testing, Scoring, Standards

Carlson, Jerry S.; Dillon, Ronna – Journal of Educational Measurement, 1979
The Matrices and Order of Appearance subtests of a Piagetian test battery were administered to a sample of second-grade children on two occasions under two test conditions: standardized testing and a dialogue between child and examiner. Differences for test condition and time of testing were found. (JKS)
Descriptors: Academic Achievement, Developmental Psychology, Developmental Stages, Individual Testing

Wen, Shih-Sung – Journal of Educational Measurement, 1975
The relationship between students' scores on a verbal meaning test and their degrees of confidence in item responses was investigated. Subjects were Black undergraduate students who were administered a verbal meaning test under a confidence testing procedure. (Author/BJG)
Descriptors: Blacks, Confidence Testing, Higher Education, Language Skills

Sykes, Robert C.; Ito, Kyoko; Fitzpatrick, Anne R.; Ercikan, Kadriye – Journal of Educational Measurement, 1997
The five chapters of this report provide resources that deal with validity, generalizability, comparability, performance standards, and issues of fairness, equity, and bias in performance assessments. The book is written for experienced educational measurement practitioners, although an extensive familiarity with performance assessment is not required.…
Descriptors: Educational Assessment, Measurement Techniques, Performance Based Assessment, Standards

Angoff, William H.; Schrader, William B. – Journal of Educational Measurement, 1984
The reported data provide a basis for evaluating the formula-scoring versus rights-scoring issue and for assessing the effects of directions on the reliability and parallelism of scores for sophisticated examinees taking professionally developed tests. Results support the invariance hypothesis rather than the differential effects hypothesis.…
Descriptors: College Entrance Examinations, Guessing (Tests), Higher Education, Hypothesis Testing
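
For context on the two scoring rules being compared: rights scoring counts only the number of correct answers, while conventional formula scoring subtracts a fraction of the wrong answers to correct for guessing. A minimal illustration with toy numbers (not the study's data):

# Rights scoring vs. conventional formula scoring for a multiple-choice test.
def rights_score(n_right):
    return n_right

def formula_score(n_right, n_wrong, k):
    """k = number of options per item; omitted items are not penalized."""
    return n_right - n_wrong / (k - 1)

# Example: 60 right, 20 wrong, 10 omitted on a 90-item, 5-option test.
print(rights_score(60))            # 60
print(formula_score(60, 20, 5))    # 55.0
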

Subkoviak, Michael J. – Journal of Educational Measurement, 1988
Current methods for obtaining reliability indices for mastery tests can be laborious. This paper offers practitioners tables from which agreement and kappa coefficients can be read directly and provides criteria for acceptable values of agreement and kappa coefficients. (TJH)
Descriptors: Mastery Tests, Statistical Analysis, Test Reliability, Testing
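
Definitionally, the agreement coefficient is the proportion of examinees given the same mastery classification on two administrations, and kappa adjusts that proportion for chance agreement; Subkoviak's tables approximate these from a single administration. The sketch below shows only the two-administration definitions with invented toy data, not the single-administration method the paper tabulates.

# Agreement (p_o) and Cohen's kappa for mastery/non-mastery decisions on two administrations.
import numpy as np

def agreement_and_kappa(class1, class2):
    class1, class2 = np.asarray(class1), np.asarray(class2)
    p_o = np.mean(class1 == class2)                 # observed agreement
    p1, p2 = np.mean(class1), np.mean(class2)       # proportions classified as masters
    p_c = p1 * p2 + (1 - p1) * (1 - p2)             # chance agreement
    kappa = (p_o - p_c) / (1 - p_c)
    return p_o, kappa

# toy mastery decisions (1 = master) from two parallel test forms
form_a = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
form_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
print(agreement_and_kappa(form_a, form_b))          # (0.8, ~0.58)
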

Glass, Gene V. – Journal of Educational Measurement, 1978
A detailed analysis of standard setting and criteria for test scores and educational decisions is presented. The author contends that present procedures are in need of re-examination. (JKS)
Descriptors: Academic Standards, Behavioral Objectives, Criterion Referenced Tests, Decision Making