Showing 1 to 15 of 52 results
Peer reviewed
Direct link
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Peer reviewed
Direct link
Yaneva, Victoria; Clauser, Brian E.; Morales, Amy; Paniagua, Miguel – Journal of Educational Measurement, 2021
Eye-tracking technology can create a record of the location and duration of visual fixations as a test-taker reads test questions. Although the cognitive process the test-taker is using cannot be directly observed, eye-tracking data can support inferences about these unobserved cognitive processes. This type of information has the potential to…
Descriptors: Eye Movements, Test Validity, Multiple Choice Tests, Cognitive Processes
Peer reviewed
Direct link
Liu, Bowen; Kennedy, Patrick C.; Seipel, Ben; Carlson, Sarah E.; Biancarosa, Gina; Davison, Mark L. – Journal of Educational Measurement, 2019
This article describes an ongoing project to develop a formative, inferential reading comprehension assessment of causal story comprehension. It has three features to enhance classroom use: equated scale scores for progress monitoring within and across grades, a scale score to distinguish among low-scoring students based on patterns of mistakes,…
Descriptors: Formative Evaluation, Reading Comprehension, Story Reading, Test Construction
Peer reviewed
Direct link
Mislevy, Robert J. – Journal of Educational Measurement, 2016
Validity is the sine qua non of properties of educational assessment. While a theory of validity and a practical framework for validation have emerged over the past decades, most of the discussion has addressed familiar forms of assessment and psychological framings. Advances in digital technologies and in cognitive and social psychology have…
Descriptors: Test Validity, Technology, Cognitive Psychology, Social Psychology
Peer reviewed
Direct link
Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST exhibit strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better because different…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach
Peer reviewed
Millman, Jason; Popham, W. James – Journal of Educational Measurement, 1974
The use of the regression equation derived from the Anglo-American sample to predict grades of Mexican-American students resulted in overprediction. An examination of the standardized regression weights revealed a significant difference in the weight given to the Scholastic Aptitude Test Mathematics Score. (Author/BB)
Descriptors: Criterion Referenced Tests, Item Analysis, Predictive Validity, Scores
Peer reviewed
Woodson, M. I. Charles E. – Journal of Educational Measurement, 1974
Descriptors: Criterion Referenced Tests, Item Analysis, Test Construction, Test Reliability
Peer reviewed
Ackerman, Terry A. – Journal of Educational Measurement, 1992
The difference between item bias and item impact and the way they relate to item validity are discussed from a multidimensional item response theory perspective. The Mantel-Haenszel procedure and the Simultaneous Item Bias strategy are used in a Monte Carlo study to illustrate detection of item bias. (SLD)
Descriptors: Causal Models, Computer Simulation, Construct Validity, Equations (Mathematics)
Peer reviewed
Washington, William N.; Godfrey, R. Richard – Journal of Educational Measurement, 1974
Item statistics for illustrated and written items drawn from the same content areas were compared using F ratios. The results indicated that illustrated items performed slightly better than matched written items, and that the best-performing category of illustrated items was tables. (Author/BB)
Descriptors: Achievement Tests, Illustrations, Test Construction, Test Items
Peer reviewed
Grier, J. Brown – Journal of Educational Measurement, 1975
The expected reliability of a multiple choice test is maximized by the use of items with three alternatives. (Author)
Descriptors: Achievement Tests, Multiple Choice Tests, Test Construction, Test Reliability
Peer reviewed
Embretson, Susan; Gorin, Joanna – Journal of Educational Measurement, 2001
Examines testing practices in: (1) the past, in which the traditional paradigm left little room for cognitive psychology principles; (2) the present, in which testing research is enhanced by principles of cognitive psychology; and (3) the future, in which the potential of cognitive psychology should be fully realized through item design.…
Descriptors: Cognitive Psychology, Construct Validity, Educational Research, Educational Testing
Peer reviewed
Haladyna, Thomas Michael – Journal of Educational Measurement, 1974
Classical test construction and analysis procedures are applicable and appropriate for use with criterion referenced tests when samples of both mastery and nonmastery examinees are employed. (Author/BB)
Descriptors: Criterion Referenced Tests, Item Analysis, Mastery Tests, Test Construction
Peer reviewed
Woodson, M. I. Charles E. – Journal of Educational Measurement, 1974
The basis for selection of the calibration sample determines the kind of scale that will be developed. A random sample from a population of individuals leads to a norm-referenced scale, and a sample representative of a range of abilities or characteristics leads to a criterion-referenced scale. (Author/BB)
Descriptors: Criterion Referenced Tests, Discriminant Analysis, Item Analysis, Test Construction
Peer reviewed
Darlington, Richard B. – Journal of Educational Measurement, 1971
Four definitions of "cultural fairness" are critically examined. Suggestions are presented for dealing with conflicts between the two goals of maximizing a test's validity and minimizing its culture-group discrimination. The terms in which this judgment should be made, and methods of using its results, are described. (LR)
Descriptors: Cultural Background, Cultural Differences, Culture Fair Tests, Test Bias
Peer reviewed
Worthen, Blaine R.; Clark, Philip M. – Journal of Educational Measurement, 1971
Descriptors: Association Measures, College Students, Creativity, Creativity Tests