Showing 1 to 15 of 52 results
Peer reviewed
Direct link
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Peer reviewed
Direct link
Yaneva, Victoria; Clauser, Brian E.; Morales, Amy; Paniagua, Miguel – Journal of Educational Measurement, 2021
Eye-tracking technology can create a record of the location and duration of visual fixations as a test-taker reads test questions. Although the cognitive process the test-taker is using cannot be directly observed, eye-tracking data can support inferences about these unobserved cognitive processes. This type of information has the potential to…
Descriptors: Eye Movements, Test Validity, Multiple Choice Tests, Cognitive Processes
Peer reviewed
Direct link
Liu, Bowen; Kennedy, Patrick C.; Seipel, Ben; Carlson, Sarah E.; Biancarosa, Gina; Davison, Mark L. – Journal of Educational Measurement, 2019
This article describes an ongoing project to develop a formative, inferential reading comprehension assessment of causal story comprehension. It has three features to enhance classroom use: equated scale scores for progress monitoring within and across grades, a scale score to distinguish among low-scoring students based on patterns of mistakes,…
Descriptors: Formative Evaluation, Reading Comprehension, Story Reading, Test Construction
Peer reviewed
Direct link
Mislevy, Robert J. – Journal of Educational Measurement, 2016
Validity is the sine qua non of properties of educational assessment. While a theory of validity and a practical framework for validation have emerged over the past decades, most of the discussion has addressed familiar forms of assessment and psychological framings. Advances in digital technologies and in cognitive and social psychology have…
Descriptors: Test Validity, Technology, Cognitive Psychology, Social Psychology
Peer reviewed
Direct link
Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST exhibit strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better because different…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach
Peer reviewed
Millman, Jason; Popham, W. James – Journal of Educational Measurement, 1974
The use of the regression equation derived from the Anglo-American sample to predict grades of Mexican-American students resulted in overprediction. An examination of the standardized regression weights revealed a significant difference in the weight given to the Scholastic Aptitude Test Mathematics Score. (Author/BB)
Descriptors: Criterion Referenced Tests, Item Analysis, Predictive Validity, Scores
Peer reviewed
Woodson, M. I. Charles E. – Journal of Educational Measurement, 1974
Descriptors: Criterion Referenced Tests, Item Analysis, Test Construction, Test Reliability
Peer reviewed
Ackerman, Terry A. – Journal of Educational Measurement, 1992
The difference between item bias and item impact and the way they relate to item validity are discussed from a multidimensional item response theory perspective. The Mantel-Haenszel procedure and the Simultaneous Item Bias strategy are used in a Monte Carlo study to illustrate detection of item bias. (SLD)
Descriptors: Causal Models, Computer Simulation, Construct Validity, Equations (Mathematics)
Peer reviewed
Washington, William N.; Godfrey, R. Richard – Journal of Educational Measurement, 1974
Item statistics for illustrated and written items drawn from the same content areas were compared using F ratios. The results indicated that illustrated items performed slightly better than matched written items, and that the best-performing category of illustrated items was tables. (Author/BB)
Descriptors: Achievement Tests, Illustrations, Test Construction, Test Items
Peer reviewed
Grier, J. Brown – Journal of Educational Measurement, 1975
The expected reliability of a multiple choice test is maximized by the use of items with three alternatives. (Author)
Descriptors: Achievement Tests, Multiple Choice Tests, Test Construction, Test Reliability
Peer reviewed
Embretson, Susan; Gorin, Joanna – Journal of Educational Measurement, 2001
Examines testing practices in: (1) the past, in which the traditional paradigm left little room for cognitive psychology principles; (2) the present, in which testing research is enhanced by principles of cognitive psychology; and (3) the future, in which the potential of cognitive psychology should be fully realized through item design.…
Descriptors: Cognitive Psychology, Construct Validity, Educational Research, Educational Testing
Peer reviewed
Haladyna, Thomas Michael – Journal of Educational Measurement, 1974
Classical test construction and analysis procedures are applicable and appropriate for use with criterion referenced tests when samples of both mastery and nonmastery examinees are employed. (Author/BB)
Descriptors: Criterion Referenced Tests, Item Analysis, Mastery Tests, Test Construction
Peer reviewed
Woodson, M. I. Charles E. – Journal of Educational Measurement, 1974
The basis for selection of the calibration sample determines the kind of scale that will be developed. A random sample from a population of individuals leads to a norm-referenced scale, and a sample representative of a range of abilities or characteristics leads to a criterion-referenced scale. (Author/BB)
Descriptors: Criterion Referenced Tests, Discriminant Analysis, Item Analysis, Test Construction
Peer reviewed
Darlington, Richard B. – Journal of Educational Measurement, 1971
Four definitions of "cultural fairness" are critically examined. Suggestions are presented for dealing with conflicts between the two goals of maximizing a test's validity and minimizing its culture-group discrimination. The terms in which this judgment should be made, and methods of using its results, are described. (LR)
Descriptors: Cultural Background, Cultural Differences, Culture Fair Tests, Test Bias
Peer reviewed
Worthen, Blaine R.; Clark, Philip M. – Journal of Educational Measurement, 1971
Descriptors: Association Measures, College Students, Creativity, Creativity Tests