Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., >15 traits) can be challenging. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments and demonstrates how an MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
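A minimal sketch of what such a multilabel scorer might look like, assuming a 60-item test measuring 15 binary traits and using invented data; the threshold on the sigmoid outputs is the knob that trades precision against recall:

```python
# Hypothetical sketch of a multilabel neural network (MNN) scorer: item
# responses in, one sigmoid output per trait. Test size, architecture, and
# data are all assumptions, not the paper's setup.
import torch
import torch.nn as nn

N_ITEMS, N_TRAITS = 60, 15   # assumed test size and trait count

model = nn.Sequential(
    nn.Linear(N_ITEMS, 128), nn.ReLU(),
    nn.Linear(128, N_TRAITS),          # one logit per trait
)
loss_fn = nn.BCEWithLogitsLoss()       # standard multilabel objective

x = torch.randint(0, 2, (32, N_ITEMS)).float()   # fake 0/1 item responses
y = torch.randint(0, 2, (32, N_TRAITS)).float()  # fake trait mastery labels
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):                     # toy training loop
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# Scoring: a threshold below 0.5 favors recall; above 0.5 favors precision.
probs = torch.sigmoid(model(x))
mastery = (probs > 0.5).int()
```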
Liu, Chunyan; Kolen, Michael J. – Journal of Educational Measurement, 2020
Smoothing is designed to yield smoother equating results that reduce random equating error without introducing much systematic error. The main objective of this study is to propose a new statistic and to compare its performance with that of the Akaike information criterion and likelihood ratio chi-square difference statistics in…
Descriptors: Equated Scores, Statistical Analysis, Error of Measurement, Criteria
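For background, a rough sketch of the comparison being made, using polynomial log-linear presmoothing of a simulated score distribution; the data, polynomial degrees, and sample size are invented, and the paper's new statistic is not shown:

```python
# Compare nested log-linear presmoothing models with AIC and the
# likelihood-ratio chi-square difference (constant multinomial coefficient
# omitted; it cancels in all comparisons).
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

scores = np.arange(21)                        # raw scores 0..20 (assumed)
s = (scores - scores.mean()) / scores.std()   # standardized for stability
rng = np.random.default_rng(0)
p_true = np.exp(-0.5 * ((scores - 12) / 4) ** 2)
freq = rng.multinomial(1000, p_true / p_true.sum())

def neg_loglik(beta):
    logits = np.polyval(beta[::-1], s)        # beta[k] multiplies s**k
    return -(freq * (logits - logsumexp(logits))).sum()

def fit(degree):
    res = minimize(neg_loglik, np.zeros(degree + 1), method="BFGS")
    return -res.fun, degree + 1               # log-likelihood, parameter count

ll2, k2 = fit(2)                              # quadratic log-linear model
ll3, k3 = fit(3)                              # cubic log-linear model
aic2, aic3 = 2 * k2 - 2 * ll2, 2 * k3 - 2 * ll3
lr_chi2 = 2 * (ll3 - ll2)                     # LR chi-square difference, 1 df
print(f"AIC(2)={aic2:.1f}  AIC(3)={aic3:.1f}  LR chi2={lr_chi2:.2f}")
```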
Lane, Suzanne – Journal of Educational Measurement, 2019
Rater-mediated assessments require evaluating the accuracy and consistency of raters' inferences to ensure the validity of score interpretations and uses. Modeling rater response processes allows for a better understanding of how raters map their representations of the examinee performance to their representation of the…
Descriptors: Responses, Accuracy, Validity, Interrater Reliability
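The paper models rater response processes; as a simpler stand-in for the consistency side of that evaluation, here is a sketch of exact agreement and Cohen's kappa between two hypothetical raters:

```python
# Minimal consistency check for rater-mediated scores (illustrative, not
# the paper's modeling approach): exact agreement and Cohen's kappa.
import numpy as np

r1 = np.array([3, 2, 4, 3, 1, 2, 3, 4, 2, 3])   # invented ratings, rater 1
r2 = np.array([3, 2, 3, 3, 1, 2, 4, 4, 2, 2])   # invented ratings, rater 2

agree = np.mean(r1 == r2)                        # exact agreement rate
cats = np.arange(1, 5)
p1 = np.array([(r1 == c).mean() for c in cats])
p2 = np.array([(r2 == c).mean() for c in cats])
p_e = (p1 * p2).sum()                            # chance agreement
kappa = (agree - p_e) / (1 - p_e)
print(round(agree, 2), round(kappa, 2))
```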
Chen, Yi-Hsin; Senk, Sharon L.; Thompson, Denisse R.; Voogt, Kevin – Journal of Educational Measurement, 2019
The van Hiele theory and van Hiele Geometry Test have been extensively used in mathematics assessments across countries. The purpose of this study is to use classical test theory (CTT) and cognitive diagnostic modeling (CDM) frameworks to examine psychometric properties of the van Hiele Geometry Test and to compare how various classification…
Descriptors: Geometry, Mathematics Tests, Test Theory, Psychometrics
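A toy contrast of the two frameworks, with an invented three-item Q-matrix and assumed DINA guess/slip parameters (not the van Hiele test's actual structure):

```python
# Classify the same response pattern by CTT total score vs. a DINA-type
# cognitive diagnostic model. All numbers are invented for illustration.
import itertools
import numpy as np

Q = np.array([[1, 0],        # item 1 requires attribute A only
              [0, 1],        # item 2 requires attribute B only
              [1, 1]])       # item 3 requires both
guess, slip = 0.2, 0.1       # assumed DINA guessing and slip parameters
resp = np.array([1, 0, 1])   # one examinee's 0/1 responses

# CTT: classify by total score against a cut (assumed cut = 2)
ctt_master = resp.sum() >= 2

# CDM: posterior over all attribute profiles, uniform prior
profiles = np.array(list(itertools.product([0, 1], repeat=Q.shape[1])))
post = []
for alpha in profiles:
    eta = np.all(alpha >= Q, axis=1)             # has all required attributes?
    p = np.where(eta, 1 - slip, guess)           # P(correct | profile)
    post.append(np.prod(np.where(resp == 1, p, 1 - p)))
post = np.array(post) / np.sum(post)
best = profiles[post.argmax()]                   # most probable profile
print(ctt_master, best, post.round(3))
```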
Debeer, Dries; Ali, Usama S.; van Rijn, Peter W. – Journal of Educational Measurement, 2017
Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) is commonly used as statistical…
Descriptors: Test Format, Test Construction, Statistical Analysis, Comparative Analysis
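A hypothetical greedy sketch of the TIF-matching idea; operational assembly usually relies on mixed-integer programming, and the pool, target, and theta grid below are all invented:

```python
# Greedy TIF-matched assembly: pick items from a 2PL pool so the new form's
# information function tracks a reference form's TIF at a few theta points.
import numpy as np

rng = np.random.default_rng(1)
a = rng.uniform(0.5, 2.0, 200)          # pool discriminations (assumed 2PL)
b = rng.normal(0, 1, 200)               # pool difficulties
theta = np.array([-1.0, 0.0, 1.0])      # matching grid

def info(a, b, theta):
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

target = info(1.2, 0.0, theta) * 30     # pretend TIF of a 30-item reference form

chosen, total = [], np.zeros_like(theta)
for _ in range(30):
    best, best_err = None, np.inf
    for j in range(len(a)):
        if j in chosen:
            continue
        err = np.abs(total + info(a[j], b[j], theta) - target).sum()
        if err < best_err:
            best, best_err = j, err
    chosen.append(best)
    total += info(a[best], b[best], theta)
print(sorted(chosen)[:10], np.round(total - target, 2))
```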
Kang, Hyeon-Ah; Zhang, Susu; Chang, Hua-Hua – Journal of Educational Measurement, 2017
The development of cognitive diagnostic-computerized adaptive testing (CD-CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual-objective CD-CAT that simultaneously addresses examinees' attribute mastery…
Descriptors: Computer Assisted Testing, Adaptive Testing, Cognitive Tests, Test Items
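A simplified sketch of a dual-objective selection rule, not the paper's method: each candidate item's utility is a weighted mix of 2PL Fisher information for ability and a crude attribute-discrimination index, with the weight and item parameters assumed:

```python
# Dual-objective item selection sketch: trade off information about theta
# against how sharply an item separates masters from non-masters.
import numpy as np

rng = np.random.default_rng(2)
n = 100
a = rng.uniform(0.5, 2.0, n)            # 2PL discrimination (assumed)
b = rng.normal(0, 1, n)                 # 2PL difficulty (assumed)
guess = rng.uniform(0.05, 0.25, n)      # DINA-style guessing per item
slip = rng.uniform(0.05, 0.25, n)       # DINA-style slip per item

def fisher(a, b, theta):
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

theta_hat = 0.3                          # current ability estimate
w = 0.5                                  # assumed trade-off weight
administered = {17, 42}

# attribute discrimination proxy: master vs. non-master success-rate gap
util = w * fisher(a, b, theta_hat) + (1 - w) * ((1 - slip) - guess)
util[list(administered)] = -np.inf       # never reselect an item
next_item = int(util.argmax())
print(next_item, util[next_item])
```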
Lee, Woo-yeol; Cho, Sun-Joo – Journal of Educational Measurement, 2017
Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…
Descriptors: Test Items, Item Response Theory, Item Analysis, Simulation
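A back-of-envelope sketch of the invariance check's logic with hypothetical estimates: compare an item's within-level and between-level discriminations using a Wald-type z test:

```python
# Cross-level invariance check (illustrative numbers, not the paper's data):
# test whether the within- and between-level discriminations differ.
import math
from scipy.stats import norm

a_within, se_within = 1.10, 0.08     # assumed estimate and standard error
a_between, se_between = 1.35, 0.12   # assumed estimate and standard error

z = (a_within - a_between) / math.sqrt(se_within**2 + se_between**2)
p = 2 * norm.sf(abs(z))              # two-sided p-value
print(round(z, 2), round(p, 3))      # small p suggests cross-level noninvariance
```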
van der Linden, Wim J. – Journal of Educational Measurement, 2013
Despite all the technical progress in observed-score equating, several of the more conceptual aspects of the process are still not well understood. As a result, the equating literature struggles with rather complex equating criteria, a lack of test-theoretic foundation, confusing terminology, and ad hoc analyses. A return to Lord's…
Descriptors: Equated Scores, Statistical Analysis, Computation, Data Collection
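For orientation, a sketch of classical equipercentile observed-score equating, the setting the paper revisits; toy binomial data, no presmoothing, and a step-function inverse rather than the interpolation used in practice:

```python
# Equipercentile equating sketch: map each form-X score to the form-Y score
# with the same mid-percentile rank. All data are simulated.
import numpy as np

rng = np.random.default_rng(3)
x = rng.binomial(40, 0.6, 2000)   # form X scores (assumed 40 items)
y = rng.binomial(40, 0.55, 2000)  # form Y scores

def percentile_rank(scores, max_score):
    counts = np.bincount(scores, minlength=max_score + 1)
    cum = np.cumsum(counts)
    below = np.concatenate(([0], cum[:-1]))
    return (below + counts / 2) / len(scores)   # mid-percentile ranks

pr_x = percentile_rank(x, 40)
pr_y = percentile_rank(y, 40)
grid = np.arange(41)
# discrete inverse of Y's percentile-rank function (step approximation)
equated = grid[np.clip(np.searchsorted(pr_y, pr_x), 0, 40)]
print(equated[:15])               # form-Y equivalents of X = 0..14
```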
Han, Kyung T. – Journal of Educational Measurement, 2012
Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
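One simple exposure-control heuristic, shown only to make the trade-off concrete (it is not Han's algorithm): randomesque selection draws from the k most informative remaining items instead of always taking the single best, spreading pool usage at a small cost in precision:

```python
# Randomesque item selection sketch for CAT exposure control.
import numpy as np

rng = np.random.default_rng(4)
a = rng.uniform(0.5, 2.0, 300)          # assumed 2PL discriminations
b = rng.normal(0, 1, 300)               # assumed 2PL difficulties

def fisher(theta):
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

def select(theta, administered, k=10):
    info = fisher(theta)
    info[list(administered)] = -np.inf      # exclude already-seen items
    top_k = np.argpartition(info, -k)[-k:]  # k best remaining items
    return int(rng.choice(top_k))           # randomize among them

print(select(theta=0.0, administered={5, 77}))
```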
Clauser, Brian E.; Mee, Janet; Baldwin, Su G.; Margolis, Melissa J.; Dillon, Gerard F. – Journal of Educational Measurement, 2009
Although the Angoff procedure is among the most widely used standard setting procedures for tests comprising multiple-choice items, research has shown that subject matter experts have considerable difficulty accurately making the required judgments in the absence of examinee performance data. Some authors have viewed the need to provide…
Descriptors: Standard Setting (Scoring), Program Effectiveness, Expertise, Health Personnel
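The unmodified Angoff computation itself is simple; with invented ratings from three judges on five items, the cut score is the mean over judges of each judge's summed item probabilities:

```python
# Angoff cut-score arithmetic: each judge estimates, per item, the
# probability that a minimally competent examinee answers correctly.
import numpy as np

# rows = judges, columns = items (assumed 3 judges x 5 items, invented)
ratings = np.array([
    [0.6, 0.7, 0.4, 0.8, 0.5],
    [0.5, 0.8, 0.5, 0.7, 0.6],
    [0.7, 0.6, 0.3, 0.9, 0.5],
])
judge_cuts = ratings.sum(axis=1)     # expected raw score per judge
cut_score = judge_cuts.mean()        # panel cut score
print(judge_cuts, round(cut_score, 2))
```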
Culpepper, Steven A.; Davenport, Ernest C. – Journal of Educational Measurement, 2009
Previous research notes the importance of understanding racial/ethnic differential prediction of college grades across multiple institutions. Institutional variation in selection indices is especially important given some states' laws governing public institutions' admissions decisions. This paper employed multilevel moderated multiple regression…
Descriptors: Prediction, College Students, Grades (Scholastic), Race
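A sketch of the modeling idea on simulated data (not the study's): a score-by-group interaction in a mixed model with institutions as the grouping level tests for differential prediction; the variable names and effect sizes are invented:

```python
# Multilevel moderated regression sketch: does the score-GPA slope differ
# by group, allowing for institution-level variation?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n, n_schools = 2000, 20
school = rng.integers(0, n_schools, n)
group = rng.integers(0, 2, n)                       # 0/1 group indicator
sat = rng.normal(0, 1, n)                           # standardized test score
school_effect = rng.normal(0, 0.3, n_schools)[school]
gpa = (3.0 + 0.4 * sat - 0.1 * group + 0.05 * sat * group
       + school_effect + rng.normal(0, 0.5, n))

df = pd.DataFrame(dict(gpa=gpa, sat=sat, group=group, school=school))
model = smf.mixedlm("gpa ~ sat * group", df, groups=df["school"])
fit = model.fit()
# a nonzero interaction indicates differential prediction across groups
print(fit.params[["sat", "group", "sat:group"]])
```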

Strenta, A. Christopher; Elliott, Rogers – Journal of Educational Measurement, 1987
Differential grading standards were examined in a sample of 1,029 Dartmouth College graduates. Fields of study that attracted students (as majors) who scored higher on the Scholastic Aptitude Test (SAT) employed stricter grading standards. These differential standards attenuated the substantial correlation between SAT scores and grades.…
Descriptors: Academic Standards, Admission Criteria, College Entrance Examinations, Competitive Selection
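A small simulation of the attenuation mechanism the study describes, with invented parameters: when higher-scoring students sort into harsher-grading fields, the pooled SAT-grade correlation falls below the within-field correlation:

```python
# Attenuation-by-sorting simulation (parameters invented, n matches the
# study's sample size only for flavor).
import numpy as np

rng = np.random.default_rng(8)
n = 1029
sat = rng.normal(0, 1, n)
harsh = (sat + rng.normal(0, 1, n)) > 0          # high scorers pick strict fields
penalty = np.where(harsh, -0.8, 0.0)             # stricter grading standard
gpa = 0.6 * sat + penalty + rng.normal(0, 0.6, n)

print(round(np.corrcoef(sat, gpa)[0, 1], 2))               # pooled (attenuated)
print(round(np.corrcoef(sat[harsh], gpa[harsh])[0, 1], 2)) # within strict fields
```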

Shani, Esther; Petrosko, Joseph M. – Journal of Educational Measurement, 1976
Data from the Center for the Study of Evaluation's Secondary School Test Evaluations were analyzed to explore the present adequacy and propose a future direction of formal procedures for evaluating standardized tests. (Author/RC)
Descriptors: Evaluation, Evaluation Criteria, Secondary Education, Standardized Tests

Young, John W. – Journal of Educational Measurement, 1990
A new measure of academic performance was developed through a novel application of item response theory (IRT). This criterion, an IRT-based grade point average (GPA), was used to determine the predictive validity of certain preadmissions measures for 1,564 students admitted to Stanford University in 1982. (SLD)
Descriptors: Academic Achievement, Admission Criteria, College Entrance Examinations, College Students
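A crude stand-in for the idea rather than Young's IRT model: put grades from courses of differing difficulty on one scale by fitting grade as student ability minus course difficulty, then treat the ability estimates as an adjusted, GPA-like criterion:

```python
# Adjusted-GPA sketch via a two-way linear model (simulated data): recover
# abilities while controlling for course grading difficulty.
import numpy as np

rng = np.random.default_rng(6)
n_students, n_courses = 50, 10
ability = rng.normal(0, 1, n_students)
difficulty = rng.normal(0, 1, n_courses)
i, j = np.meshgrid(np.arange(n_students), np.arange(n_courses), indexing="ij")
grades = ability[i] - difficulty[j] + rng.normal(0, 0.5, i.shape)

# design matrix: one column per student, one per course (difficulty entered
# with a minus sign); drop the last course column to fix the scale
X = np.zeros((grades.size, n_students + n_courses - 1))
rows = np.arange(grades.size)
X[rows, i.ravel()] = 1.0
mask = j.ravel() < n_courses - 1
X[rows[mask], n_students + j.ravel()[mask]] = -1.0
beta, *_ = np.linalg.lstsq(X, grades.ravel(), rcond=None)
adjusted_gpa = beta[:n_students]
print(np.corrcoef(adjusted_gpa, ability)[0, 1])   # should be close to 1
```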

Sawyer, Richard L.; And Others – Journal of Educational Measurement, 1976
This article examines some of the values that might be considered in a selection situation within the context of a decision theoretic model also described here. Several alternate expressions of fair selection are suggested in the form of utility statements in which these values can be understood and compared. (Author/DEP)
Descriptors: Bias, Decision Making, Evaluation Criteria, Models
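A toy illustration of the decision-theoretic framing with invented utilities: each selection outcome (true or false positive, true or false negative) gets a utility, a cutoff is judged by its expected utility, and changing the utilities encodes different values about fair selection:

```python
# Expected-utility comparison of selection cutoffs (all numbers invented).
import numpy as np

rng = np.random.default_rng(7)
score = rng.normal(0, 1, 10000)
success = rng.random(10000) < 1 / (1 + np.exp(-1.5 * score))  # true criterion

# utilities: true positive, false positive, false negative, true negative
U = {"tp": 1.0, "fp": -1.0, "fn": -0.5, "tn": 0.0}   # assumed value judgments

def expected_utility(cut):
    sel = score >= cut
    return (U["tp"] * np.mean(sel & success) + U["fp"] * np.mean(sel & ~success)
          + U["fn"] * np.mean(~sel & success) + U["tn"] * np.mean(~sel & ~success))

cuts = np.linspace(-2, 2, 41)
best = cuts[np.argmax([expected_utility(c) for c in cuts])]
print(best)   # the cutoff a given set of utilities would recommend
```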