Publication Date
  In 2025: 1
  Since 2024: 2
  Since 2021 (last 5 years): 2
  Since 2016 (last 10 years): 2
  Since 2006 (last 20 years): 4
Descriptor
  Test Reliability: 15
  Testing: 15
  Test Validity: 6
  Multiple Choice Tests: 5
  Response Style (Tests): 4
  Scoring Formulas: 4
  Test Construction: 4
  Comparative Analysis: 3
  Guessing (Tests): 3
  Higher Education: 3
  Test Items: 3
Source
  Journal of Educational…: 15
Author
  Hakstian, A. Ralph: 2
  Kansup, Wanlop: 2
  Amery D. Wu: 1
  Angoff, William H.: 1
  Augustin Mutak: 1
  Carlson, Jerry S.: 1
  Chen, Ping: 1
  Diamond, James J.: 1
  Dillon, Ronna: 1
  Ding, Shuliang: 1
  Esther Ulitzsch: 1
Publication Type
  Journal Articles: 7
  Reports - Research: 6
  Guides - Non-Classroom: 1
  Tests/Questionnaires: 1
Education Level
  Higher Education: 1
  Postsecondary Education: 1
  Secondary Education: 1
Audience
  Practitioners: 1
Assessments and Surveys
  Program for International…: 1
Augustin Mutak; Robert Krause; Esther Ulitzsch; Sören Much; Jochen Ranger; Steffi Pohl – Journal of Educational Measurement, 2024
Understanding the intraindividual relation between an individual's speed and ability in testing scenarios is essential to assure a fair assessment. Different approaches exist for estimating this relationship, which rely either on specific study designs or on specific assumptions. This paper aims to add to the toolbox of approaches for estimating…
Descriptors: Testing, Academic Ability, Time on Task, Correlation
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
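
As a rough illustration of the kind of scorer this abstract describes, the sketch below trains a small multilabel network that maps item responses to one probability per trait. The architecture, layer sizes, data, and 0.5 cutoff are illustrative assumptions, not the authors' actual MNN specification.

```python
# Minimal multilabel-scoring sketch: item responses in, one score per trait out.
# Sizes, data, and the cutoff are invented for illustration.
import torch
import torch.nn as nn

n_items, n_traits = 60, 20  # hypothetical test length and number of traits

model = nn.Sequential(
    nn.Linear(n_items, 64),
    nn.ReLU(),
    nn.Linear(64, n_traits),
    nn.Sigmoid(),           # one independent probability per trait (multilabel head)
)
loss_fn = nn.BCELoss()      # binary cross-entropy over all trait labels jointly

# Fake data: 0/1 item responses and 0/1 mastery labels per trait.
X = torch.randint(0, 2, (256, n_items)).float()
y = torch.randint(0, 2, (256, n_traits)).float()

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# Choosing the per-trait cutoff (here simply 0.5) is where accuracy, recall,
# or precision can be traded off against one another, which is the kind of
# metric-specific scoring the abstract points to.
scores = (model(X) > 0.5).int()
```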
Wang, Wenyi; Song, Lihong; Chen, Ping; Meng, Yaru; Ding, Shuliang – Journal of Educational Measurement, 2015
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern-level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet…
Descriptors: Classification, Reliability, Accuracy, Cognitive Tests
Jin, Kuan-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2014
Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Descriptors: Student Evaluation, Item Response Theory, Models, Simulation
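
For context, the "standard item response theory (IRT) models" this abstract refers to typically model the probability of a correct response as a function of ability alone, as in the generic two-parameter logistic sketch below (not the authors' proposed model). Nothing in such a model distinguishes a wrong answer given with full effort from one given with low effort near the end of a test.

```python
import numpy as np

def p_correct_2pl(theta, a, b):
    """Two-parameter logistic IRT model: probability of a correct response
    given ability theta, item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Example: an examinee of above-average ability on a moderately hard item.
print(p_correct_2pl(theta=1.0, a=1.2, b=0.5))  # ~0.65
```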

Subkoviak, Michael J. – Journal of Educational Measurement, 1988
Current methods for obtaining reliability indices for mastery tests can be laborious. This paper offers practitioners tables from which agreement and kappa coefficients can be read directly and provides criteria for acceptable values of agreement and kappa coefficients. (TJH)
Descriptors: Mastery Tests, Statistical Analysis, Test Reliability, Testing
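
The two coefficients tabled by Subkoviak can be illustrated directly when two mastery/non-mastery classifications of the same examinees are available (e.g., from parallel forms): the sketch below computes the proportion of agreement and Cohen's kappa from such data. This is a generic two-administration computation with made-up data, whereas Subkoviak's tables estimate the same coefficients from a single administration.

```python
import numpy as np

def agreement_and_kappa(class1, class2):
    """Proportion agreement and Cohen's kappa for two mastery/non-mastery
    classifications (1 = master, 0 = non-master) of the same examinees."""
    class1, class2 = np.asarray(class1), np.asarray(class2)
    p_o = np.mean(class1 == class2)              # observed agreement
    p1, p2 = class1.mean(), class2.mean()        # proportions classified as masters
    p_e = p1 * p2 + (1 - p1) * (1 - p2)          # agreement expected by chance
    return p_o, (p_o - p_e) / (1 - p_e)

# Hypothetical classifications from two parallel forms.
a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
print(agreement_and_kappa(a, b))  # (0.8, 0.6)
```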

Thorndike, Robert L. – Journal of Educational Measurement, 1971
Descriptors: Culture Fair Tests, Predictive Measurement, Test Bias, Test Reliability

Grier, J. Brown – Journal of Educational Measurement, 1975
The expected reliability of a multiple-choice test is maximized by the use of items with three alternatives. (Author)
Descriptors: Achievement Tests, Multiple Choice Tests, Test Construction, Test Reliability

Lennon, Roger T. – Journal of Educational Measurement, 1975
Reviews the 1974 Standards, an update that serves as a guide to test making and publishing and to the training of persons for these endeavors. (DEP)
Descriptors: Educational Testing, Psychological Testing, Scoring, Standards

Remer, Rory – Journal of Educational Measurement, 1978
The relative efficiency and cost-effectiveness of three methods of producing and administering a work-sample simulation test of interpersonal communication competence employing a multiple-choice response format are explored. (Author/JKS)
Descriptors: Communication Skills, Cost Effectiveness, Higher Education, Interpersonal Competence

Diamond, James J. – Journal of Educational Measurement, 1975
Investigates the reliability and validity of scores yielded by a new scoring formula. (Author/DEP)
Descriptors: Guessing (Tests), Multiple Choice Tests, Objective Tests, Scoring

Huck, Schuyler W. – Journal of Educational Measurement, 1978
Providing examinees with advance knowledge of the difficulty of an item led to an increase in test performance with no loss of reliability. This finding was consistent across several test formats. (Author/JKS)
Descriptors: Difficulty Level, Feedback, Higher Education, Item Analysis

Angoff, William H.; Schrader, William B. – Journal of Educational Measurement, 1984
The reported data provide a basis for evaluating the formula-scoring versus rights-scoring issue and for assessing the effects of directions on the reliability and parallelism of scores for sophisticated examinees taking professionally developed tests. Results support the invariance hypothesis rather than the differential effects hypothesis.…
Descriptors: College Entrance Examinations, Guessing (Tests), Higher Education, Hypothesis Testing
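
For readers unfamiliar with the two scoring rules at issue, the sketch below contrasts rights (number-right) scoring with the conventional correction-for-guessing formula score R - W/(k - 1). The counts are hypothetical; the specific tests and directions studied by Angoff and Schrader are not reproduced here.

```python
def rights_score(n_right):
    """Number-right ('rights') scoring: wrong answers and omits earn nothing."""
    return n_right

def formula_score(n_right, n_wrong, n_options):
    """Conventional correction-for-guessing formula score: R - W / (k - 1).
    Omitted items are neither rewarded nor penalized."""
    return n_right - n_wrong / (n_options - 1)

# Hypothetical 5-option test: 60 right, 20 wrong, 20 omitted.
print(rights_score(60))          # 60
print(formula_score(60, 20, 5))  # 55.0
```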

Carlson, Jerry S.; Dillon, Ronna – Journal of Educational Measurement, 1979
The Matrices and Order of Appearance subtests of a Piagetian test battery were administered to a sample of second-grade children on two occasions under two test conditions: standardized testing and a dialogue between child and examiner. Differences for test condition and time of testing were found. (JKS)
Descriptors: Academic Achievement, Developmental Psychology, Developmental Stages, Individual Testing

Kansup, Wanlop; Hakstian, A. Ralph – Journal of Educational Measurement, 1975
The effects on reliability and validity of logically weighting incorrect item options in conventional tests, and of different scoring functions with confidence tests, were examined. Ninth graders took conventionally administered Verbal and Mathematical Reasoning tests, scored conventionally and by a procedure assigning degree-of-correctness weights to…
Descriptors: Comparative Analysis, Confidence Testing, Junior High School Students, Multiple Choice Tests
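
A minimal sketch of what such option weighting looks like in practice: conventional 0/1 scoring against the key versus summing a "degree-of-correctness" weight for whichever option was chosen. The weights and responses below are invented for illustration; the study's actual logical weights are not reproduced.

```python
# Hypothetical degree-of-correctness weights for a three-item test
# (option "A" is keyed correct on every item; all values are invented).
weights = [
    {"A": 1.0, "B": 0.50, "C": 0.25, "D": 0.0},
    {"A": 1.0, "B": 0.25, "C": 0.25, "D": 0.0},
    {"A": 1.0, "B": 0.75, "C": 0.00, "D": 0.0},
]
key = ["A", "A", "A"]
responses = ["A", "B", "D"]  # one examinee's chosen options

conventional = sum(r == k for r, k in zip(responses, key))  # 1 (only item 1 correct)
weighted = sum(w[r] for r, w in zip(responses, weights))    # 1.0 + 0.25 + 0.0 = 1.25
print(conventional, weighted)
```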

Hakstian, A. Ralph; Kansup, Wanlop – Journal of Educational Measurement, 1975
A comparison of reliability and validity was made for three testing procedures: 1) responding conventionally to Verbal Ability and Mathematical Reasoning tests; 2) using a confidence weighting response procedure with the same tests; and 3) using the elimination response method. The experimental testing procedures were not psychometrically superior…
Descriptors: Comparative Analysis, Confidence Testing, Guessing (Tests), Junior High School Students