Showing all 15 results
Peer reviewed
Augustin Mutak; Robert Krause; Esther Ulitzsch; Sören Much; Jochen Ranger; Steffi Pohl – Journal of Educational Measurement, 2024
Understanding the intraindividual relation between an individual's speed and ability in testing scenarios is essential for ensuring a fair assessment. Different approaches exist for estimating this relationship, each relying either on specific study designs or on specific assumptions. This paper aims to add to the toolbox of approaches for estimating…
Descriptors: Testing, Academic Ability, Time on Task, Correlation
Peer reviewed
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
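To make the idea concrete, here is a minimal multilabel scoring sketch in Python. It is not the authors' MNN: the network size, the synthetic item-trait map, and all names are illustrative assumptions. It shows the two points the abstract raises: one network scores many traits at once, and the decision threshold on the predicted probabilities can be tuned toward different metrics (a lower threshold favors recall).

```python
# A minimal multilabel scoring sketch, not the authors' MNN: the network size,
# the synthetic item-trait map, and all names are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
n_examinees, n_items, n_traits = 500, 40, 16                  # "> 15 traits"
X = rng.integers(0, 2, (n_examinees, n_items)).astype(float)  # 0/1 item responses
W = rng.integers(0, 2, (n_items, n_traits))                   # toy item-trait relevance map
Y = (X @ W > W.sum(axis=0) / 2).astype(int)                   # synthetic mastery labels

net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
net.fit(X[:400], Y[:400])                 # scikit-learn fits multilabel Y directly
proba = net.predict_proba(X[400:])        # per-trait probabilities, shape (100, 16)

for threshold in (0.5, 0.3):              # lowering the threshold trades accuracy for recall
    pred = (proba >= threshold).astype(int)
    print(threshold,
          accuracy_score(Y[400:], pred),                  # exact-match accuracy
          recall_score(Y[400:], pred, average="micro"))   # micro-averaged recall
```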
Peer reviewed
Wang, Wenyi; Song, Lihong; Chen, Ping; Meng, Yaru; Ding, Shuliang – Journal of Educational Measurement, 2015
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern-level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet…
Descriptors: Classification, Reliability, Accuracy, Cognitive Tests
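As a concrete reading of the two concepts, the sketch below computes attribute-level accuracy (the probability that the mastery/non-mastery classification is correct) and consistency (the probability that a parallel administration would classify the examinee the same way) from posterior mastery probabilities. This is the usual posterior-based construction, not necessarily the exact estimators developed in the article; the probabilities are invented for illustration.

```python
# Hedged sketch: attribute-level accuracy and consistency indices from
# posterior mastery probabilities p_i = P(mastery | responses) per examinee.
import numpy as np

def attribute_indices(posterior):
    """posterior: 1-D array of P(mastery) per examinee for one attribute."""
    p = np.asarray(posterior, dtype=float)
    assigned = p >= 0.5                               # MAP classification per examinee
    accuracy = np.mean(np.where(assigned, p, 1 - p))  # P(classification is correct)
    consistency = np.mean(p**2 + (1 - p)**2)          # P(same class on a parallel test)
    return accuracy, consistency

acc, cons = attribute_indices([0.9, 0.8, 0.2, 0.55, 0.95])
print(f"accuracy={acc:.3f}, consistency={cons:.3f}")
```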
Peer reviewed
Jin, Kuan-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2014
Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performance on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Descriptors: Student Evaluation, Item Response Theory, Models, Simulation
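One simple way to see why standard IRT breaks down here: if effective ability declines with item position, a fixed-ability Rasch model misfits late items. The sketch below is a hedged illustration of that idea, not the model proposed in the article; the linear decline term and its rate are assumptions.

```python
# Hedged sketch (not necessarily the authors' model): a Rasch-type response
# probability in which effective ability declines linearly with item position.
import math

def p_correct(theta, b, position, rate=0.0):
    """P(correct) under a Rasch model with position-dependent ability decline."""
    theta_eff = theta - rate * position   # effective ability at this item position
    return 1.0 / (1.0 + math.exp(-(theta_eff - b)))

# Same examinee and item difficulty, early vs. late in the test:
print(p_correct(theta=0.5, b=0.0, position=1,  rate=0.02))
print(p_correct(theta=0.5, b=0.0, position=40, rate=0.02))
```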
Peer reviewed
Subkoviak, Michael J. – Journal of Educational Measurement, 1988
Current methods for obtaining reliability indices for mastery tests can be laborious. This paper offers practitioners tables from which agreement and kappa coefficients can be read directly and provides criteria for acceptable values of agreement and kappa coefficients. (TJH)
Descriptors: Mastery Tests, Statistical Analysis, Test Reliability, Testing
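For reference, the two coefficients the tables report are the observed proportion of agreement and Cohen's kappa for mastery/non-mastery decisions. The sketch below computes both directly from two parallel administrations; Subkoviak's tables instead estimate them from a single administration, and the data here are invented.

```python
# Hedged sketch of the two quantities the tables report: the agreement
# coefficient p_o and Cohen's kappa for mastery/non-mastery decisions.
import numpy as np

def agreement_and_kappa(first_pass, second_pass):
    a = np.asarray(first_pass, bool)
    b = np.asarray(second_pass, bool)
    p_o = np.mean(a == b)                    # observed proportion of agreement
    p1, p2 = a.mean(), b.mean()              # marginal mastery rates
    p_e = p1 * p2 + (1 - p1) * (1 - p2)      # agreement expected by chance
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, kappa

first  = [1, 1, 0, 1, 0, 1, 1, 0]            # invented mastery decisions, form 1
second = [1, 0, 0, 1, 0, 1, 1, 1]            # invented mastery decisions, form 2
print(agreement_and_kappa(first, second))
```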
Peer reviewed
Thorndike, Robert L. – Journal of Educational Measurement, 1971
Descriptors: Culture Fair Tests, Predictive Measurement, Test Bias, Test Reliability
Peer reviewed
Grier, J. Brown – Journal of Educational Measurement, 1975
The expected reliability of a multiple choice test is maximized by the use of items with three alternatives. (Author)
Descriptors: Achievement Tests, Multiple Choice Tests, Test Construction, Test Reliability
Peer reviewed
Lennon, Roger T. – Journal of Educational Measurement, 1975
Reviews the 1974 Standards, an update that serves as a guide to test making and publishing and to the training of persons for these endeavors. (DEP)
Descriptors: Educational Testing, Psychological Testing, Scoring, Standards
Peer reviewed
Remer, Rory – Journal of Educational Measurement, 1978
The relative efficiency and cost-effectiveness of three methods of producing and administering a worksample simulation test of interpersonal communication competence employing a multiple choice response format are explored. (Author/JKS)
Descriptors: Communication Skills, Cost Effectiveness, Higher Education, Interpersonal Competence
Peer reviewed
Diamond, James J. – Journal of Educational Measurement, 1975
Investigates the reliability and validity of scores yielded from a new scoring formula. (Author/DEP)
Descriptors: Guessing (Tests), Multiple Choice Tests, Objective Tests, Scoring
Peer reviewed
Huck, Schuyler W. – Journal of Educational Measurement, 1978
Providing examinees with advance knowledge of the difficulty of an item led to an increase in test performance with no loss of reliability. This finding was consistent across several test formats. (Author/JKS)
Descriptors: Difficulty Level, Feedback, Higher Education, Item Analysis
Peer reviewed
Angoff, William H.; Schrader, William B. – Journal of Educational Measurement, 1984
The reported data provide a basis for evaluating the formula-scoring versus rights-scoring issue and for assessing the effects of directions on the reliability and parallelism of scores for sophisticated examinees taking professionally developed tests. Results support the invariance hypothesis rather than the differential effects hypothesis.…
Descriptors: College Entrance Examinations, Guessing (Tests), Higher Education, Hypothesis Testing
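The two scoring rules at issue are standard: a rights score counts correct answers only, while the classical formula score subtracts a correction for guessing, S = R - W/(k - 1) for W wrong answers on k-choice items, with omits unpenalized. A minimal sketch with invented numbers:

```python
# The classical correction-for-guessing formula behind "formula scoring"
# vs. "rights scoring"; the counts below are invented for illustration.
def rights_score(correct: int) -> int:
    return correct

def formula_score(correct: int, wrong: int, k: int) -> float:
    """Formula score on k-alternative items: omits carry no penalty."""
    return correct - wrong / (k - 1)

# 100 five-choice items: 60 right, 30 wrong, 10 omitted.
print(rights_score(60))             # 60
print(formula_score(60, 30, k=5))   # 52.5
```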
Peer reviewed
Carlson, Jerry S.; Dillon, Ronna – Journal of Educational Measurement, 1979
The Matrices and Order of Appearance subtests of a Piagetian test battery were administered to a sample of second-grade children on two occasions under two test conditions: standardized testing and a dialogue between child and examiner. Differences were found for both test condition and time of testing. (JKS)
Descriptors: Academic Achievement, Developmental Psychology, Developmental Stages, Individual Testing
Peer reviewed
Kansup, Wanlop; Hakstian, A. Ralph – Journal of Educational Measurement, 1975
The effects on reliability and validity of logically weighting incorrect item options in conventional tests, and of different scoring functions in confidence tests, were examined. Ninth graders took conventionally administered Verbal and Mathematical Reasoning tests, scored conventionally and by a procedure assigning degree-of-correctness weights to…
Descriptors: Comparative Analysis, Confidence Testing, Junior High School Students, Multiple Choice Tests
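A hedged sketch of what logically weighting incorrect options can look like in practice: instead of 1/0 scoring, each option carries a degree-of-correctness weight. The weights and option labels below are invented for illustration and are not the article's actual weighting scheme.

```python
# Hedged sketch of option weighting: weights are invented, not the article's.
option_weights = {"A": 1.0, "B": 0.5, "C": 0.0, "D": 0.25}  # A is the keyed answer

def weighted_item_score(chosen_option: str) -> float:
    return option_weights[chosen_option]

responses = ["A", "B", "D", "A", "C"]
print(sum(weighted_item_score(r) for r in responses))  # 2.75 vs. 2 under 1/0 scoring
```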
Peer reviewed
Hakstian, A. Ralph; Kansup, Wanlop – Journal of Educational Measurement, 1975
A comparison of reliability and validity was made for three testing procedures: 1) responding conventionally to Verbal Ability and Mathematical Reasoning tests; 2) using a confidence weighting response procedure with the same tests; and 3) using the elimination response method. The experimental testing procedures were not psychometrically superior…
Descriptors: Comparative Analysis, Confidence Testing, Guessing (Tests), Junior High School Students
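For context on the third procedure, the sketch below implements one classical variant of elimination scoring (Coombs-style): the examinee crosses out every option believed wrong, earning +1 per distractor eliminated and -(k - 1) if the keyed answer is eliminated. The article's exact rule may differ, and the item data are invented.

```python
# Hedged sketch of Coombs-style elimination scoring (one classical variant).
def elimination_score(eliminated: set[str], key: str, options: set[str]) -> int:
    k = len(options)
    if key in eliminated:                    # keyed answer eliminated: full penalty
        return -(k - 1)
    return len(eliminated & (options - {key}))  # +1 per distractor eliminated

opts = {"A", "B", "C", "D"}
print(elimination_score({"B", "C"}, key="A", options=opts))       # 2
print(elimination_score({"A", "B", "C"}, key="A", options=opts))  # -3
```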