Stufflebeam, Daniel L. – Journal of Educational Measurement, 1971
Descriptors: Data Analysis, Educational Experiments, Evaluation Methods, Individual Differences

Hartnett, Rodney T. – Journal of Educational Measurement, 1971
Alternative scoring methods yield essentially the same information, including scale intercorrelations and validity. Reasons for preferring the traditional psychometric scoring technique are offered. (Author/AG)
Descriptors: College Environment, Comparative Analysis, Correlation, Item Analysis

Linn, Robert L. – Journal of Educational Measurement, 1983
When the precise basis of selection effect on correlation and regression equations is unknown but can be modeled by selection on a variable that is highly but not perfectly related to observed scores, the selection effects can lead to the commonly observed "overprediction" results in studies of predictive bias. (Author/PN)
Descriptors: Bias, Correlation, Higher Education, Prediction

Bridgeman, Brent; Morgan, Rick; Wang, Ming-mei – Journal of Educational Measurement, 1997
Test results of 915 high school students taking a history examination with a choice of topics show that students were generally able to pick the topic on which they could get the highest score. Implications for fair scoring when topic choice is allowed are discussed. (SLD)
Descriptors: Essay Tests, High School Students, History, Performance Factors

Williamson, David M.; Bejar, Isaac I.; Hone, Anne S. – Journal of Educational Measurement, 1999
Contrasts "mental models" used by automated scoring for the simulation division of the computerized Architect Registration Examination with those used by experienced human graders for 3,613 candidate solutions. Discusses differences in the models used and the potential of automated scoring to enhance the validity evidence of scores. (SLD)
Descriptors: Architects, Comparative Analysis, Computer Assisted Testing, Judges

Wise, Steven L.; DeMars, Christine E. – Journal of Educational Measurement, 2006
The validity of inferences based on achievement test scores is dependent on the amount of effort that examinees put forth while taking the test. With low-stakes tests, for which this problem is particularly prevalent, there is a consequent need for psychometric models that can take into account differing levels of examinee effort. This article…
Descriptors: Guessing (Tests), Psychometrics, Inferences, Reaction Time

Hanna, Gerald S. – Journal of Educational Measurement, 1974
Descriptors: Aptitude Tests, Secondary School Students, Test Interpretation, Test Reliability

Hanna, Gerald S. – Journal of Educational Measurement, 1977
The effects of providing total and partial immediate feedback to pupils in multiple choice testing were investigated with fifth and sixth grade pupils. The split-half reliability was higher with total feedback than with no feedback. Concurrent validity with a completion test showed all three settings to be nearly identical. (Author/JKS)
Descriptors: Elementary Education, Elementary School Students, Feedback, Forced Choice Technique

Linn, Robert L. – Journal of Educational Measurement, 1984
The common approach to studies of predictive bias is analyzed within the context of a conceptual model in which predictors and criterion measures are viewed as fallible indicators of idealized qualifications. (Author/PN)
Descriptors: Certification, Models, Predictive Measurement, Predictive Validity

Brandenburg, Dale C.; Whitney, Douglas R. – Journal of Educational Measurement, 1972
The primary purpose of this study was to investigate the effect of various scoring methods on the reliability and validity of the Primary Test of Economic Understanding (PTEU). The PTEU was designed to be scored using the matched pair procedure. (Authors)
Descriptors: Grade 3, Objective Tests, Response Style (Tests), Scoring Formulas

Board, Cynthia; Whitney, Douglas R. – Journal of Educational Measurement, 1972
For the principles studied here, poor item-writing practices serve to obscure (or attenuate) differences between good and poor students. (Authors)
Descriptors: College Students, Item Analysis, Multiple Choice Tests, Test Construction

Carver, Ronald P.; Darby, Charles A., Jr. – Journal of Educational Measurement, 1971
Discusses a reading test using "chunked" items -- groups of meaningfully related words in which certain groups are changed in meaning from the original passage. (Author)
Descriptors: Information Storage, Multiple Choice Tests, Reading Comprehension, Reading Tests

Beuchert, A. Kent; Mendoza, Jorge L. – Journal of Educational Measurement, 1979
Ten item discrimination indices, across a variety of item analysis situations, were compared, based on the validities of tests constructed by using each of the indices to select 40 items from a 100-item pool. Item score data were generated by a computer program and included a simulation of guessing. (Author/CTM)
Descriptors: Item Analysis, Simulation, Statistical Analysis, Test Construction

Howard, George S.; And Others – Journal of Educational Measurement, 1979
Evaluations of experimental interventions which employ self-report measures are subject to contamination known as response-shift bias. Response-shift effects may be attenuated by substituting retrospective pretest ratings for the traditional self-report pretest ratings. This study indicated that the retrospective rating more accurately reflected…
Descriptors: Higher Education, Rating Scales, Response Style (Tests), Self Evaluation

Lomax, Richard G.; Algina, James – Journal of Educational Measurement, 1979
Results of using multimethod factor analysis and exploratory factor analysis for the analysis of three multitrait-multimethod matrices are compared. Results suggest that the two methods can give quite different impressions of discriminant validity. In the examples considered, the former procedure tends to support discrimination while the latter…
Descriptors: Comparative Analysis, Factor Analysis, Goodness of Fit, Matrices