Haladyna, Thomas Michael – Journal of Educational Measurement, 1974
Classical test construction and analysis procedures are applicable and appropriate for use with criterion referenced tests when samples of both mastery and nonmastery examinees are employed. (Author/BB)
Descriptors: Criterion Referenced Tests, Item Analysis, Mastery Tests, Test Construction

Woodson, M. I. Charles E. – Journal of Educational Measurement, 1974
The basis for selection of the calibration sample determines the kind of scale which will be developed. A random sample from a population of individuals leads to a norm-referenced scale, and a sample representative of abilities of a range of characteristics leads to a criterion-referenced scale. (Author/BB)
Descriptors: Criterion Referenced Tests, Discriminant Analysis, Item Analysis, Test Construction

Tittle, Caroll Kehr – Journal of Educational Measurement, 1975
This review looks at these changes and their impact on the quality of the instrument: alteration of age-range from 5-15 to 6-16; development of new norms; improvement of manual in format and function; and a number of old items deleted and new ones added for the subtests. (RC)
Descriptors: Elementary Secondary Education, Guides, Intelligence Tests, Norms

Hartke, Alan R. – Journal of Educational Measurement, 1978
Latent partition analysis is shown to be useful in determining the conceptual homogeneity of an item population. Such item populations are useful for mastery testing. Applications of latent partition analysis in assessing content validity are suggested. (Author/JKS)
Descriptors: Higher Education, Item Analysis, Item Sampling, Mastery Tests

Ayrer, James E.; McNamara, Thomas C. – Journal of Educational Measurement, 1973
"Out-of-level" testing is the assigning of pupils to levels of a standardized test on the basis of previous test scores rather than their present grade assignment. Test results of 1500 children were reviewed to see if their performance supported the rationale behind the practice. (Author/CB)
Descriptors: Achievement Rating, Elementary School Students, Standardized Tests, Test Interpretation

Darlington, Richard B. – Journal of Educational Measurement, 1971
Four definitions of "cultural fairness" are critically examined. Suggestions for dealing with conflicts between the two goals of maximizing a test's validity and minimizing its culture-group discrimination are presented. Terms in which this judgment should be made and methods of using its results are described. (LR)
Descriptors: Cultural Background, Cultural Differences, Culture Fair Tests, Test Bias

Worthen, Blaine R.; Clark, Philip M. – Journal of Educational Measurement, 1971
Descriptors: Association Measures, College Students, Creativity, Creativity Tests

Simon, Alan J.; Joiner, Lee M. – Journal of Educational Measurement, 1976
The purpose of this study was to determine whether a Mexican version of the Peabody Picture Vocabulary Test could be improved by directly translating both forms of the American test, then using decision procedures to select the better item of each pair. The reliability of the simple translations suffered. (Author/BW)
Descriptors: Early Childhood Education, Spanish, Test Construction, Test Format

DeCarlo, Lawrence T. – Journal of Educational Measurement, 2005
An approach to essay grading based on signal detection theory (SDT) is presented. SDT offers a basis for understanding rater behavior with respect to the scoring of constructed responses, in that it provides a theory of psychological processes underlying the raters' behavior. The approach also provides measures of the precision of the raters and the…
Descriptors: Validity, Simulation, Grading, Item Response Theory

Airasian, Peter W.; Bart, William M. – Journal of Educational Measurement, 1975
Validation studies of learning hierarchies usually examine whether task relationships posited a priori are confirmed by student learning data. This method was compared with a non-posited task relationship where all possible task relationships were generated and investigated. A learning hierarchy in a seventh grade mathematics study reported by…
Descriptors: Difficulty Level, Intellectual Development, Junior High Schools, Learning Theories

Ebel, Robert L. – Journal of Educational Measurement, 1975
Descriptors: Comparative Analysis, Multiple Choice Tests, Objective Tests, Teachers

Schwartz, Steven A. – Journal of Educational Measurement, 1978
A method for the construction of scales which combines the rational (or intuitive) approach with an empirical (item analysis) approach is presented. A step-by-step procedure is provided. (Author/JKS)
Descriptors: Factor Analysis, Item Analysis, Measurement, Psychological Testing

Bhushan, Vidya – Journal of Educational Measurement, 1974
Descriptors: Cultural Differences, French, Intelligence Tests, Languages

Humphreys, Lloyd G.; Taber, Thomas – Journal of Educational Measurement, 1973
Data from a postdictive study of the tests of the Graduate Record Examination and the eight semesters of undergraduate grade averages, each semester's average being computed independently of the rest, are presented. (Editor)
Descriptors: Aptitude Tests, Class Average, Correlation, Grade Point Average

Pyrczak, Fred – Journal of Educational Measurement, 1973
Despite the numerous individual illustrations in the literature showing how the discrimination index may be used to identify items with faults, its overall effectiveness as a measure of item quality, defined in terms of the presence or absence of faults, is not clear. This study investigates its validity. (Author/RK)
Descriptors: Correlation, Discriminant Analysis, Item Banks, Rating Scales