Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests
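
The abstract above turns on whether items function comparably across gender groups. One standard screen for that is the Mantel-Haenszel differential item functioning (DIF) statistic; the snippet does not say which method the paper uses, so the sketch below is a generic illustration with hypothetical variable names, not Shear's procedure.

```python
# A minimal sketch of a Mantel-Haenszel DIF check for one dichotomous item.
import numpy as np

def mantel_haenszel_or(item, group, matching_score):
    """Mantel-Haenszel common odds ratio for one 0/1-scored item,
    comparing reference (group == 0) with focal (group == 1) examinees,
    stratified on a matching variable such as total test score."""
    item, group = np.asarray(item), np.asarray(group)
    matching_score = np.asarray(matching_score)
    num = den = 0.0
    for s in np.unique(matching_score):
        m = matching_score == s
        ref, foc = item[m & (group == 0)], item[m & (group == 1)]
        if ref.size == 0 or foc.size == 0:
            continue                                  # stratum needs both groups
        n = ref.size + foc.size
        a, b = ref.sum(), ref.size - ref.sum()        # reference right / wrong
        c, d = foc.sum(), foc.size - foc.sum()        # focal right / wrong
        num += a * d / n
        den += b * c / n
    return num / den if den else float("nan")         # ~1.0 suggests no DIF
```
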
Peabody, Michael R.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard-setting panels should have the proper qualifications to make the judgments asked…
Descriptors: Standard Setting, Decision Making, Performance Based Assessment, Evaluators

Ju, Unhee; Falk, Carl F. – Journal of Educational Measurement, 2019
We examined the feasibility and results of a multilevel multidimensional nominal response model (ML-MNRM) for measuring both substantive constructs and extreme response style (ERS) across countries. The ML-MNRM considers within-country clustering while allowing overall item slopes to vary across items and examination of whether certain items were…
Descriptors: Cross Cultural Studies, Self Efficacy, Item Response Theory, Item Analysis
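
The ML-MNRM named above models category choice as a function of both a substantive trait and an extreme-response-style (ERS) trait. As a rough illustration, here is a minimal nominal-response-style category-probability function with an added ERS dimension; the parameterization is an assumption for this sketch, not the authors' model.

```python
# Category probabilities for one item under a two-dimensional nominal model.
import numpy as np

def mnrm_probs(theta, eta, a, s, c):
    """theta : substantive trait score
    eta   : ERS trait score
    a     : substantive slopes, one per response category
    s     : ERS scoring weights (e.g., 1 for extreme categories)
    c     : category intercepts
    """
    z = a * theta + s * eta + c      # linear predictor per category
    z -= z.max()                     # stabilize the softmax
    ez = np.exp(z)
    return ez / ez.sum()

# Example: a 5-category Likert item; the extreme categories load on ERS.
a = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # hypothetical slopes
s = np.array([1.0, 0.0, 0.0, 0.0, 1.0])     # ERS weights mark categories 1 and 5
print(mnrm_probs(theta=0.5, eta=1.2, a=a, s=s, c=np.zeros(5)))
```
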

Millman, Jason; Popham, W. James – Journal of Educational Measurement, 1974
The use of the regression equation derived from the Anglo-American sample to predict grades of Mexican-American students resulted in overprediction. An examination of the standardized regression weights revealed a significant difference in the weight given to the Scholastic Aptitude Test Mathematics Score. (Author/BB)
Descriptors: Criterion Referenced Tests, Item Analysis, Predictive Validity, Scores
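
The overprediction finding described here is a differential-prediction analysis: fit the regression in one group, apply it to the other, and inspect the mean residual. A minimal sketch with simulated, hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predictors (e.g., SAT-Verbal, SAT-Math) and grades per group.
X_ref = rng.normal(500, 100, size=(200, 2))
y_ref = 0.002 * X_ref[:, 0] + 0.003 * X_ref[:, 1] + rng.normal(0, 0.3, 200)
X_foc = rng.normal(450, 100, size=(100, 2))
y_foc = 0.002 * X_foc[:, 0] + 0.002 * X_foc[:, 1] + rng.normal(0, 0.3, 100)

# Fit least squares (with intercept) on the reference group only.
A = np.column_stack([np.ones(len(X_ref)), X_ref])
beta, *_ = np.linalg.lstsq(A, y_ref, rcond=None)

# Apply the reference-group equation to the focal group.
pred_foc = np.column_stack([np.ones(len(X_foc)), X_foc]) @ beta
print("mean residual (actual - predicted):", (y_foc - pred_foc).mean())
# A negative mean residual means the equation overpredicts focal-group grades.
```
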

Woodson, M. I. Chas. E. – Journal of Educational Measurement, 1974
Descriptors: Criterion Referenced Tests, Item Analysis, Test Construction, Test Reliability

Crocker, Linda; And Others – Journal of Educational Measurement, 1988
Using generalizability theory as a framework, the problem of assessing the content validity of standardized achievement tests is considered. Four designs to assess test-item fit to a curriculum are described, and procedures for determining the optimal number of raters and schools in a content-validation decision-making study are considered. (TJH)
Descriptors: Achievement Tests, Content Validity, Decision Making, Elementary Education
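
The decision-study question in this abstract (how many raters are enough) can be illustrated with a one-facet items-by-raters generalizability analysis: estimate variance components from ANOVA mean squares, then project the generalizability coefficient for different panel sizes. The design and data below are assumptions for illustration, not Crocker et al.'s four designs:

```python
import numpy as np

rng = np.random.default_rng(1)
item_fit = rng.normal(3.0, 0.7, size=(30, 1))            # true item-curriculum fit
ratings = item_fit + rng.normal(0.0, 0.5, size=(30, 6))  # 30 items x 6 raters

n_i, n_r = ratings.shape
grand = ratings.mean()
ms_i = n_r * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n_i - 1)
ms_r = n_i * ((ratings.mean(axis=0) - grand) ** 2).sum() / (n_r - 1)
ss_res = ((ratings - grand) ** 2).sum() - (n_i - 1) * ms_i - (n_r - 1) * ms_r
ms_res = ss_res / ((n_i - 1) * (n_r - 1))

var_items = max((ms_i - ms_res) / n_r, 0.0)   # item variance component
for n_prime in (2, 4, 6, 10):                 # D study: vary the number of raters
    g = var_items / (var_items + ms_res / n_prime)
    print(f"raters={n_prime:2d}  G={g:.3f}")
```
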

Crehan, Kevin D. – Journal of Educational Measurement, 1974
Various item selection techniques are compared on criterion-referenced reliability and validity. Techniques compared include three nominal criterion-referenced methods, a traditional point biserial selection, teacher selection, and random selection. (Author)
Descriptors: Comparative Analysis, Criterion Referenced Tests, Item Analysis, Item Banks
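
Of the methods compared above, the traditional point-biserial selection is easy to make concrete: correlate each dichotomous item score with the total score and keep the highest-correlating items. A minimal sketch on simulated data (not a reproduction of Crehan's study):

```python
import numpy as np

rng = np.random.default_rng(2)
ability = rng.normal(size=500)
difficulty = rng.normal(size=60)
p_correct = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
responses = (rng.random((500, 60)) < p_correct).astype(float)

total = responses.sum(axis=1)
# Point-biserial: correlation of each 0/1 item score with the total score.
pbis = np.array([np.corrcoef(responses[:, j], total)[0, 1]
                 for j in range(responses.shape[1])])
keep = np.argsort(pbis)[::-1][:20]   # select the 20 most discriminating items
print("selected items:", sorted(keep.tolist()))
```
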

Haladyna, Thomas Michael – Journal of Educational Measurement, 1974
Classical test construction and analysis procedures are applicable and appropriate for use with criterion referenced tests when samples of both mastery and nonmastery examinees are employed. (Author/BB)
Descriptors: Criterion Referenced Tests, Item Analysis, Mastery Tests, Test Construction
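
A classical statistic computed on separate mastery and nonmastery samples, as the abstract suggests, is the difference in proportion correct between the two groups. A minimal sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(3)
mastery = (rng.random((80, 10)) < 0.85).astype(int)      # instructed examinees
nonmastery = (rng.random((120, 10)) < 0.40).astype(int)  # uninstructed examinees

# Item difficulty per group, and the instructed/uninstructed difference index.
p_mastery = mastery.mean(axis=0)
p_nonmastery = nonmastery.mean(axis=0)
disc = p_mastery - p_nonmastery   # near 1.0 = item separates the groups well
for j, d in enumerate(disc):
    print(f"item {j}: p_mastery={p_mastery[j]:.2f} "
          f"p_nonmastery={p_nonmastery[j]:.2f} disc={d:.2f}")
```
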

Woodson, M. I. Charles E. – Journal of Educational Measurement, 1974
The basis for selection of the calibration sample determines the kind of scale which will be developed. A random sample from a population of individuals leads to a norm-referenced scale, and a sample representative of a range of abilities or characteristics leads to a criterion-referenced scale. (Author/BB)
Descriptors: Criterion Referenced Tests, Discriminant Analysis, Item Analysis, Test Construction

Hartke, Alan R. – Journal of Educational Measurement, 1978
Latent partition analysis is shown to be useful in determining the conceptual homogeneity of an item population. Such item populations are useful for mastery testing. Applications of latent partition analysis in assessing content validity are suggested. (Author/JKS)
Descriptors: Higher Education, Item Analysis, Item Sampling, Mastery Tests

Schwartz, Steven A. – Journal of Educational Measurement, 1978
A method for the construction of scales which combines the rational (or intuitive) approach with an empirical (item analysis) approach is presented. A step-by-step procedure is provided. (Author/JKS)
Descriptors: Factor Analysis, Item Analysis, Measurement, Psychological Testing
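
One plausible reading of the combined rational-empirical procedure is: let judges assign items to a scale on rational grounds, then retain only items whose corrected item-total correlation clears a threshold. The sketch below follows that reading; the threshold and data are hypothetical, and the article's actual step-by-step procedure is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(4)
factor = rng.normal(size=(300, 1))
signal = 0.7 * factor + rng.normal(0, 0.7, size=(300, 5))  # coherent items
noise = rng.normal(size=(300, 3))                          # filler items
data = np.hstack([signal, noise])
rational_scale = [0, 1, 2, 3, 4]     # items a judge assigned to the scale

kept = []
for j in rational_scale:
    others = [k for k in rational_scale if k != j]
    rest = data[:, others].sum(axis=1)      # scale total excluding item j
    r = np.corrcoef(data[:, j], rest)[0, 1]
    if r >= 0.20:                           # hypothetical retention threshold
        kept.append(j)
print("items retained after the empirical pass:", kept)
```
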

Hartnett, Rodney T. – Journal of Educational Measurement, 1971
Alternative scoring methods yield essentially the same information, including scale intercorrelations and validity. Reasons for preferring the traditional psychometric scoring technique are offered. (Author/AG)
Descriptors: College Environment, Comparative Analysis, Correlation, Item Analysis

Board, Cynthia; Whitney, Douglas R. – Journal of Educational Measurement, 1972
For the principles studied here, poor item-writing practices serve to obscure (or attenuate) differences between good and poor students. (Authors)
Descriptors: College Students, Item Analysis, Multiple Choice Tests, Test Construction

Beuchert, A. Kent; Mendoza, Jorge L. – Journal of Educational Measurement, 1979
Ten item discrimination indices, across a variety of item analysis situations, were compared, based on the validities of tests constructed by using each of the indices to select 40 items from a 100-item pool. Item score data were generated by a computer program and included a simulation of guessing. (Author/CTM)
Descriptors: Item Analysis, Simulation, Statistical Analysis, Test Construction
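
The study design described here can be mimicked in miniature: simulate responses with a guessing floor, compute a discrimination index, and build a test from the top-ranked items. The sketch below uses a single index (the classic upper-lower 27% D index) rather than the ten compared in the article:

```python
import numpy as np

rng = np.random.default_rng(5)
n_person, n_item, guess = 400, 100, 0.2
theta = rng.normal(size=n_person)
b = rng.normal(size=n_item)
# Logistic response model with a constant guessing floor.
p = guess + (1 - guess) / (1 + np.exp(-1.7 * (theta[:, None] - b[None, :])))
x = (rng.random((n_person, n_item)) < p).astype(int)

total = x.sum(axis=1)
cut = int(0.27 * n_person)
upper = x[np.argsort(total)[-cut:]]       # top 27% on total score
lower = x[np.argsort(total)[:cut]]        # bottom 27%
d_index = upper.mean(axis=0) - lower.mean(axis=0)
best40 = np.argsort(d_index)[::-1][:40]   # build the 40-item test
print("mean D of selected items:", d_index[best40].mean().round(3))
```
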

Anderson, Ronald E.; And Others – Journal of Educational Measurement, 1982
Findings on alternative procedures for evaluating measures of achievement in individual data packages at the National Assessment of Educational Progress are presented with their methodological implications. The need for secondary analysts to be aware of the organization of the data is discussed, as are the database's positive and negative features. (Author/CM)
Descriptors: Achievement, Databases, Educational Assessment, Elementary Secondary Education