Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests
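
The abstract above turns on whether items function comparably across gender groups. One standard screen for that is the Mantel-Haenszel differential item functioning (DIF) statistic; the snippet does not say which method the paper uses, so the sketch below is a generic illustration with hypothetical variable names, not Shear's procedure.

```python
# A minimal sketch of a Mantel-Haenszel DIF check for one dichotomous item.
import numpy as np

def mantel_haenszel_or(item, group, matching_score):
    """Mantel-Haenszel common odds ratio for one 0/1-scored item,
    comparing reference (group == 0) with focal (group == 1) examinees,
    stratified on a matching variable such as total test score."""
    item, group = np.asarray(item), np.asarray(group)
    matching_score = np.asarray(matching_score)
    num = den = 0.0
    for s in np.unique(matching_score):
        m = matching_score == s
        ref, foc = item[m & (group == 0)], item[m & (group == 1)]
        if ref.size == 0 or foc.size == 0:
            continue                                  # stratum needs both groups
        n = ref.size + foc.size
        a, b = ref.sum(), ref.size - ref.sum()        # reference right / wrong
        c, d = foc.sum(), foc.size - foc.sum()        # focal right / wrong
        num += a * d / n
        den += b * c / n
    return num / den if den else float("nan")         # ~1.0 suggests no DIF
```
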
Peabody, Michael R.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard-setting panels should have the proper qualifications to make the judgments asked…
Descriptors: Standard Setting, Decision Making, Performance Based Assessment, Evaluators

Ju, Unhee; Falk, Carl F. – Journal of Educational Measurement, 2019
We examined the feasibility and results of a multilevel multidimensional nominal response model (ML-MNRM) for measuring both substantive constructs and extreme response style (ERS) across countries. The ML-MNRM considers within-country clustering while allowing overall item slopes to vary across items and examination of whether certain items were…
Descriptors: Cross Cultural Studies, Self Efficacy, Item Response Theory, Item Analysis
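
The ML-MNRM named above models category choice as a function of both a substantive trait and an extreme-response-style (ERS) trait. As a rough illustration, here is a minimal nominal-response-style category-probability function with an added ERS dimension; the parameterization is an assumption for this sketch, not the authors' model.

```python
# Category probabilities for one item under a two-dimensional nominal model.
import numpy as np

def mnrm_probs(theta, eta, a, s, c):
    """theta : substantive trait score
    eta   : ERS trait score
    a     : substantive slopes, one per response category
    s     : ERS scoring weights (e.g., 1 for extreme categories)
    c     : category intercepts
    """
    z = a * theta + s * eta + c      # linear predictor per category
    z -= z.max()                     # stabilize the softmax
    ez = np.exp(z)
    return ez / ez.sum()

# Example: a 5-category Likert item; the extreme categories load on ERS.
a = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # hypothetical slopes
s = np.array([1.0, 0.0, 0.0, 0.0, 1.0])     # ERS weights mark categories 1 and 5
print(mnrm_probs(theta=0.5, eta=1.2, a=a, s=s, c=np.zeros(5)))
```
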

Millman, Jason; Popham, W. James – Journal of Educational Measurement, 1974
The use of the regression equation derived from the Anglo-American sample to predict grades of Mexican-American students resulted in overprediction. An examination of the standardized regression weights revealed a significant difference in the weight given to the Scholastic Aptitude Test Mathematics Score. (Author/BB)
Descriptors: Criterion Referenced Tests, Item Analysis, Predictive Validity, Scores
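
The overprediction finding described here is a differential-prediction analysis: fit the regression in one group, apply it to the other, and inspect the mean residual. A minimal sketch with simulated, hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predictors (e.g., SAT-Verbal, SAT-Math) and grades per group.
X_ref = rng.normal(500, 100, size=(200, 2))
y_ref = 0.002 * X_ref[:, 0] + 0.003 * X_ref[:, 1] + rng.normal(0, 0.3, 200)
X_foc = rng.normal(450, 100, size=(100, 2))
y_foc = 0.002 * X_foc[:, 0] + 0.002 * X_foc[:, 1] + rng.normal(0, 0.3, 100)

# Fit least squares (with intercept) on the reference group only.
A = np.column_stack([np.ones(len(X_ref)), X_ref])
beta, *_ = np.linalg.lstsq(A, y_ref, rcond=None)

# Apply the reference-group equation to the focal group.
pred_foc = np.column_stack([np.ones(len(X_foc)), X_foc]) @ beta
print("mean residual (actual - predicted):", (y_foc - pred_foc).mean())
# A negative mean residual means the equation overpredicts focal-group grades.
```
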

Woodson, M. I. Chas. E. – Journal of Educational Measurement, 1974
Descriptors: Criterion Referenced Tests, Item Analysis, Test Construction, Test Reliability

Crocker, Linda; And Others – Journal of Educational Measurement, 1988
Using generalizability theory as a framework, the problem of assessing the content validity of standardized achievement tests is considered. Four designs to assess test-item fit to a curriculum are described, and procedures for determining the optimal number of raters and schools in a content-validation decision-making study are considered. (TJH)
Descriptors: Achievement Tests, Content Validity, Decision Making, Elementary Education
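
The decision-study question in this abstract (how many raters are enough) can be illustrated with a one-facet items-by-raters generalizability analysis: estimate variance components from ANOVA mean squares, then project the generalizability coefficient for different panel sizes. The design and data below are assumptions for illustration, not Crocker et al.'s four designs:

```python
import numpy as np

rng = np.random.default_rng(1)
item_fit = rng.normal(3.0, 0.7, size=(30, 1))            # true item-curriculum fit
ratings = item_fit + rng.normal(0.0, 0.5, size=(30, 6))  # 30 items x 6 raters

n_i, n_r = ratings.shape
grand = ratings.mean()
ms_i = n_r * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n_i - 1)
ms_r = n_i * ((ratings.mean(axis=0) - grand) ** 2).sum() / (n_r - 1)
ss_res = ((ratings - grand) ** 2).sum() - (n_i - 1) * ms_i - (n_r - 1) * ms_r
ms_res = ss_res / ((n_i - 1) * (n_r - 1))

var_items = max((ms_i - ms_res) / n_r, 0.0)   # item variance component
for n_prime in (2, 4, 6, 10):                 # D study: vary the number of raters
    g = var_items / (var_items + ms_res / n_prime)
    print(f"raters={n_prime:2d}  G={g:.3f}")
```
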

Crehan, Kevin D. – Journal of Educational Measurement, 1974
Various item selection techniques are compared on criterion-referenced reliability and validity. Techniques compared include three nominal criterion-referenced methods, a traditional point biserial selection, teacher selection, and random selection. (Author)
Descriptors: Comparative Analysis, Criterion Referenced Tests, Item Analysis, Item Banks
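
Of the methods compared above, the traditional point-biserial selection is easy to make concrete: correlate each dichotomous item score with the total score and keep the highest-correlating items. A minimal sketch on simulated data (not a reproduction of Crehan's study):

```python
import numpy as np

rng = np.random.default_rng(2)
ability = rng.normal(size=500)
difficulty = rng.normal(size=60)
p_correct = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
responses = (rng.random((500, 60)) < p_correct).astype(float)

total = responses.sum(axis=1)
# Point-biserial: correlation of each 0/1 item score with the total score.
pbis = np.array([np.corrcoef(responses[:, j], total)[0, 1]
                 for j in range(responses.shape[1])])
keep = np.argsort(pbis)[::-1][:20]   # select the 20 most discriminating items
print("selected items:", sorted(keep.tolist()))
```
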

Haladyna, Thomas Michael – Journal of Educational Measurement, 1974
Classical test construction and analysis procedures are applicable and appropriate for use with criterion referenced tests when samples of both mastery and nonmastery examinees are employed. (Author/BB)
Descriptors: Criterion Referenced Tests, Item Analysis, Mastery Tests, Test Construction
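
A classical statistic computed on separate mastery and nonmastery samples, as the abstract suggests, is the difference in proportion correct between the two groups. A minimal sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(3)
mastery = (rng.random((80, 10)) < 0.85).astype(int)      # instructed examinees
nonmastery = (rng.random((120, 10)) < 0.40).astype(int)  # uninstructed examinees

# Item difficulty per group, and the instructed/uninstructed difference index.
p_mastery = mastery.mean(axis=0)
p_nonmastery = nonmastery.mean(axis=0)
disc = p_mastery - p_nonmastery   # near 1.0 = item separates the groups well
for j, d in enumerate(disc):
    print(f"item {j}: p_mastery={p_mastery[j]:.2f} "
          f"p_nonmastery={p_nonmastery[j]:.2f} disc={d:.2f}")
```
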

Woodson, M. I. Charles E. – Journal of Educational Measurement, 1974
The basis for selection of the calibration sample determines the kind of scale which will be developed. A random sample from a population of individuals leads to a norm-referenced scale, and a sample representative of a range of abilities or characteristics leads to a criterion-referenced scale. (Author/BB)
Descriptors: Criterion Referenced Tests, Discriminant Analysis, Item Analysis, Test Construction

Hartke, Alan R. – Journal of Educational Measurement, 1978
Latent partition analysis is shown to be useful in determining the conceptual homogeneity of an item population. Such item populations are useful for mastery testing. Applications of latent partition analysis in assessing content validity are suggested. (Author/JKS)
Descriptors: Higher Education, Item Analysis, Item Sampling, Mastery Tests

Schwartz, Steven A. – Journal of Educational Measurement, 1978
A method for the construction of scales which combines the rational (or intuitive) approach with an empirical (item analysis) approach is presented. A step-by-step procedure is provided. (Author/JKS)
Descriptors: Factor Analysis, Item Analysis, Measurement, Psychological Testing
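
One plausible reading of the combined rational-empirical procedure is: let judges assign items to a scale on rational grounds, then retain only items whose corrected item-total correlation clears a threshold. The sketch below follows that reading; the threshold and data are hypothetical, and the article's actual step-by-step procedure is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(4)
factor = rng.normal(size=(300, 1))
signal = 0.7 * factor + rng.normal(0, 0.7, size=(300, 5))  # coherent items
noise = rng.normal(size=(300, 3))                          # filler items
data = np.hstack([signal, noise])
rational_scale = [0, 1, 2, 3, 4]     # items a judge assigned to the scale

kept = []
for j in rational_scale:
    others = [k for k in rational_scale if k != j]
    rest = data[:, others].sum(axis=1)      # scale total excluding item j
    r = np.corrcoef(data[:, j], rest)[0, 1]
    if r >= 0.20:                           # hypothetical retention threshold
        kept.append(j)
print("items retained after the empirical pass:", kept)
```
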

Hartnett, Rodney T. – Journal of Educational Measurement, 1971
Alternative scoring methods yield essentially the same information, including scale intercorrelations and validity. Reasons for preferring the traditional psychometric scoring technique are offered. (Author/AG)
Descriptors: College Environment, Comparative Analysis, Correlation, Item Analysis

Board, Cynthia; Whitney, Douglas R. – Journal of Educational Measurement, 1972
For the principles studied here, poor item-writing practices serve to obscure (or attenuate) differences between good and poor students. (Authors)
Descriptors: College Students, Item Analysis, Multiple Choice Tests, Test Construction

Beuchert, A. Kent; Mendoza, Jorge L. – Journal of Educational Measurement, 1979
Ten item discrimination indices, across a variety of item analysis situations, were compared, based on the validities of tests constructed by using each of the indices to select 40 items from a 100-item pool. Item score data were generated by a computer program and included a simulation of guessing. (Author/CTM)
Descriptors: Item Analysis, Simulation, Statistical Analysis, Test Construction
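
The study design described here can be mimicked in miniature: simulate responses with a guessing floor, compute a discrimination index, and build a test from the top-ranked items. The sketch below uses a single index (the classic upper-lower 27% D index) rather than the ten compared in the article:

```python
import numpy as np

rng = np.random.default_rng(5)
n_person, n_item, guess = 400, 100, 0.2
theta = rng.normal(size=n_person)
b = rng.normal(size=n_item)
# Logistic response model with a constant guessing floor.
p = guess + (1 - guess) / (1 + np.exp(-1.7 * (theta[:, None] - b[None, :])))
x = (rng.random((n_person, n_item)) < p).astype(int)

total = x.sum(axis=1)
cut = int(0.27 * n_person)
upper = x[np.argsort(total)[-cut:]]       # top 27% on total score
lower = x[np.argsort(total)[:cut]]        # bottom 27%
d_index = upper.mean(axis=0) - lower.mean(axis=0)
best40 = np.argsort(d_index)[::-1][:40]   # build the 40-item test
print("mean D of selected items:", d_index[best40].mean().round(3))
```
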

Anderson, Ronald E.; And Others – Journal of Educational Measurement, 1982
Findings on alternative procedures for evaluating measures of achievement in individual data packages at the National Assessment of Educational Progress are presented with their methodological implications. The need for secondary analysts to be aware of the organization of the data is discussed, as are the database's positive and negative features. (Author/CM)
Descriptors: Achievement, Databases, Educational Assessment, Elementary Secondary Education