Longford, Nicholas T. – Journal of Educational and Behavioral Statistics, 2014
A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…
Descriptors: Test Items, Test Bias, Simulation, Hypothesis Testing
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
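For context, the sampling design effect the abstract refers to is conventionally computed as follows. This is an illustrative sketch of the standard cluster-sampling formulas, not of Phillips's own method; the cluster sizes and intraclass correlation below are hypothetical.

```python
# Design effect for single-stage cluster sampling: DEFF = 1 + (m - 1) * rho,
# where m is the average cluster size (e.g., students per school) and rho is
# the intraclass correlation. Ignoring DEFF understates sampling error,
# since the effective sample size shrinks to n / DEFF.

def design_effect(cluster_size, icc):
    """Design effect for single-stage cluster sampling."""
    return 1.0 + (cluster_size - 1) * icc

def effective_n(n, cluster_size, icc):
    """Effective sample size after accounting for clustering."""
    return n / design_effect(cluster_size, icc)

print(design_effect(25, 0.1))      # approximately 3.4
print(effective_n(1000, 25, 0.1))  # approximately 294
```

With classroom-sized clusters (m = 25) and a modest intraclass correlation (rho = 0.1), a nominal sample of 1,000 carries the information of roughly 294 independent observations, so standard errors computed under simple random sampling would be understated by a factor of about sqrt(3.4).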
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format

Burton, Richard F. – Assessment & Evaluation in Higher Education, 2001
Item-discrimination indices are numbers calculated from test data that are used in assessing the effectiveness of individual test questions. This article asserts that the indices are so unreliable as to suggest that countless good questions may have been discarded over the years. It considers how the indices, and hence overall test reliability,…
Descriptors: Guessing (Tests), Item Analysis, Test Reliability, Testing Problems
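The item-discrimination indices Burton discusses are typically of the following upper/lower-groups form. This is a generic sketch of the classic index D = p_upper − p_lower; the 27% split is a common convention, and the sample data are hypothetical, neither taken from the article.

```python
# Upper/lower-group discrimination index for one test item:
# D = (proportion correct in the top-scoring group)
#   - (proportion correct in the bottom-scoring group).

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """item_scores: 0/1 correctness per examinee; total_scores: test totals."""
    n = len(item_scores)
    k = max(1, int(n * fraction))
    # Rank examinees by total test score, then take bottom and top groups.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower

items = [1, 1, 0, 1, 0, 0, 1, 0]   # hypothetical item responses
totals = [9, 8, 3, 7, 2, 4, 10, 1]  # hypothetical total scores
print(discrimination_index(items, totals))  # 1.0
```

With small groups, as here, the index is computed from only a handful of examinees, which illustrates Burton's concern: such estimates are highly unstable and can condemn a perfectly good item by chance.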

van den Wollenberg, Arnold L. – Psychometrika, 1982
Presently available test statistics for the Rasch model are shown to be insensitive to violations of the assumption of test unidimensionality. Two new statistics are presented. One is similar to available statistics, but with some improvements; the other addresses the problem of insensitivity to unidimensionality. (Author/JKS)
Descriptors: Item Analysis, Latent Trait Theory, Statistics, Test Reliability
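As background to the abstract above, the Rasch model it concerns can be sketched as follows. This is a generic statement of the model's item response function, not of the fit statistics the article proposes.

```python
import math

# Rasch (one-parameter logistic) model: the probability of a correct
# response depends only on the difference between person ability theta
# and item difficulty b.

def rasch_probability(theta, b):
    """P(correct response) under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the probability is exactly 0.5.
print(rasch_probability(0.0, 0.0))  # 0.5
```

Unidimensionality, the assumption the article's second statistic targets, means a single theta per person governs all item responses; when several latent traits are actually at work, the model above is misspecified even if conventional fit statistics fail to signal it.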

Kuncel, Ruth Boutin; Fiske, Donald W. – Educational and Psychological Measurement, 1974
Four hypotheses regarding stability of response process and response in personality testing are tested and supported. (RC)
Descriptors: College Students, Item Analysis, Personality Measures, Response Style (Tests)

Andrulis, Richard S.; And Others – Educational and Psychological Measurement, 1978
The effects of repeaters (testees included in both administrations of two forms of a test) on the test equating process are examined. It is shown that repeaters do affect test equating and tend to lower the cutoff point for passing the test. (JKS)
Descriptors: Cutting Scores, Equated Scores, Item Analysis, Scoring
Strickland, Guy – 1970
This report summarizes the findings of Jackson and Lahadern who used a revised form of the Student Opinion Poll (SOP) and a questionnaire to study the intercorrelations of attitudes and achievement. The study found that: (1) first graders have attitudes toward school work but these attitudes were not differentiated toward specific school subjects;…
Descriptors: Achievement, Attitudes, Evaluation, Item Analysis

Rusch, Reuben; Steiner, Judith – Journal of Experimental Education, 1979
The Selected Marker Tests were examined for scoring problems and internal consistency and were administered orally to sixth and seventh graders. Scoring problems were discovered and changes were suggested. The problem was found to be item reliability rather than interrater reliability. (Author/MH)
Descriptors: Cognitive Tests, Elementary Education, Item Analysis, Problem Solving

Barnette, J. Jackson; And Others – Educational Research Quarterly, 1978
The DELPHI procedure requires respondents to reply to several questionnaire iterations with subsequent rounds containing previous round feedback. This study investigated the methodology (response rates, effects of feedback) and offered evidence that large-scale DELPHI surveys are not as advantageous as has previously been indicated. Suggestions…
Descriptors: Feedback, Item Analysis, Measurement Techniques, Predictive Measurement

Whitely, Susan E. – Journal of Educational Measurement, 1977
A debate concerning specific issues and the general usefulness of the Rasch latent trait test model is continued. Methods of estimation, necessary sample size, and the applicability of the model are discussed. (JKS)
Descriptors: Error of Measurement, Item Analysis, Mathematical Models, Measurement

Wright, Benjamin D. – Journal of Educational Measurement, 1977
Statements made in a previous article in this journal concerning the Rasch latent trait test model are questioned. Methods of estimation, necessary sample sizes, several formulae, and the general usefulness of the Rasch model are discussed. (JKS)
Descriptors: Computers, Error of Measurement, Item Analysis, Mathematical Models
Miller, Harry G.; Williams, Reed G. – Educational Technology, 1973
Descriptors: Content Analysis, Item Analysis, Measurement Techniques, Multiple Choice Tests
Linn, Robert – 1978
A series of studies on conceptual and design problems in competency-based measurement is explained. The concept of validity within the context of criterion-referenced measurement is reviewed. The authors believe validation should be viewed as a process rather than an end product. It is the process of marshalling evidence to support…
Descriptors: Criterion Referenced Tests, Item Analysis, Item Sampling, Test Bias
Bridgeman, Brent – 1974
This experiment was designed to assess the ability of item writers to construct truly parallel tests based on a "duplicate-construction experiment" in which Cronbach argues that if the universe description and sampling are ideally refined, the two independently constructed tests will be entirely equivalent, and that within the limits of item…
Descriptors: Criterion Referenced Tests, Error of Measurement, Item Analysis, Norm Referenced Tests