ERIC - Search Results

Showing all 9 results

Correcting Fallacies in Validity, Reliability, and Classification

Peer reviewed

Sijtsma, Klaas – International Journal of Testing, 2009

This article reviews three topics from test theory that continue to raise discussion and controversy and capture test theorists' and constructors' interest. The first topic concerns the discussion of the methodology of investigating and establishing construct validity; the second topic concerns reliability and its misuse, alternative definitions…

Descriptors: Construct Validity, Reliability, Classification, Test Theory

Multidimensional Rasch Analysis of a Psychological Test with Multiple Subtests: A Statistical Solution for the Bandwidth-Fidelity Dilemma

Peer reviewed

Cheng, Ying-Yao; Wang, Wen-Chung; Ho, Yi-Hui – Educational and Psychological Measurement, 2009

Educational and psychological tests are often composed of multiple short subtests, each measuring a distinct latent trait. Unfortunately, short subtests suffer from low measurement precision, which makes the bandwidth-fidelity dilemma inevitable. In this study, the authors demonstrate how a multidimensional Rasch analysis can be employed to take…

Descriptors: Item Response Theory, Measurement, Correlation, Measures (Individuals)

Testing the Equality of Independent Alpha Coefficients Adjusted for Test Length.

Peer reviewed

Alsawalmeh, Yousef M.; Feldt, Leonard S. – Educational and Psychological Measurement, 1999

Develops a statistical test for the hypothesis that alpha'(1) = alpha'(2) when alpha'(1) is the Spearman-Brown extrapolated value of Cronbach's alpha reliability for test 1 and alpha'(2) is the unadjusted coefficient for test 2. The test is shown to exercise tight control of Type I error. (Author/SLD)

Descriptors: Reliability, Test Length
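
As an illustration of the Spearman-Brown extrapolation mentioned in the abstract above, the following minimal sketch projects a reliability coefficient to a different test length. The function name and the example values are assumptions made for illustration and are not taken from the article; the statistical test the authors develop is not reproduced here.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Project a reliability coefficient to a test whose length is
    multiplied by length_factor (hypothetical helper, standard formula)."""
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    # Spearman-Brown prophecy formula: k * r / (1 + (k - 1) * r)
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


if __name__ == "__main__":
    # Illustrative values only: doubling a test whose alpha is 0.50
    print(round(spearman_brown(0.50, 2.0), 3))
```

A length factor of 1 returns the coefficient unchanged, which is the sense in which the second coefficient in the abstract is "unadjusted".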

On the Consistency of Individual Classification Using Short Scales

Peer reviewed

Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R. – Psychological Methods, 2007

Short tests containing at most 15 items are used in clinical and health psychology, medicine, and psychiatry for making decisions about patients. Because short tests have large measurement error, the authors ask whether they are reliable enough for classifying patients into a treatment and a nontreatment group. For a given certainty level,…

Descriptors: Psychiatry, Patients, Error of Measurement, Test Length

Estimating the Sampling Variance of Correlation Corrected for Attenuation Using Coefficient Alpha.

Peer reviewed

Mayer, John D. – Perceptual and Motor Skills, 1983

Kelly's formula estimates sampling variance of correlation corrected for attenuation by using split-half reliabilities. In some cases, coefficient alpha estimate of reliability is preferable. A simulation study suggests a variation of Kelly's formula can be used appropriately with coefficient alpha. Kelly's formula is modified to accept…

Descriptors: Correlation, Measurement Techniques, Reliability, Sampling
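
For orientation, the quantity whose sampling variance the abstract above discusses is the correlation corrected for attenuation. The sketch below shows only the standard point estimate, with reliabilities supplied by the caller (for example, coefficient alpha values); it does not implement Kelly's variance formula or the modification the abstract describes. The function name and the numbers are illustrative assumptions.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Correct an observed correlation for unreliability in both measures
    (hypothetical helper; standard correction for attenuation)."""
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    # Observed correlation divided by the geometric mean of the reliabilities
    return r_xy / math.sqrt(rel_x * rel_y)


if __name__ == "__main__":
    # Illustrative values only, not data from the article
    print(round(disattenuate(0.42, 0.70, 0.80), 3))
```

Because the reliabilities are themselves estimates, the corrected value can exceed 1 in small samples, which is one reason its sampling variance is of interest.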

The Effects of Test Length and Sample Size on the Reliability and Equating of Tests Composed of Constructed-Response Items.

Peer reviewed

Fitzpatrick, Anne R.; Yen, Wendy M. – Applied Measurement in Education, 2001

Examined the effects of test length and sample size on the alternate forms reliability and equating of simulated mathematics tests composed of constructed response items scaled using the two-parameter partial credit model. Results suggest that, to obtain acceptable reliabilities and accurate equated scores, tests should have at least 8 6-point…

Descriptors: Constructed Response, Equated Scores, Mathematics Tests, Reliability

Estimating the Consistency and Accuracy of Classifications Based on Test Scores.

Livingston, Samuel A.; Lewis, Charles – 1993

This paper presents a method for estimating the accuracy and consistency of classifications based on test scores. The scores can be produced by any scoring method, including the formation of a weighted composite. The estimates use data from a single form. The reliability of the score is used to estimate its effective test length in terms of…

Descriptors: Classification, Error of Measurement, Estimation (Mathematics), Reliability

Testing at Higher Taxonomic Levels: Are We Jeopardizing Reliability by Increasing the Emphasis on Complexity?

Clements, Andrea D.; Rothenberg, Lori – Research in the Schools, 1996

Undergraduate psychology examinations from 48 schools were analyzed to determine the proportion of items at each level of Bloom's Taxonomy, item format, and test length. Analyses indicated significant relationships between item complexity and test length even when taking format into account. Use of higher items may be related to shorter tests,…

Descriptors: Classification, Difficulty Level, Educational Objectives, Higher Education

Recovery of Marginal Maximum Likelihood Estimates in the Two-Parameter Logistic Response Model: An Evaluation of MULTILOG.

Peer reviewed

Stone, Clement A. – Applied Psychological Measurement, 1992

Monte Carlo methods are used to evaluate marginal maximum likelihood estimation of item parameters and maximum likelihood estimates of theta in the two-parameter logistic model for varying test lengths, sample sizes, and assumed theta distributions. Results with 100 datasets demonstrate the methods' general precision and stability. Exceptions are…

Descriptors: Computer Software Evaluation, Estimation (Mathematics), Mathematical Models, Maximum Likelihood Statistics
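
As a reminder of the model named in the abstract above, the following minimal sketch evaluates the two-parameter logistic item response function. It is not MULTILOG's estimation procedure and does not perform marginal maximum likelihood estimation. The function name, the parameter names, and the use of the plain logistic metric (no 1.7 scaling constant) are assumptions made for illustration.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the two-parameter logistic
    model, with discrimination a and difficulty b (hypothetical helper)."""
    # Logistic metric is assumed; some texts multiply a by 1.7
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Illustrative item: discrimination 1.5, difficulty 0.0
    for theta in (-2.0, 0.0, 2.0):
        print(theta, round(p_correct_2pl(theta, 1.5, 0.0), 3))
```

When theta equals the difficulty b, the probability is 0.5 regardless of the discrimination.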
