Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items
Walker, Cindy M.; Göçer Sahin, Sakine – Educational and Psychological Measurement, 2020
The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared…
Descriptors: Test Bias, Interrater Reliability, Responses, Correlation
Kim, Peter – Language Teaching Research Quarterly, 2021
Foreign language aptitude is defined as one's potential to learn a second language. A language learner with higher aptitude is predicted to learn more, faster, and reach a higher level of proficiency. If this is the case, one way to validate the construct of aptitude and its measure is to conduct a validation study in which measures of aptitude is…
Descriptors: Morphology (Languages), Syntax, Second Language Learning, Second Language Instruction
Bichi, Ado Abdu; Talib, Rohaya – International Journal of Evaluation and Research in Education, 2018
Testing in an educational system performs a number of functions; the results from a test can be used to make a number of decisions in education. It is therefore well accepted in the education literature that testing is an important element of education. To effectively utilize the tests in educational policies and quality assurance its validity and…
Descriptors: Item Response Theory, Test Items, Test Construction, Decision Making
Longabach, Tanya; Peyton, Vicki – Language Testing, 2018
K-12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to these subsections are commonly known as subscores. Testing programs face increasing customer demands for the reporting of subscores in addition to the…
Descriptors: Comparative Analysis, Test Reliability, Second Language Learning, Language Proficiency
Lee, Minji K.; Sweeney, Kevin; Melican, Gerald J. – Educational Assessment, 2017
This study investigates the relationships among factor correlations, inter-item correlations, and the reliability estimates of subscores, providing a guideline with respect to psychometric properties of useful subscores. In addition, it compares subscore estimation methods with respect to reliability and distinctness. The subscore estimation…
Descriptors: Scores, Test Construction, Test Reliability, Test Validity
Wilcox, Bethany R.; Lewandowski, H. J. – Physical Review Physics Education Research, 2016
Student learning in instructional physics labs represents a growing area of research that includes investigations of students' beliefs and expectations about the nature of experimental physics. To directly probe students' epistemologies about experimental physics and support broader lab transformation efforts at the University of Colorado Boulder…
Descriptors: Physics, Epistemology, Surveys, Science Instruction
He, Qingping; Hayes, Malcolm; Wiliam, Dylan – Research Papers in Education, 2013
The accuracy of the results of the national tests in English, mathematics and science taken by 11-year-olds in England has been a matter of much debate since their introduction in 1994, with estimates of the proportion of students incorrectly classified varying from 10 to 30%. Using live data from the 2009 and 2010 administration of the national…
Descriptors: Foreign Countries, National Curriculum, Accuracy, Classification
Sussman, Joshua; Beaujean, A. Alexander; Worrell, Frank C.; Watson, Stevie – Measurement and Evaluation in Counseling and Development, 2013
Item response models (IRMs) were used to analyze Cross Racial Identity Scale (CRIS) scores. Rasch analysis scores were compared with classical test theory (CTT) scores. The partial credit model demonstrated a high goodness of fit and correlations between Rasch and CTT scores ranged from 0.91 to 0.99. CRIS scores are supported by both methods.…
Descriptors: Item Response Theory, Test Theory, Measures (Individuals), Racial Identification
Fang, Jiqian; Power, Mick; Lin, Yueqing; Zhang, Jinxin; Hao, Yuantao; Chatterji, Somnath – Gerontologist, 2012
Purpose of the study: To explore short-form versions of World Health Organization Quality of Life (WHOQOL-OLD) with acceptable psychometric properties, which was developed for older adults by the WHOQOL research group, containing 24 items initially. Design and Methods: We randomly sampled two-thirds of respondents from the data of WHOQOL-OLD field…
Descriptors: Quality of Life, Test Reliability, Correlation, Psychometrics
Haberman, Shelby J. – Educational Testing Service, 2011
Alternative approaches are discussed for use of e-rater® to score the TOEFL iBT® Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…
Descriptors: Writing Tests, Scoring, Essays, Language Tests

Morrison, Donald G. – Psychometrika, 1981
A simple stochastic model is formulated in order to determine the optimal time between the first test and the second test when the test-retest method of assessing reliability is used. A forgetting process and a change in true score process are postulated. Some numerical examples and suggestions are presented. (Author/JKS)
Descriptors: Correlation, Test Reliability, Test Theory, True Scores
Biswas, Ajoy Kumar – Applied Psychological Measurement, 2006
This article studies the ordinal reliability of (total) test scores. This study is based on a classical-type linear model of observed score (X), true score (T), and random error (E). Based on the idea of Kendall's tau-a coefficient, a measure of ordinal reliability for small-examinee populations is developed. This measure is extended to large…
Descriptors: True Scores, Test Theory, Test Reliability, Scores
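
As a rough illustration of the ingredients named in the abstract above (the linear model X = T + E and Kendall's tau-a), the following sketch simulates two parallel test forms and computes tau-a between their observed scores. It is not the ordinal reliability measure developed in the article, whose definition is not given in the abstract; the function names, the normal distributions, and the parameter values are all assumptions made for illustration.

```python
import random
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs contribute zero to the numerator; the denominator is
    always n * (n - 1) / 2, which is what distinguishes tau-a from tau-b.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    balance = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            balance += 1   # pair ordered the same way on both forms
        elif product < 0:
            balance -= 1   # pair ordered in opposite ways
    return balance / (n * (n - 1) / 2)


def simulate_parallel_forms(n_examinees, true_sd, error_sd, seed=0):
    """Hypothetical helper: draw true scores T and two observed scores X = T + E.

    Assumes normally distributed T and E with independent errors on each form.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, true_sd) for _ in range(n_examinees)]
    form_a = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    form_b = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return true_scores, form_a, form_b


if __name__ == "__main__":
    # Illustrative values only: larger error_sd should lower the rank agreement.
    for error_sd in (0.0, 0.5, 1.0, 2.0):
        _, a, b = simulate_parallel_forms(200, true_sd=1.0, error_sd=error_sd)
        print(f"error_sd={error_sd}: tau-a between forms = {kendall_tau_a(a, b):.3f}")
```

Under these assumptions, the tau-a between forms tends to fall as the error standard deviation grows relative to the true-score standard deviation, which is the intuition behind treating rank agreement as a reliability-like quantity.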

Ng, K. T. – Educational and Psychological Measurement, 1974
This paper is aimed at demonstrating that Charles Spearman postulated neither a platonic true-error distinction nor a requirement for constant true scores under repeated measurement. (Author/RC)
Descriptors: Career Development, Correlation, Models, Test Reliability

Frary, Robert B.; Zimmerman, Donald W. – Educational and Psychological Measurement, 1984
The correlation between bias components of test scores and unbiased observed scores is shown to be an effective predictor of changes in reliability and validity resulting from elimination of bias. Plausible assumptions about value of correlation and size of related variance components indicate that reduction in reliability and validity is a…
Descriptors: Correlation, Scores, Test Bias, Test Reliability

Williams, Richard H.; Zimmerman, Donald W. – Journal of Experimental Education, 1982
A mathematical link between test reliability and test validity is derived, taking into account the correlation between error scores on a test and error scores on a criterion measure. When this correlation is positive, the "paradoxical" nonmonotonic relation between test reliability and test validity occurs universally. (Author/BW)
Descriptors: Correlation, Error of Measurement, Mathematical Models, Test Reliability