ERIC - Search Results

Showing 1 to 15 of 29 results

Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items

Peer reviewed

Walker, Cindy M.; Göçer Sahin, Sakine – Educational and Psychological Measurement, 2020

The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared…

Descriptors: Test Bias, Interrater Reliability, Responses, Correlation

Concurrent Validity of LLAMA_F: Measure of Language Analytic Ability as a Predictor of Morphosyntax Knowledge

Peer reviewed

Kim, Peter – Language Teaching Research Quarterly, 2021

Foreign language aptitude is defined as one's potential to learn a second language. A language learner with higher aptitude is predicted to learn more, faster, and reach a higher level of proficiency. If this is the case, one way to validate the construct of aptitude and its measure is to conduct a validation study in which measures of aptitude is…

Descriptors: Morphology (Languages), Syntax, Second Language Learning, Second Language Instruction

Item Response Theory: An Introduction to Latent Trait Models to Test and Item Development

Peer reviewed

Bichi, Ado Abdu; Talib, Rohaya – International Journal of Evaluation and Research in Education, 2018

Testing in educational system perform a number of functions, the results from a test can be used to make a number of decisions in education. It is therefore well accepted in the education literature that, testing is an important element of education. To effectively utilize the tests in educational policies and quality assurance its validity and…

Descriptors: Item Response Theory, Test Items, Test Construction, Decision Making

A Comparison of Reliability and Precision of Subscore Reporting Methods for a State English Language Proficiency Assessment

Peer reviewed

Longabach, Tanya; Peyton, Vicki – Language Testing, 2018

K-12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to these subsections are commonly known as subscores. Testing programs face increasing customer demands for the reporting of subscores in addition to the…

Descriptors: Comparative Analysis, Test Reliability, Second Language Learning, Language Proficiency

Test Assembly Implications for Providing Reliable and Valid Subscores

Peer reviewed

Lee, Minji K.; Sweeney, Kevin; Melican, Gerald J. – Educational Assessment, 2017

This study investigates the relationships among factor correlations, inter-item correlations, and the reliability estimates of subscores, providing a guideline with respect to psychometric properties of useful subscores. In addition, it compares subscore estimation methods with respect to reliability and distinctness. The subscore estimation…

Descriptors: Scores, Test Construction, Test Reliability, Test Validity

Students' Epistemologies about Experimental Physics: Validating the Colorado Learning Attitudes about Science Survey for Experimental Physics

Peer reviewed

Wilcox, Bethany R.; Lewandowski, H. J. – Physical Review Physics Education Research, 2016

Student learning in instructional physics labs represents a growing area of research that includes investigations of students' beliefs and expectations about the nature of experimental physics. To directly probe students' epistemologies about experimental physics and support broader lab transformation efforts at the University of Colorado Boulder…

Descriptors: Physics, Epistemology, Surveys, Science Instruction

Classification Accuracy in Key Stage 2 National Curriculum Tests in England

Peer reviewed

He, Qingping; Hayes, Malcolm; Wiliam, Dylan – Research Papers in Education, 2013

The accuracy of the results of the national tests in English, mathematics and science taken by 11-year olds in England has been a matter of much debate since their introduction in 1994, with estimates of the proportion of students incorrectly classified varying from 10 to 30%. Using live data from the 2009 and 2010 administration of the national…

Descriptors: Foreign Countries, National Curriculum, Accuracy, Classification

An Analysis of Cross Racial Identity Scale Scores Using Classical Test Theory and Rasch Item Response Models

Peer reviewed

Sussman, Joshua; Beaujean, A. Alexander; Worrell, Frank C.; Watson, Stevie – Measurement and Evaluation in Counseling and Development, 2013

Item response models (IRMs) were used to analyze Cross Racial Identity Scale (CRIS) scores. Rasch analysis scores were compared with classical test theory (CTT) scores. The partial credit model demonstrated a high goodness of fit and correlations between Rasch and CTT scores ranged from 0.91 to 0.99. CRIS scores are supported by both methods.…

Descriptors: Item Response Theory, Test Theory, Measures (Individuals), Racial Identification

Development of Short Versions for the WHOQOL-OLD Module

Peer reviewed

Fang, Jiqian; Power, Mick; Lin, Yueqing; Zhang, Jinxin; Hao, Yuantao; Chatterji, Somnath – Gerontologist, 2012

Purpose of the study: To explore short-form versions of World Health Organization Quality of Life (WHOQOL-OLD) with acceptable psychometric properties, which was developed for older adults by the WHOQOL research group, containing 24 items initially. Design and Methods: We randomly sampled two-thirds of respondents from the data of WHOQOL-OLD field…

Descriptors: Quality of Life, Test Reliability, Correlation, Psychometrics

Use of e-rater® in Scoring of the TOEFL iBT® Writing Test. Research Report. ETS RR-11-25

Haberman, Shelby J. – Educational Testing Service, 2011

Alternative approaches are discussed for use of e-rater® to score the TOEFL iBT® Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…

Descriptors: Writing Tests, Scoring, Essays, Language Tests

A Stochastic Model for Test-Retest Correlations.

Peer reviewed

Morrison, Donald G. – Psychometrika, 1981

A simple stochastic model is formulated in order to determine the optimal time between the first test and the second test when the test-retest method of assessing reliability is used. A forgetting process and a change in true score process are postulated. Some numerical examples and suggestions are presented. (Author/JKS)

Descriptors: Correlation, Test Reliability, Test Theory, True Scores

Reliability of Total Test Scores When Considered as Ordinal Measurements

Peer reviewed

Biswas, Ajoy Kumar – Applied Psychological Measurement, 2006

This article studies the ordinal reliability of (total) test scores. This study is based on a classical-type linear model of observed score (X), true score (T), and random error (E). Based on the idea of Kendall's tau-a coefficient, a measure of ordinal reliability for small-examinee populations is developed. This measure is extended to large…

Descriptors: True Scores, Test Theory, Test Reliability, Scores

Spearman's Test Score Model: A Restatement

Peer reviewed

Ng, K. T. – Educational and Psychological Measurement, 1974

This paper is aimed at demonstrating that Charles Spearman postulated neither a platonic true-error distinction nor a requirement for constant true scores under repeated measurement. (Author/RC)

Descriptors: Career Development, Correlation, Models, Test Reliability

Elimination of Bias in Test Scores: Effect on Reliability and Validity.

Peer reviewed

Frary, Robert B.; Zimmerman, Donald W. – Educational and Psychological Measurement, 1984

The correlation between bias components of test scores and unbiased observed scores is shown to be an effective predictor of changes in reliability and validity resulting from elimination of bias. Plausible assumptions about value of correlation and size of related variance components indicate that reduction in reliability and validity is a…

Descriptors: Correlation, Scores, Test Bias, Test Reliability

Reconsideration of the "Attenuation Paradox"--and Some New Paradoxes in Test Validity.

Peer reviewed

Williams, Richard H.; Zimmerman, Donald W. – Journal of Experimental Education, 1982

A mathematical link between test reliability and test validity is derived, taking into account the correlation between error scores on a test and error scores on a criterion measure. When this correlation is positive, the "paradoxical" nonmonotonic relation between test reliability and test validity occurs universally. (Author/BW)

Descriptors: Correlation, Error of Measurement, Mathematical Models, Test Reliability
