ERIC - Search Results

Showing 1 to 15 of 34 results

Programme Evaluation in Action: Theory to Practice from an Asian Educational Context

Peer reviewed

Ser Ming Mark Lee; Wei Cheng Liu – Asia Pacific Journal of Education, 2024

Programme evaluation has developed tremendously over the past 50 years, with a proliferation of evaluation research, an increase in the institutionalization of evaluation, and growth in the professionalization of evaluation. However, existing research and developments are still largely in North America, Europe, Australia, and New Zealand, with…

Descriptors: Foreign Countries, Evaluation Research, Evaluation Methods, Evaluation Criteria

A Design for Comparing CTT and IRT in Test Assembly, Scoring and Argumentation: Differences among Reliability, Information and Validation

Peer reviewed

Alqarni, Abdulelah Mohammed – Journal on Educational Psychology, 2019

This study compares the psychometric properties of reliability in Classical Test Theory (CTT), item information in Item Response Theory (IRT), and validation from the perspective of modern validity theory for the purpose of bringing attention to potential issues that might exist when testing organizations use both test theories in the same testing…

Descriptors: Test Theory, Item Response Theory, Test Construction, Scoring

Problems in Estimating Composite Reliability of "Unitised" Assessments

Peer reviewed

Bramley, Tom; Dhawan, Vikas – Research Papers in Education, 2013

This paper discusses the issues involved in calculating indices of composite reliability for "modular" or "unitised" assessments of the kind used in GCSEs, AS and A level examinations in England. The increasingly widespread use of on-screen marking has meant that the item-level data required for calculating indices of…

Descriptors: Foreign Countries, Exit Examinations, Secondary Education, Test Reliability

Classification Accuracy in Key Stage 2 National Curriculum Tests in England

Peer reviewed

He, Qingping; Hayes, Malcolm; Wiliam, Dylan – Research Papers in Education, 2013

The accuracy of the results of the national tests in English, mathematics and science taken by 11-year-olds in England has been a matter of much debate since their introduction in 1994, with estimates of the proportion of students incorrectly classified varying from 10 to 30%. Using live data from the 2009 and 2010 administration of the national…

Descriptors: Foreign Countries, National Curriculum, Accuracy, Classification

The Contestant Perspective on Taking Tests: Emanations from the Statue within

Peer reviewed

Dorans, Neil J. – Educational Measurement: Issues and Practice, 2012

Views on testing--its purpose and uses and how its data are analyzed--are related to one's perspective on test takers. Test takers can be viewed as learners, examinees, or contestants. I briefly discuss the perspective of test takers as learners. I maintain that much of psychometrics views test takers as examinees. I discuss test takers as a…

Descriptors: Testing, Test Theory, Item Response Theory, Test Reliability

The Number of Feedbacks Needed for Reliable Evaluation. A Multilevel Analysis of the Reliability, Stability and Generalisability of Students' Evaluation of Teaching

Peer reviewed

Rantanen, Pekka – Assessment & Evaluation in Higher Education, 2013

A multilevel analysis approach was used to analyse students' evaluation of teaching (SET). The low value of inter-rater reliability stresses that any solid conclusions on teaching cannot be made on the basis of single feedbacks. To assess a teacher's general teaching effectiveness, one needs to evaluate four randomly chosen course implementations.…

Descriptors: Test Reliability, Feedback (Response), Generalizability Theory, Student Evaluation of Teacher Performance

A Control Systems Concept Inventory Test Design and Assessment

Peer reviewed

Bristow, M.; Erkorkmaz, K.; Huissoon, J. P.; Jeon, Soo; Owen, W. S.; Waslander, S. L.; Stubley, G. D. – IEEE Transactions on Education, 2012

Any meaningful initiative to improve the teaching and learning in introductory control systems courses needs a clear test of student conceptual understanding to determine the effectiveness of proposed methods and activities. The authors propose a control systems concept inventory. Development of the inventory was collaborative and iterative. The…

Descriptors: Diagnostic Tests, Concept Formation, Undergraduate Students, Engineering Education

Adaptations and Access to Assessment of Common Core Content

Peer reviewed

Kettler, Ryan J. – Review of Research in Education, 2015

This chapter introduces theory that undergirds the role of testing adaptations in assessment, provides examples of item modifications and testing accommodations, reviews research relevant to each, and introduces a new paradigm that incorporates opportunity to learn (OTL), academic enablers, testing adaptations, and inferences that can be made from…

Descriptors: Meta Analysis, Literature Reviews, Testing, Testing Accommodations

Are Multiple Choice Tests Fair to Medical Students with Specific Learning Disabilities?

Peer reviewed

Ricketts, Chris; Brice, Julie; Coombes, Lee – Advances in Health Sciences Education, 2010

The purpose of multiple choice tests of medical knowledge is to estimate as accurately as possible a candidate's level of knowledge. However, concern is sometimes expressed that multiple choice tests may also discriminate in undesirable and irrelevant ways, such as between minority ethnic groups or by sex of candidates. There is little literature…

Descriptors: Medical Students, Testing Accommodations, Ethnic Groups, Learning Disabilities

Reliability Reporting Practices in Youth Life Satisfaction Research

Peer reviewed

Vassar, Matt; Hale, William – Social Indicators Research, 2007

Due to the emergence of positive psychology in recent years, a growing line of research has focused on aspects of psychological wellness rather than psychopathology. Within the context of positive psychology, life satisfaction has emerged as a key variable of study in relation to adult and youth populations. Accurate measurement of life…

Descriptors: Life Satisfaction, Test Reliability, Psychopathology, Psychometrics

Evaluating Alignment between Curriculum, Assessment, and Instruction

Peer reviewed

Martone, Andrea; Sireci, Stephen G. – Review of Educational Research, 2009

The authors (a) discuss the importance of alignment for facilitating proper assessment and instruction, (b) describe the three most common methods for evaluating the alignment between state content standards and assessments, (c) discuss the relative strengths and limitations of these methods, and (d) discuss examples of applications of each…

Descriptors: Teaching Methods, Alignment (Education), Student Evaluation, Curriculum Development

Subscores Based on Classical Test Theory: To Report or Not to Report

Peer reviewed

Sinharay, Sandip; Haberman, Shelby; Puhan, Gautam – Educational Measurement: Issues and Practice, 2007

There is an increasing interest in reporting subscores, both at examinee level and at aggregate levels. However, it is important to ensure reasonable subscore performance in terms of high reliability and validity to minimize incorrect instructional and remediation decisions. This article employs a statistical measure based on classical test theory…

Descriptors: Test Reliability, Test Theory, Test Validity, Statistical Analysis

Why Generalizability Theory Yields Better Results than Classical Test Theory.

Eason, Sandra H. – 1989

Generalizability theory provides a technique for accurately estimating the reliability of measurements. The power of this theory is based on the simultaneous analysis of multiple sources of error variances. Equally important, generalizability theory considers relationships among the sources of measurement error. Just as multivariate inferential…

Descriptors: Comparative Analysis, Generalizability Theory, Test Reliability, Test Theory

Reliability of Total Test Scores When Considered as Ordinal Measurements

Peer reviewed

Biswas, Ajoy Kumar – Applied Psychological Measurement, 2006

This article studies the ordinal reliability of (total) test scores. This study is based on a classical-type linear model of observed score (X), true score (T), and random error (E). Based on the idea of Kendall's tau-a coefficient, a measure of ordinal reliability for small-examinee populations is developed. This measure is extended to large…

Descriptors: True Scores, Test Theory, Test Reliability, Scores

On the Reliability of Categorically Scored Examinations

Peer reviewed

Kupermintz, Haggai – Journal of Educational Measurement, 2004

A decision-theoretic approach to the question of reliability in categorically scored examinations is explored. The concepts of true scores and errors are discussed as they deviate from conventional psychometric definitions and measurement error in categorical scores is cast in terms of misclassifications. A reliability measure based on…

Descriptors: Test Reliability, Error of Measurement, Psychometrics, Test Theory
