ERIC - Search Results

Showing 1 to 15 of 27 results

Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients

Peer reviewed

Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022

The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…

Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory

Programme Evaluation in Action: Theory to Practice from an Asian Educational Context

Peer reviewed

Ser Ming Mark Lee; Wei Cheng Liu – Asia Pacific Journal of Education, 2024

Programme evaluation has developed tremendously over the past 50 years, with a proliferation of evaluation research, an increase in the institutionalization of evaluation, and growth in the professionalization of evaluation. However, existing research and developments are still largely in North America, Europe, Australia, and New Zealand, with…

Descriptors: Foreign Countries, Evaluation Research, Evaluation Methods, Evaluation Criteria

Test Assembly Implications for Providing Reliable and Valid Subscores

Peer reviewed

Lee, Minji K.; Sweeney, Kevin; Melican, Gerald J. – Educational Assessment, 2017

This study investigates the relationships among factor correlations, inter-item correlations, and the reliability estimates of subscores, providing a guideline with respect to psychometric properties of useful subscores. In addition, it compares subscore estimation methods with respect to reliability and distinctness. The subscore estimation…

Descriptors: Scores, Test Construction, Test Reliability, Test Validity

Improving Comprehension Assessment for Middle and High School Students: Challenges and Opportunities

Peer reviewed

Sabatini, John; Petscher, Yaacov; O'Reilly, Tenaha; Truckenmiller, Adrea – Grantee Submission, 2015

For decades, standardized reading comprehension tests have consisted of a series of passages and associated multiple-choice questions. Although widely used in and out of the classroom, there continues to be considerable disagreement regarding how or whether such tests have net value in the service of advancing educational progress in reading. This…

Descriptors: Middle School Students, High School Students, Reading Comprehension, Reading Tests

A Psychometric Analysis of the Chemical Concepts Inventory

Peer reviewed

Barbera, Jack – Journal of Chemical Education, 2013

The Chemical Concepts Inventory (CCI) is a multiple-choice instrument designed to assess the alternate conceptions of students in high school or first-semester college chemistry. The instrument was published in 2002 along with an analysis of its data from a test population. This study supports the initial analysis and expands on the psychometric…

Descriptors: Science Instruction, Secondary School Science, High Schools, College Science

Making Do with What We Have: Use Your Bootstraps

Peer reviewed

Calmettes, Guillaume; Drummond, Gordon B.; Vowler, Sarah L. – Advances in Physiology Education, 2012

A jack knife is a pocket knife that is put to many tasks, because it's ready to hand. Often there could be a better tool for the job, such as a screwdriver, a scraper, or a can-opener, but these are not usually pocket items. In statistical terms, the expression implies making do with what's available. Another simile, of an extreme situation, is…

Descriptors: Statistical Analysis, Computation, Population Distribution, Evaluation Methods

The Number of Feedbacks Needed for Reliable Evaluation. A Multilevel Analysis of the Reliability, Stability and Generalisability of Students' Evaluation of Teaching

Peer reviewed

Rantanen, Pekka – Assessment & Evaluation in Higher Education, 2013

A multilevel analysis approach was used to analyse students' evaluation of teaching (SET). The low value of inter-rater reliability stresses that any solid conclusions on teaching cannot be made on the basis of single feedbacks. To assess a teacher's general teaching effectiveness, one needs to evaluate four randomly chosen course implementations.…

Descriptors: Test Reliability, Feedback (Response), Generalizability Theory, Student Evaluation of Teacher Performance

Design, Development and Validation of a Model of Problem Solving for Egyptian Science Classes

Peer reviewed

Shahat, Mohamed A.; Ohle, Annika; Treagust, David F.; Fischer, Hans E. – International Journal of Science and Mathematics Education, 2013

Educators and policymakers envision the future of education in Egypt as enabling learners to acquire scientific inquiry and problem-solving skills. In this article, we describe the validation of a model for problem solving and the design of instruments for evaluating new teaching methods in Egyptian science classes. The instruments were based on…

Descriptors: Foreign Countries, Questionnaires, Problem Solving, Science Instruction

A "Conditional" Sense of Fairness in Assessment

Peer reviewed

Mislevy, Robert J.; Haertel, Geneva; Cheng, Britte H.; Ructtinger, Liliana; DeBarger, Angela; Murray, Elizabeth; Rose, David; Gravel, Jenna; Colker, Alexis M.; Rutstein, Daisy; Vendlinski, Terry – Educational Research and Evaluation, 2013

Standardizing aspects of assessments has long been recognized as a tactic to help make evaluations of examinees fair. It reduces variation in irrelevant aspects of testing procedures that could advantage some examinees and disadvantage others. However, recent attention to making assessment accessible to a more diverse population of students…

Descriptors: Testing Accommodations, Access to Education, Testing, Psychometrics

The Reliability of Results from National Tests, Public Examinations, and Vocational Qualifications in England

Peer reviewed

He, Qingping; Opposs, Dennis – Educational Research and Evaluation, 2012

National tests, public examinations, and vocational qualifications in England are used for a variety of purposes, including the certification of individual learners in different subject areas and the accountability of individual professionals and institutions. However, there has been ongoing debate about the reliability and validity of their…

Descriptors: Qualifications, Evidence, National Competency Tests, Foreign Countries

A Psychometric Study of the Infant and Toddler Intervals of the Social Emotional Assessment Measure

Peer reviewed

Squires, Jane K.; Waddell, Misti L.; Clifford, Jantina R.; Funk, Kristin; Hoselton, Robert M.; Chen, Ching-I – Topics in Early Childhood Special Education, 2013

Psychometric and utility studies on Social Emotional Assessment Measure (SEAM), an innovative tool for assessing and monitoring social-emotional and behavioral development in infants and toddlers with disabilities, were conducted. The Infant and Toddler SEAM intervals were the study focus, using mixed methods, including item response theory…

Descriptors: Psychometrics, Evaluation Methods, Social Development, Emotional Development

Evaluating Alignment between Curriculum, Assessment, and Instruction

Peer reviewed

Martone, Andrea; Sireci, Stephen G. – Review of Educational Research, 2009

The authors (a) discuss the importance of alignment for facilitating proper assessment and instruction, (b) describe the three most common methods for evaluating the alignment between state content standards and assessments, (c) discuss the relative strengths and limitations of these methods, and (d) discuss examples of applications of each…

Descriptors: Teaching Methods, Alignment (Education), Student Evaluation, Curriculum Development

The Development of a Digital Logic Concept Inventory

Herman, Geoffrey Lindsay – ProQuest LLC, 2011

Instructors in electrical and computer engineering and in computer science have developed innovative methods to teach digital logic circuits. These methods attempt to increase student learning, satisfaction, and retention. Although there are readily accessible and accepted means for measuring satisfaction and retention, there are no widely…

Descriptors: Grounded Theory, Delphi Technique, Concept Formation, Misconceptions

Two Prophecy Formulas for Assessing the Reliability of Item Response Theory-Based Ability Estimates

Peer reviewed

Raju, Nambury S.; Oshima, T.C. – Educational and Psychological Measurement, 2005

Two new prophecy formulas for estimating item response theory (IRT)-based reliability of a shortened or lengthened test are proposed. Some of the relationships between the two formulas, one of which is identical to the well-known Spearman-Brown prophecy formula, are examined and illustrated. The major assumptions underlying these formulas are…

Descriptors: Item Response Theory, Test Reliability, Evaluation Methods, Computation
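
For orientation, the classical Spearman-Brown prophecy formula named in the abstract above can be sketched as follows. This is only the well-known classical formula, not the two IRT-based formulas the article proposes; the function name and the example values are illustrative assumptions.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Classical Spearman-Brown prediction (illustrative helper, not from the article).

    reliability   -- reliability of the original test
    length_factor -- new length divided by original length (2.0 = doubled test)
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    # rho_new = k * rho / (1 + (k - 1) * rho)
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical example: doubling a test whose reliability is 0.60.
predicted = spearman_brown(0.60, 2.0)
print(round(predicted, 2))
```

A length factor below 1 models a shortened test, and a factor of exactly 1 returns the original reliability unchanged.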

On the Virtues and Vices of the Standard Error of Measurement.

Peer reviewed

Williams, Richard H.; Zimmerman, Donald W. – Journal of Experimental Education, 1984

This paper provides a list of 10 salient features of the standard error of measurement, contrasting it to the reliability coefficient. It is concluded that the standard error of measurement should be regarded as a primary characteristic of a mental test. (Author/DWH)

Descriptors: Educational Testing, Error of Measurement, Evaluation Methods, Psychological Testing
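
As background to the entry above, the standard error of measurement in classical test theory is usually computed from the score standard deviation and the reliability coefficient. The sketch below uses that textbook definition; the function name and the example numbers are assumptions for illustration and are not taken from the paper.

```python
import math


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """Textbook SEM: SD of observed scores times sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if score_sd < 0:
        raise ValueError("score_sd must be non-negative")
    return score_sd * math.sqrt(1.0 - reliability)


# Hypothetical example: score SD of 10 points, reliability of 0.91.
sem = standard_error_of_measurement(10.0, 0.91)
print(round(sem, 2))
```

Unlike the reliability coefficient, the result is expressed in the units of the test score itself.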
