Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients
Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022
The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…
Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
Richards, Adam S. – Communication Teacher, 2021
Course: Communication Research Methods. Objectives: This activity provides students with an experiential introduction to measurement theory and the methods for assessing measurement reliability. First, multiple measurements of a person's height are interpreted according to classical test theory. Second, the measurement of human height is used as…
Descriptors: Body Height, Measurement, Communication Research, Test Theory
Ser Ming Mark Lee; Wei Cheng Liu – Asia Pacific Journal of Education, 2024
Programme evaluation has developed tremendously over the past 50 years, with a proliferation of evaluation research, an increase in the institutionalization of evaluation, and growth in the professionalization of evaluation. However, existing research and developments are still largely in North America, Europe, Australia, and New Zealand, with…
Descriptors: Foreign Countries, Evaluation Research, Evaluation Methods, Evaluation Criteria
Rainey, Katherine D.; Vignal, Michael; Wilcox, Bethany R. – Physical Review Physics Education Research, 2022
Currently there are no assessment instruments available for upper-division thermal physics, though several introductory assessments are currently available. Notably missing from these introductory assessments are items targeting statistical mechanics. This leaves a gap in the content that can be assessed by upper-division thermal physics faculty.…
Descriptors: Physics, Science Instruction, Thermodynamics, College Science
Kaya Uyanik, Gulden; Demirtas Tolaman, Tugba; Gur Erdogan, Duygu – International Journal of Assessment Tools in Education, 2021
This paper aims to examine and assess the questions included in the "Turkish Common Exam" for sixth graders held in the first semester of 2018, which is one of the common exams carried out by the Measurement and Evaluation Centers, in terms of question structure, quality, and taxonomic value. To this end, the test questions were examined…
Descriptors: Foreign Countries, Grade 6, Standardized Tests, Test Items
Scharaschkin, Alex – Assessment in Education: Principles, Policy & Practice, 2017
This issue's featured article, "Assessment and Learning: Fields Apart" (Baird, Andrich, Hopfenbeck, and Stobart 2017) raises issues that are of basic importance for the disciplines of assessment and teaching and learning theory. In this commentary, Alex Scharaschkin restricts his remarks to a few areas. He considers the idea of a…
Descriptors: Educational Assessment, Learning Theories, Test Theory, Psychometrics
Lee, Minji K.; Sweeney, Kevin; Melican, Gerald J. – Educational Assessment, 2017
This study investigates the relationships among factor correlations, inter-item correlations, and the reliability estimates of subscores, providing a guideline with respect to psychometric properties of useful subscores. In addition, it compares subscore estimation methods with respect to reliability and distinctness. The subscore estimation…
Descriptors: Scores, Test Construction, Test Reliability, Test Validity
Reimann, Peter; Kickmeier-Rust, Michael; Albert, Dietrich – Computers & Education, 2013
This paper explores the relation between problem solving learning environments (PSLEs) and assessment concepts. The general framework of evidence-centered assessment design is used to describe PSLEs in terms of assessment concepts, and to identify similarities between the process of assessment design and of PSLE design. We use a recently developed…
Descriptors: Teaching Methods, Psychometrics, Problem Solving, Test Theory
Improving Comprehension Assessment for Middle and High School Students: Challenges and Opportunities
Sabatini, John; Petscher, Yaacov; O'Reilly, Tenaha; Truckenmiller, Adrea – Grantee Submission, 2015
For decades, standardized reading comprehension tests have consisted of a series of passages and associated multiple-choice questions. Although widely used in and out of the classroom, there continues to be considerable disagreement regarding how or whether such tests have net value in the service of advancing educational progress in reading. This…
Descriptors: Middle School Students, High School Students, Reading Comprehension, Reading Tests
Maydeu-Olivares, Alberto – Measurement: Interdisciplinary Research and Perspectives, 2013
In this rejoinder, Maydeu-Olivares states that, in item response theory (IRT) measurement applications, the application of goodness-of-fit (GOF) methods informs researchers of the discrepancy between the model and the data being fitted (the room for improvement). By routinely reporting the GOF of IRT models, together with the substantive results…
Descriptors: Goodness of Fit, Models, Evaluation Methods, Item Response Theory
Lambert, Matthew C.; Hurley, Kristin Duppong; Tomlinson, M. Michele Athay; Stevens, Amy L. – Child & Youth Care Forum, 2013
Background: A client's motivation to receive services is significantly related to seeking services, remaining in services, and improved outcomes. The Motivation for Youth Treatment Scale (MYTS) is one of the few brief measures used to assess motivation for mental health treatment. Objective: To investigate if the psychometric properties of the…
Descriptors: Motivation, Mental Health, Health Services, Access to Health Care
Herman, Geoffrey L.; Zilles, Craig; Loui, Michael C. – Computer Science Education, 2014
Concept inventories hold tremendous promise for promoting the rigorous evaluation of teaching methods that might remedy common student misconceptions and promote deep learning. The measurements from concept inventories can be trusted only if the concept inventories are evaluated both by expert feedback and statistical scrutiny (psychometric…
Descriptors: Psychometrics, Concept Formation, Measures (Individuals), Teaching Methods
Royal, Kenneth D.; Gilliland, Kurt O.; Kernick, Edward T. – Anatomical Sciences Education, 2014
Any examination that involves moderate to high stakes implications for examinees should be psychometrically sound and legally defensible. Currently, there are two broad and competing families of test theories that are used to score examination data. The majority of instructors outside the high-stakes testing arena rely on classical test theory…
Descriptors: Item Response Theory, Scoring, Evaluation Methods, Anatomy
Barbera, Jack – Journal of Chemical Education, 2013
The Chemical Concepts Inventory (CCI) is a multiple-choice instrument designed to assess the alternate conceptions of students in high school or first-semester college chemistry. The instrument was published in 2002 along with an analysis of its data from a test population. This study supports the initial analysis and expands on the psychometric…
Descriptors: Science Instruction, Secondary School Science, High Schools, College Science
Xu, Ting; Stone, Clement A. – Educational and Psychological Measurement, 2012
It has been argued that item response theory trait estimates should be used in analyses rather than number right (NR) or summated scale (SS) scores. Thissen and Orlando postulated that IRT scaling tends to produce trait estimates that are linearly related to the underlying trait being measured. Therefore, IRT trait estimates can be more useful…
Descriptors: Educational Research, Monte Carlo Methods, Measures (Individuals), Item Response Theory