Publication Date
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 6 |
| Since 2017 (last 10 years) | 35 |
| Since 2007 (last 20 years) | 80 |
Descriptor
| Interrater Reliability | 136 |
| Test Items | 136 |
| Scoring | 42 |
| Test Construction | 36 |
| Difficulty Level | 35 |
| Test Reliability | 35 |
| Foreign Countries | 32 |
| Scores | 25 |
| Test Validity | 25 |
| Evaluators | 21 |
| Correlation | 19 |
Source
Author
| Chang, Lei | 3 |
| Alonzo, Julie | 2 |
| Anderson, Daniel | 2 |
| Avery, Marybell | 2 |
| Carifio, James | 2 |
| Dempster, Edith R. | 2 |
| Dyson, Ben | 2 |
| Fisette, Jennifer L. | 2 |
| Fox, Connie | 2 |
| Franck, Marian | 2 |
| Friedman, Greg | 2 |
Audience
| Researchers | 4 |
| Practitioners | 1 |
| Teachers | 1 |
Location
| California | 3 |
| Florida | 3 |
| Netherlands | 3 |
| Pennsylvania | 3 |
| Taiwan | 3 |
| Turkey | 3 |
| United States | 3 |
| Australia | 2 |
| Canada | 2 |
| Germany | 2 |
| Japan | 2 |
Hui Jin; Cynthia Lima; Limin Wang – Educational Measurement: Issues and Practice, 2025
Although AI transformer models have demonstrated notable capability in automated scoring, it is difficult to examine how and why these models fall short in scoring some responses. This study investigated how transformer models' language processing and quantification processes can be leveraged to enhance the accuracy of automated scoring. Automated…
Descriptors: Automation, Scoring, Artificial Intelligence, Accuracy
Parker, Mark A. J.; Hedgeland, Holly; Jordan, Sally E.; Braithwaite, Nicholas St. J. – European Journal of Science and Mathematics Education, 2023
The study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI), which used automatically marked free-response questions. Data were collected over a period of three academic years from 611 participants who were taking physics classes at high school and university level. A total of…
Descriptors: Test Construction, Scientific Concepts, Physics, Test Reliability
Atilgan, Hakan; Demir, Elif Kübra; Ogretmen, Tuncay; Basokcu, Tahsin Oguz – International Journal of Progressive Education, 2020
What the reliability level would be when open-ended questions are used in large-scale selection tests has become a critical question. One of the aims of the present study is to determine what the reliability would be when the answers given by test-takers are scored by experts and open-ended short-answer questions are used in…
Descriptors: Foreign Countries, Secondary School Students, Test Items, Test Reliability
Kaharu, Sarintan N.; Mansyur, Jusman – Pegem Journal of Education and Instruction, 2021
This study aims to develop a test that can be used to explore mental models and representation patterns of objects in liquids. Test development, which adapted Reeves's Development Model, was carried out in several stages, namely: determining the orientation and test segments; initial survey; preparation of the initial draft; try out;…
Descriptors: Test Construction, Schemata (Cognition), Scientific Concepts, Water
Koriakin, Taylor A.; McKee, Sarah L.; Schwartz, Marlene B.; Chafouleas, Sandra M. – Journal of School Health, 2020
Background: Stakeholders increasingly recognize the role of policy in implementing Whole School, Whole Community, Whole Child (WSCC) frameworks in schools; however, few tools are currently available to assess alignment between district policies and WSCC concepts. The purpose of this study was to expand the Wellness School Assessment Tool (WellSAT)…
Descriptors: School Policy, Health Services, Health Promotion, Wellness
Kilic, Abdullah Faruk; Uysal, Ibrahim – International Journal of Assessment Tools in Education, 2022
Most researchers investigate the corrected item-total correlation of items when analyzing item discrimination in multidimensional structures under Classical Test Theory, which might lead to underestimating item discrimination and thereby to removing items from the test. Researchers might investigate the corrected item-total correlation with the…
Descriptors: Item Analysis, Correlation, Item Response Theory, Test Items
Martin, David; Jamieson-Proctor, Romina – International Journal of Research & Method in Education, 2020
In Australia, one of the key findings of the Teacher Education Ministerial Advisory Group was that not all graduating pre-service teachers possess adequate pedagogical content knowledge (PCK) to teach effectively. The concern is that higher education providers working with pre-service teachers are using pedagogical practices and assessments which…
Descriptors: Test Construction, Preservice Teachers, Pedagogical Content Knowledge, Foreign Countries
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
Walker, Grant M.; Basilakos, Alexandra; Fridriksson, Julius; Hickok, Gregory – Journal of Speech, Language, and Hearing Research, 2022
Purpose: Meaningful changes in picture naming responses may be obscured when measuring accuracy instead of quality. A statistic that incorporates information about the severity and nature of impairments may be more sensitive to the effects of treatment. Method: We analyzed data from repeated administrations of a naming test to 72 participants with…
Descriptors: Naming, Change, Aphasia, Severity (of Disability)
Dempster, Edith R.; Kirby, Nicola F. – Perspectives in Education, 2018
Taxonomies of cognitive demand are frequently used to ensure that assessment tasks include questions ranging from low to high cognitive demand. This paper investigates inter-rater agreement among four evaluators on the cognitive demand of the South African National Senior Certificate Life Sciences examinations after training, practice and…
Descriptors: Interrater Reliability, Biological Sciences, Cognitive Processes, Test Items
Abdalla, Widad – ProQuest LLC, 2019
Trend scoring is often used in large-scale assessments to monitor for rater drift when the same constructed-response items are administered in multiple test administrations. In trend scoring, a set of responses from Time "A" is rescored by raters at Time "B." The purpose of this study is to examine the ability of…
Descriptors: Scoring, Interrater Reliability, Test Items, Error Patterns
Dempster, Edith R.; Kirby, Nicki F. – South African Journal of Education, 2018
Public perception of "declining standards" in school-leaving examinations often accompanies increases in pass rates in those examinations. To the public, "declining standards" means easier examination papers. The present study evaluates a South African attempt to estimate the level of difficulty, as distinct from…
Descriptors: Foreign Countries, Interrater Reliability, Difficulty Level, Science Tests
Nieto, Ricardo; Casabianca, Jodi M. – Journal of Educational Measurement, 2019
Many large-scale assessments are designed to yield two or more scores for an individual by administering multiple sections measuring different but related skills. Multidimensional tests, or more specifically simple-structured tests such as these, rely on multiple sections of multiple-choice and/or constructed-response items to generate multiple…
Descriptors: Tests, Scoring, Responses, Test Items
Bais, Frank; Schouten, Barry; Lugtig, Peter; Toepoel, Vera; Arends-Tòth, Judit; Douhou, Salima; Kieruj, Natalia; Morren, Mattijn; Vis, Corrie – Sociological Methods & Research, 2019
Item characteristics can have a significant effect on survey data quality and may be associated with measurement error. Literature on data quality and measurement error is often inconclusive. This could be because item characteristics used for detecting measurement error are not coded unambiguously. In our study, we use a systematic coding…
Descriptors: Foreign Countries, National Surveys, Error of Measurement, Test Items
Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020
In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describe a method…
Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level
