ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	9
Since 2006 (last 20 years)	13

Descriptor

Error of Measurement	17
Foreign Countries	17
Interrater Reliability	17
Correlation	4
Item Response Theory	4
Rating Scales	4
Test Reliability	4
Data Analysis	3
English	3
Generalizability Theory	3
Language Tests	3
Statistical Analysis	3
Validity	3
Writing Evaluation	3
Coding	2
College Faculty	2
Comparative Analysis	2
Focus Groups	2
Middle School Students	2
Performance Based Assessment	2
Psychometrics	2
Scores	2
Scoring	2
Standard Setting (Scoring)	2
Test Items	2

Source

Language Assessment Quarterly	2
Language Testing	2
Measurement in Physical…	2
Alberta Journal of…	1
Creativity Research Journal	1
Developmental Psychology	1
Educational Sciences: Theory…	1
Educational and Psychological…	1
Participatory Educational…	1
Research Papers in Education	1
Sociological Methods &…	1
Studies in Higher Education	1
World Englishes	1

Publication Type

Journal Articles	16
Reports - Research	14
Opinion Papers	1
Reports - Descriptive	1
Reports - Evaluative	1
Speeches/Meeting Papers	1
Tests/Questionnaires	1

Education Level

Higher Education	4
Postsecondary Education	4
Elementary Education	3
Elementary Secondary Education	1
Grade 6	1
Grade 7	1
Intermediate Grades	1
Junior High Schools	1
Middle Schools	1
Secondary Education	1

Location

Canada	2
Netherlands	2
Turkey	2
United Kingdom	2
Canada (Toronto)	1
China (Beijing)	1
Finland	1
Japan	1
Netherlands (Amsterdam)	1
Taiwan	1
Taiwan (Taipei)	1
United Kingdom (England)	1

Assessments and Surveys

Trends in International…

Showing 1 to 15 of 17 results

All Types of Experience Are Equal, but Some Are More Equal: The Effect of Different Types of Experience on Rater Severity and Rater Consistency

Peer reviewed

Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024

This article focuses on rater severity and consistency and their relation to different types of rater experience over a long period of time. The article is based on longitudinal data collected from 2009 to 2019 from the second language Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. The study investigated…

Descriptors: Foreign Countries, Interrater Reliability, Error of Measurement, Experience

Can Survey Item Characteristics Relevant to Measurement Error Be Coded Reliably? A Case Study on 11 Dutch General Population Surveys

Peer reviewed

Bais, Frank; Schouten, Barry; Lugtig, Peter; Toepoel, Vera; Arends-Tòth, Judit; Douhou, Salima; Kieruj, Natalia; Morren, Mattijn; Vis, Corrie – Sociological Methods & Research, 2019

Item characteristics can have a significant effect on survey data quality and may be associated with measurement error. Literature on data quality and measurement error is often inconclusive. This could be because item characteristics used for detecting measurement error are not coded unambiguously. In our study, we use a systematic coding…

Descriptors: Foreign Countries, National Surveys, Error of Measurement, Test Items

Examining the Differential Rater Functioning in the Process of Assessing Writing Skills of Middle School 7th Grade Students

Peer reviewed
PDF on ERIC

Erman Aslanoglu, Aslihan; Sata, Mehmet – Participatory Educational Research, 2021

When students present writing tasks that require higher order thinking skills to work, one of the most important problems is scoring these writing tasks objectively. The fact that raters give scores below or above their performance based on several environmental factors affects the consistency of the measurements. Inconsistencies in scoring…

Descriptors: Interrater Reliability, Evaluators, Error of Measurement, Writing Evaluation

Intra- and Inter-Rater Reliability of the Rate of Force Development of Hip Abductor Muscles Measured by Hand-Held Dynamometer

Peer reviewed

Takeda, Kazuya; Tanabe, Shigeo; Koyama, Soichiro; Nagai, Tomoko; Sakurai, Hiroaki; Kanada, Yoshikiyo; Shomoto, Koji – Measurement in Physical Education and Exercise Science, 2018

The aim of this study was to clarify the intra- and inter-rater reliability of the rate of force development in hip abductor muscle force measurements using a hand-held dynamometer. Thirty healthy adults were separately assessed by two independent raters on two separate days. Rate of force development was calculated from the slope of the…

Descriptors: Interrater Reliability, Human Body, Measurement Equipment, Handheld Devices

Inter-Rater and Test-Retest (Between-Sessions) Reliability of the 4-Skills Scan for Dutch Elementary School Children

Peer reviewed

van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M. – Measurement in Physical Education and Exercise Science, 2018

In The Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, in this study the test-retest and inter-rater reliability was determined. Respectively, 624 and 557 Dutch 6- to 12-year-old children were analyzed for…

Descriptors: Foreign Countries, Interrater Reliability, Pretests Posttests, Psychomotor Skills

Judging Research Papers for Research Excellence

Peer reviewed

Tymms, Peter; Higgins, Steve – Studies in Higher Education, 2018

The United Kingdom's (UK's) Research Excellence Framework of 2014 was an expensive high stakes evaluation which had a range of impacts on higher education institutions across the country. One component was an assessment of the quality of research outputs where a major feature was a series of panels organised to read and rate the outputs of their…

Descriptors: Research Reports, Educational Research, Journal Articles, Teacher Researchers

Investigation of Coefficient of Individual Agreement in Terms of Sample Size, Random and Monotone Missing Ratio, and Number of Repeated Measures

Peer reviewed
PDF on ERIC

Temel, Gülhan Orekici; Erdogan, Semra; Selvi, Hüseyin; Kaya, Irem Ersöz – Educational Sciences: Theory and Practice, 2016

Studies based on longitudinal data focus on the change and development of the situation being investigated and allow for examining cases regarding education, individual development, cultural change, and socioeconomic improvement in time. However, as these studies require taking repeated measures in different time periods, they may include various…

Descriptors: Investigations, Sample Size, Longitudinal Studies, Interrater Reliability

Investigating Score Dependability in English/Chinese Interpreter Certification Performance Testing: A Generalizability Theory Approach

Peer reviewed

Han, Chao – Language Assessment Quarterly, 2016

As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…

Descriptors: Foreign Countries, Scores, English, Chinese

Determining the Scoring Validity of a Co-Constructed CEFR-Based Rating Scale

Peer reviewed

Deygers, Bart; Van Gorp, Koen – Language Testing, 2015

Considering scoring validity as encompassing both reliable rating scale use and valid descriptor interpretation, this study reports on the validation of a CEFR-based scale that was co-constructed and used by novice raters. The research questions this paper wishes to answer are (a) whether it is possible to construct a CEFR-based rating scale with…

Descriptors: Rating Scales, Scoring, Validity, Interrater Reliability

Observed Sensitivity during Family Interactions and Cumulative Risk: A Study of Multiple Dyads per Family

Peer reviewed

Browne, Dillon T.; Leckie, George; Prime, Heather; Perlman, Michal; Jenkins, Jennifer M. – Developmental Psychology, 2016

The present study sought to investigate the family, individual, and dyad-specific contributions to observed cognitive sensitivity during family interactions. Moreover, the influence of cumulative risk on sensitivity at the aforementioned levels of the family was examined. Mothers and 2 children per family were observed interacting in a round robin…

Descriptors: Family Relationship, Family (Sociological Unit), Sibling Relationship, Siblings

Comparing Yes/No Angoff and Bookmark Standard Setting Methods in the Context of English Assessment

Peer reviewed

Hsieh, Mingchuan – Language Assessment Quarterly, 2013

The Yes/No Angoff and Bookmark methods for setting standards on educational assessment are currently two of the most popular standard-setting methods. However, there is no research into the comparability of these two methods in the context of language assessment. This study compared results from the Yes/No Angoff and Bookmark methods as applied to…

Descriptors: Standard Setting (Scoring), Comparative Analysis, Language Tests, Multiple Choice Tests

Qualification Users' Perceptions and Experiences of Assessment Reliability

Peer reviewed

Chamberlain, Suzanne – Research Papers in Education, 2013

This paper presents the findings of a study designed to explore qualification users' perceptions and experiences of reliability in the context of national assessment outcomes in England. The study consisted of 17 focus groups conducted across six sectors of qualification users: students, teachers, trainee teachers, job-seekers, employers and…

Descriptors: Qualifications, Test Reliability, Foreign Countries, Focus Groups

Improving Creativity Performance Assessment: A Rater Effect Examination with Many Facet Rasch Model

Peer reviewed

Hung, Su-Pin; Chen, Po-Hsi; Chen, Hsueh-Chih – Creativity Research Journal, 2012

Product assessment is widely applied in creative studies, typically as an important dependent measure. Within this context, this study had 2 purposes. First, the focus of this research was on methods for investigating possible rater effects, an issue that has not received a great deal of attention in past creativity studies. Second, the…

Descriptors: Item Response Theory, Creativity, Interrater Reliability, Undergraduate Students

Setting Standards and Detecting Intrajudge Inconsistency Using Interdependent Evaluation of Response Alternatives

Peer reviewed

Chang, Lei; Van Der Linden, Wim J.; Vos, Hans J. – Educational and Psychological Measurement, 2004

This article introduces a new test-centered standard-setting method as well as a procedure to detect intrajudge inconsistency of the method. The standard-setting method that is based on interdependent evaluations of alternative responses has judges closely evaluate the process that examinees use to solve multiple-choice items. The new method is…

Descriptors: Standard Setting (Scoring), Interrater Reliability, Foreign Countries, Evaluation Methods

Assessing Oral Communication Skills--Reflections of an Examiner.

Peer reviewed

Taylor, Roy E.; Davidson, Fred – World Englishes, 1996

This article cautions against complacency in "subjective" assessment, arguing that even tests designed to reflect the development of learner-centered, interactive and communicative approaches to teaching English may have cultural bias built into their assessment criteria. The reply article singles out as an unresolved issue whether or…

Descriptors: Cultural Context, English, Error of Measurement, Ethnic Groups

Author

Arends-Tòth, Judit	1
Bais, Frank	1
Browne, Dillon T.	1
Chamberlain, Suzanne	1
Chang, Lei	1
Chen, Hsueh-Chih	1
Chen, Po-Hsi	1
Davidson, Fred	1
Deygers, Bart	1
Douhou, Salima	1
Erdogan, Semra	1
Erman Aslanoglu, Aslihan	1
Gierl, Mark J.	1
Han, Chao	1
Higgins, Steve	1
Hsieh, Mingchuan	1
Hung, Su-Pin	1
Iasonas Lamprianou	1
Jenkins, Jennifer M.	1
Kanada, Yoshikiyo	1
Kaya, Irem Ersöz	1
Kieruj, Natalia	1
Koyama, Soichiro	1
Leckie, George	1
Lugtig, Peter	1