NotesFAQContact Us
Collection
Advanced
Search Tips
Audience
Laws, Policies, & Programs
Assessments and Surveys
Trends in International…1
What Works Clearinghouse Rating
Showing 1 to 15 of 17 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024
This article focuses on rater severity and consistency and their relation to different types of rater experience over a long period of time. The article is based on longitudinal data collected from 2009 to 2019 from the second language Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. The study investigated…
Descriptors: Foreign Countries, Interrater Reliability, Error of Measurement, Experience
Peer reviewed Peer reviewed
Direct linkDirect link
Bais, Frank; Schouten, Barry; Lugtig, Peter; Toepoel, Vera; Arends-Tòth, Judit; Douhou, Salima; Kieruj, Natalia; Morren, Mattijn; Vis, Corrie – Sociological Methods & Research, 2019
Item characteristics can have a significant effect on survey data quality and may be associated with measurement error. Literature on data quality and measurement error is often inconclusive. This could be because item characteristics used for detecting measurement error are not coded unambiguously. In our study, we use a systematic coding…
Descriptors: Foreign Countries, National Surveys, Error of Measurement, Test Items
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Erman Aslanoglu, Aslihan; Sata, Mehmet – Participatory Educational Research, 2021
When students present writing tasks that require higher order thinking skills to work, one of the most important problems is scoring these writing tasks objectively. The fact that raters give scores below or above their performance based on several environmental factors affects the consistency of the measurements. Inconsistencies in scoring…
Descriptors: Interrater Reliability, Evaluators, Error of Measurement, Writing Evaluation
Peer reviewed Peer reviewed
Direct linkDirect link
Takeda, Kazuya; Tanabe, Shigeo; Koyama, Soichiro; Nagai, Tomoko; Sakurai, Hiroaki; Kanada, Yoshikiyo; Shomoto, Koji – Measurement in Physical Education and Exercise Science, 2018
The aim of this study was to clarify the intra- and inter-rater reliability of the rate of force development in hip abductor muscle force measurements using a hand-held dynamometer. Thirty healthy adults were separately assessed by two independent raters on two separate days. Rate of force development was calculated from the slope of the…
Descriptors: Interrater Reliability, Human Body, Measurement Equipment, Handheld Devices
Peer reviewed Peer reviewed
Direct linkDirect link
van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M. – Measurement in Physical Education and Exercise Science, 2018
In The Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, in this study the test-retest and inter-rater reliability was determined. Respectively, 624 and 557 Dutch 6- to 12-year-old children were analyzed for…
Descriptors: Foreign Countries, Interrater Reliability, Pretests Posttests, Psychomotor Skills
Peer reviewed Peer reviewed
Direct linkDirect link
Tymms, Peter; Higgins, Steve – Studies in Higher Education, 2018
The United Kingdom's (UK's) Research Excellence Framework of 2014 was an expensive high stakes evaluation which had a range of impacts on higher education institutions across the country. One component was an assessment of the quality of research outputs where a major feature was a series of panels organised to read and rate the outputs of their…
Descriptors: Research Reports, Educational Research, Journal Articles, Teacher Researchers
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Temel, Gülhan Orekici; Erdogan, Semra; Selvi, Hüseyin; Kaya, Irem Ersöz – Educational Sciences: Theory and Practice, 2016
Studies based on longitudinal data focus on the change and development of the situation being investigated and allow for examining cases regarding education, individual development, cultural change, and socioeconomic improvement in time. However, as these studies require taking repeated measures in different time periods, they may include various…
Descriptors: Investigations, Sample Size, Longitudinal Studies, Interrater Reliability
Peer reviewed Peer reviewed
Direct linkDirect link
Han, Chao – Language Assessment Quarterly, 2016
As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…
Descriptors: Foreign Countries, Scores, English, Chinese
Peer reviewed Peer reviewed
Direct linkDirect link
Deygers, Bart; Van Gorp, Koen – Language Testing, 2015
Considering scoring validity as encompassing both reliable rating scale use and valid descriptor interpretation, this study reports on the validation of a CEFR-based scale that was co-constructed and used by novice raters. The research questions this paper wishes to answer are (a) whether it is possible to construct a CEFR-based rating scale with…
Descriptors: Rating Scales, Scoring, Validity, Interrater Reliability
Peer reviewed Peer reviewed
Direct linkDirect link
Browne, Dillon T.; Leckie, George; Prime, Heather; Perlman, Michal; Jenkins, Jennifer M. – Developmental Psychology, 2016
The present study sought to investigate the family, individual, and dyad-specific contributions to observed cognitive sensitivity during family interactions. Moreover, the influence of cumulative risk on sensitivity at the aforementioned levels of the family was examined. Mothers and 2 children per family were observed interacting in a round robin…
Descriptors: Family Relationship, Family (Sociological Unit), Sibling Relationship, Siblings
Peer reviewed Peer reviewed
Direct linkDirect link
Hsieh, Mingchuan – Language Assessment Quarterly, 2013
The Yes/No Angoff and Bookmark method for setting standards on educational assessment are currently two of the most popular standard-setting methods. However, there is no research into the comparability of these two methods in the context of language assessment. This study compared results from the Yes/No Angoff and Bookmark methods as applied to…
Descriptors: Standard Setting (Scoring), Comparative Analysis, Language Tests, Multiple Choice Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Chamberlain, Suzanne – Research Papers in Education, 2013
This paper presents the findings of a study designed to explore qualification users' perceptions and experiences of reliability in the context of national assessment outcomes in England. The study consisted of 17 focus groups conducted across six sectors of qualification users: students, teachers, trainee teachers, job-seekers, employers and…
Descriptors: Qualifications, Test Reliability, Foreign Countries, Focus Groups
Peer reviewed Peer reviewed
Direct linkDirect link
Hung, Su-Pin; Chen, Po-Hsi; Chen, Hsueh-Chih – Creativity Research Journal, 2012
Product assessment is widely applied in creative studies, typically as an important dependent measure. Within this context, this study had 2 purposes. First, the focus of this research was on methods for investigating possible rater effects, an issue that has not received a great deal of attention in past creativity studies. Second, the…
Descriptors: Item Response Theory, Creativity, Interrater Reliability, Undergraduate Students
Peer reviewed Peer reviewed
Direct linkDirect link
Chang, Lei; Van Der Linden, Wim J.; Vos, Hans J. – Educational and Psychological Measurement, 2004
This article introduces a new test-centered standard-setting method as well as a procedure to detect intrajudge inconsistency of the method. The standard-setting method that is based on interdependent evaluations of alternative responses has judges closely evaluate the process that examinees use to solve multiple-choice items. The new method is…
Descriptors: Standard Setting (Scoring), Interrater Reliability, Foreign Countries, Evaluation Methods
Peer reviewed Peer reviewed
Taylor, Roy E.; Davidson, Fred – World Englishes, 1996
This article cautions against complacency in "subjective" assessment, arguing that even tests designed to reflect the development of learner-centered, interactive and communicative approaches to teaching English may have cultural bias built into their assessment criteria. The reply article singles out as an unresolved issue whether or…
Descriptors: Cultural Context, English, Error of Measurement, Ethnic Groups
Previous Page | Next Page »
Pages: 1  |  2