Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 2 |
Since 2016 (last 10 years) | 12 |
Since 2006 (last 20 years) | 26 |
Descriptor
Interrater Reliability | 35 |
Test Items | 35 |
Test Reliability | 35 |
Test Construction | 21 |
Test Validity | 19 |
Foreign Countries | 13 |
Scoring | 12 |
Difficulty Level | 8 |
Psychometrics | 8 |
Testing | 6 |
Content Validity | 4 |
Education Level
Higher Education | 5 |
Elementary Education | 4 |
Elementary Secondary Education | 4 |
Postsecondary Education | 4 |
Secondary Education | 4 |
Grade 8 | 2 |
Grade 9 | 2 |
High Schools | 2 |
Grade 1 | 1 |
Grade 2 | 1 |
Grade 3 | 1 |
Location
Canada | 2 |
Florida | 2 |
New Mexico | 2 |
South Africa | 2 |
United States | 2 |
Australia | 1 |
India | 1 |
Japan | 1 |
Oregon | 1 |
Sweden | 1 |
Tennessee | 1 |
Parker, Mark A. J.; Hedgeland, Holly; Jordan, Sally E.; Braithwaite, Nicholas St. J. – European Journal of Science and Mathematics Education, 2023
The study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI), which used automatically marked free-response questions. Data were collected over a period of three academic years from 611 participants who were taking physics classes at high school and university level. A total of…
Descriptors: Test Construction, Scientific Concepts, Physics, Test Reliability
Atilgan, Hakan; Demir, Elif Kübra; Ogretmen, Tuncay; Basokcu, Tahsin Oguz – International Journal of Progressive Education, 2020
It has become a critical question what the reliability level would be when open-ended questions are used in large-scale selection tests. One of the aims of the present study is to determine what the reliability would be in the event that the answers given by test-takers are scored by experts when open-ended short answer questions are used in…
Descriptors: Foreign Countries, Secondary School Students, Test Items, Test Reliability
Kaharu, Sarintan N.; Mansyur, Jusman – Pegem Journal of Education and Instruction, 2021
This study aims to develop a test that can be used to explore mental models and representation patterns of objects in liquid fluid. The test developed by adapting the Reeves's Development Model was carried out in several stages, namely: determining the orientation and test segments; initial survey; preparation of the initial draft; try out;…
Descriptors: Test Construction, Schemata (Cognition), Scientific Concepts, Water
Koriakin, Taylor A.; McKee, Sarah L.; Schwartz, Marlene B.; Chafouleas, Sandra M. – Journal of School Health, 2020
Background: Stakeholders increasingly recognize the role of policy in implementing Whole School, Whole Community, Whole Child (WSCC) frameworks in schools; however, few tools are currently available to assess alignment between district policies and WSCC concepts. The purpose of this study was to expand the Wellness School Assessment Tool (WellSAT)…
Descriptors: School Policy, Health Services, Health Promotion, Wellness
Martin, David; Jamieson-Proctor, Romina – International Journal of Research & Method in Education, 2020
In Australia, one of the key findings of the Teacher Education Ministerial Advisory Group was that not all graduating pre-service teachers possess adequate pedagogical content knowledge (PCK) to teach effectively. The concern is that higher education providers working with pre-service teachers are using pedagogical practices and assessments which…
Descriptors: Test Construction, Preservice Teachers, Pedagogical Content Knowledge, Foreign Countries
Dempster, Edith R.; Kirby, Nicola F. – Perspectives in Education, 2018
Taxonomies of cognitive demand are frequently used to ensure that assessment tasks include questions ranging from low to high cognitive demand. This paper investigates inter-rater agreement among four evaluators on the cognitive demand of the South African National Senior Certificate Life Sciences examinations after training, practice and…
Descriptors: Interrater Reliability, Biological Sciences, Cognitive Processes, Test Items
Dempster, Edith R.; Kirby, Nicki F. – South African Journal of Education, 2018
Public perception of "declining standards" in school-leaving examinations often accompanies increases in pass rates in school-leaving examinations. "Declining standards" to the public means easier examination papers. The present study evaluates a South African attempt to estimate the level of difficulty, as distinct from…
Descriptors: Foreign Countries, Interrater Reliability, Difficulty Level, Science Tests
Hampton, Lauren H.; Curtis, Philip R.; Roberts, Megan Y. – Autism: The International Journal of Research and Practice, 2019
Borrowing from a clinical psychology observational methodology, thin-slice observations were used to assess autism characteristics in toddlers. Thin-slices are short observations taken from a longer behavior stream which are assigned ratings by multiple raters using a 5-point scale. The raters' observations are averaged together to assign a…
Descriptors: Autism, Pervasive Developmental Disorders, Observation, Toddlers
Tengberg, Michael – Language Assessment Quarterly, 2018
Reading comprehension is often treated as a multidimensional construct. In many reading tests, items are distributed over reading process categories to represent the subskills expected to constitute comprehension. This study explores (a) the extent to which specified subskills of reading comprehension tests are conceptually conceivable to…
Descriptors: Reading Tests, Reading Comprehension, Scores, Test Results
Rios, Joseph A.; Sparks, Jesse R.; Zhang, Mo; Liu, Ou Lydia – ETS Research Report Series, 2017
Proficiency with written communication (WC) is critical for success in college and careers. As a result, institutions face a growing challenge to accurately evaluate their students' writing skills to obtain data that can support demands of accreditation, accountability, or curricular improvement. Many current standardized measures, however, lack…
Descriptors: Test Construction, Test Validity, Writing Tests, College Outcomes Assessment
Smarter Balanced Assessment Consortium, 2016
The goal of this study was to gather comprehensive evidence about the alignment of the Smarter Balanced summative assessments to the Common Core State Standards (CCSS). Alignment of the Smarter Balanced summative assessments to the CCSS is a critical piece of evidence regarding the validity of inferences students, teachers and policy makers can…
Descriptors: Alignment (Education), Summative Evaluation, Common Core State Standards, Test Content
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
Slepkov, Aaron D.; Shiell, Ralph C. – Physical Review Special Topics - Physics Education Research, 2014
Constructed-response (CR) questions are a mainstay of introductory physics textbooks and exams. However, because of the time, cost, and scoring reliability constraints associated with this format, CR questions are being increasingly replaced by multiple-choice (MC) questions in formal exams. The integrated testlet (IT) is a recently developed…
Descriptors: Science Tests, Physics, Responses, Multiple Choice Tests
Taubner, Svenja; Horz, Susanne; Fischer-Kern, Melitta; Doering, Stephan; Buchheim, Anna; Zimmermann, Johannes – Psychological Assessment, 2013
The Reflective Functioning Scale (RFS) was developed to assess individual differences in the ability to mentalize attachment relationships. The RFS assesses mentalization from transcripts of the Adult Attachment Interview (AAI). A global score is given by trained coders on an 11-point scale ranging from antireflective to exceptionally reflective.…
Descriptors: Measures (Individuals), Attachment Behavior, Individual Differences, Adults
Williams, Lunetta M.; Hall, Katrina W.; Hedrick, Wanda B.; Lamkin, Marcia; Abendroth, Jennifer – Journal of Language and Literacy Education, 2013
The purpose of the present study was to develop an instrument to measure reading during in-school independent reading (ISIR). Procedures to establish validity and reliability of the instrument included videotaping and observing students during ISIR, gathering feedback from literacy experts, establishing interrater reliability, crosschecking…
Descriptors: Test Construction, Test Validity, Test Reliability, Video Technology