Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 15 |
Since 2006 (last 20 years) | 32 |
Descriptor
Quality Control | 44 |
Scoring | 44 |
Test Construction | 14 |
Test Reliability | 14 |
Evaluation Methods | 11 |
Test Validity | 11 |
Scores | 10 |
Automation | 9 |
Data Collection | 9 |
Foreign Countries | 9 |
Testing | 9 |
Author
Bejar, Isaac I. | 4 |
Allalouf, Avi | 3 |
Martin, Michael O., Ed. | 3 |
Mullis, Ina V. S., Ed. | 2 |
Williamson, David M. | 2 |
Ahmed, Ayesha | 1 |
Amsbary, Michelle | 1 |
Annis, Terri | 1 |
Baer, Justin | 1 |
Baldi, Stephane, Ed. | 1 |
Baumer, Michal | 1 |
Education Level
Elementary Education | 7 |
Secondary Education | 7 |
Early Childhood Education | 6 |
Elementary Secondary Education | 5 |
Grade 4 | 5 |
High Schools | 5 |
Intermediate Grades | 5 |
Junior High Schools | 5 |
Middle Schools | 5 |
Grade 3 | 4 |
Grade 5 | 4 |
Location
Rhode Island | 4 |
United Kingdom | 2 |
United States | 2 |
Australia | 1 |
Austria | 1 |
Belgium | 1 |
Canada | 1 |
Chile | 1 |
Cyprus | 1 |
Czech Republic | 1 |
Denmark | 1 |
White, Mark; Ronfeldt, Matt – Educational Assessment, 2024
Standardized observation systems seek to reliably measure a specific conceptualization of teaching quality, managing rater error through mechanisms such as certification, calibration, validation, and double-scoring. These mechanisms both support high quality scoring and generate the empirical evidence used to support the scoring inference (i.e.,…
Descriptors: Interrater Reliability, Quality Control, Teacher Effectiveness, Error Patterns
Bejar, Isaac I.; Li, Chen; McCaffrey, Daniel – Applied Measurement in Education, 2020
We evaluate the feasibility of developing predictive models of rater behavior, that is, "rater-specific" models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays…
Descriptors: Scoring, Essays, Behavior, Predictive Measurement
Wendler, Cathy; Glazer, Nancy; Cline, Frederick – ETS Research Report Series, 2019
One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift reflects changes in how raters interpret and use established scoring criteria to assign essay scores. Calibration is a process used to help control rater drift and, as such, serves as…
Descriptors: College Entrance Examinations, Graduate Study, Accuracy, Test Reliability
Allalouf, Avi; Gutentag, Tony; Baumer, Michal – Educational Measurement: Issues and Practice, 2017
Quality control (QC) in testing is paramount. QC procedures for tests can be divided into two types. The first type, one that has been well researched, is QC for tests administered to large population groups on few administration dates using a small set of test forms (e.g., large-scale assessment). The second type is QC for tests, usually…
Descriptors: Quality Control, Scoring, Computer Assisted Testing, Error Patterns
Yang, Min; Yan, Zi; Coniam, David – Educational Research and Evaluation, 2017
This paper reports on a qualitative study on markers' perceptions of onscreen marking (OSM) in association with key influential factors of marking reliability. The study has made adaptations to an existing framework proposed by Black, Suto, and Bramley in 2011 for exploring issues related to influential factors of marking reliability in OSM…
Descriptors: Computer Uses in Education, Reliability, Scoring, Secondary School Teachers
Allalouf, Avi – International Journal of Testing, 2014
The Quality Control (QC) Guidelines are intended to increase the efficiency, precision, and accuracy of the scoring, analysis, and reporting process of testing. The QC Guidelines focus on large-scale testing operations where multiple forms of tests are created for use on set dates. However, they may also be used for a wide variety of other testing…
Descriptors: Quality Control, Scoring, Test Theory, Scores
Rupp, André A. – Applied Measurement in Education, 2018
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Descriptors: Design, Automation, Scoring, Test Scoring Machines
Lu, Ying; Yen, Wendy M. – ETS Research Report Series, 2014
This article explores the use of longitudinal regression as a tool for identifying scoring inaccuracies. Student progression patterns, as evaluated through longitudinal regressions, typically are more stable from year to year than are scale score distributions and statistics, which require representative samples to conduct credibility checks.…
Descriptors: Quality Control, Regression (Statistics), Scoring, Accuracy
Wagemaker, Hans, Ed. – International Association for the Evaluation of Educational Achievement, 2020
Although International Association for the Evaluation of Educational Achievement-pioneered international large-scale assessment (ILSA) of education is now a well-established science, non-practitioners and many users often substantially misunderstand how large-scale assessments are conducted, what questions and challenges they are designed to…
Descriptors: International Assessment, Achievement Tests, Educational Assessment, Comparative Analysis
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
Partnership for Assessment of Readiness for College and Careers, 2019
The Partnership for Assessment of Readiness for College and Careers (PARCC) is a state-led consortium designed to create next-generation assessments that, compared to traditional K-12 assessments, more accurately measure student progress toward college and career readiness. The PARCC assessments are aligned to the Common Core State Standards…
Descriptors: College Readiness, Career Readiness, Common Core State Standards, Language Arts
Greenberg, Julie; Walsh, Kate; McKee, Arthur – National Council on Teacher Quality, 2014
The "NCTQ Teacher Prep Review" evaluates the quality of programs that provide preservice preparation of public school teachers. This appendix describes the scope, methodology, timeline, staff, and standards involved in the production of "Teacher Prep Review 2014." Data collection, validation, and analysis for the report are…
Descriptors: Teacher Education Programs, Preservice Teacher Education, Program Evaluation, Standards
Partnership for Assessment of Readiness for College and Careers, 2018
The purpose of this technical report is to describe the third operational administration of the Partnership for Assessment of Readiness for College and Careers (PARCC) assessments in the 2016-2017 academic year. PARCC is a state-led consortium creating next-generation assessments that, compared to traditional K-12 assessments, more accurately…
Descriptors: College Readiness, Career Readiness, Common Core State Standards, Language Arts
Rhode Island Department of Education, 2015
Rhode Island educators believe that implementing a fair, accurate, and meaningful educator evaluation and support system will help improve teaching and learning. The primary purpose of the Rhode Island Model Teacher Evaluation and Support System (Rhode Island Model) is to help all teachers improve. Through the Model, the goal is to help create a…
Descriptors: Guides, Student Evaluation, Evaluation Methods, Public Schools
New Meridian Corporation, 2020
The purpose of this report is to describe the technical qualities of the 2018-2019 operational administration of the English language arts/literacy (ELA/L) and mathematics summative assessments in grades 3 through 8 and high school. The ELA/L assessments focus on reading and comprehending a range of sufficiently complex texts independently and…
Descriptors: Language Arts, Literacy Education, Mathematics Education, Summative Evaluation