Publication Date
In 2025 | 1 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 7 |
Since 2016 (last 10 years) | 8 |
Since 2006 (last 20 years) | 9 |
Descriptor
Source
Author
Bond, Trevor | 1 |
Brunfaut, Tineke | 1 |
Chan, Kinnie Kin Yee | 1 |
Combs, Trenton | 1 |
Eckes, Thomas | 1 |
Engelhard, George | 1 |
Fennell, Francis | 1 |
Fisher, Douglas | 1 |
Friedberg, Solomon | 1 |
Howe, Roger | 1 |
Jin, Kuan-Yu | 1 |
More ▼ |
Publication Type
Journal Articles | 11 |
Reports - Research | 10 |
Reports - Evaluative | 3 |
Tests/Questionnaires | 2 |
Speeches/Meeting Papers | 1 |
Education Level
Elementary Education | 4 |
Early Childhood Education | 3 |
Kindergarten | 3 |
Primary Education | 3 |
Elementary Secondary Education | 2 |
Higher Education | 2 |
Postsecondary Education | 2 |
Secondary Education | 1 |
Audience
Laws, Policies, & Programs
Assessments and Surveys
edTPA (Teacher Performance… | 1 |
What Works Clearinghouse Rating
Wang, Jue; Engelhard, George; Combs, Trenton – Journal of Experimental Education, 2023
Unfolding models are frequently used to develop scales for measuring attitudes. Recently, unfolding models have been applied to examine rater severity and accuracy within the context of rater-mediated assessments. One of the problems in applying unfolding models to rater-mediated assessments is that the substantive interpretations of the latent…
Descriptors: Writing Evaluation, Scoring, Accuracy, Computational Linguistics
Jin, Kuan-Yu; Eckes, Thomas – Measurement: Interdisciplinary Research and Perspectives, 2022
Recent research on rater effects in performance assessments has increasingly focused on rater centrality, the tendency to assign scores clustering around the rating scale's middle categories. In the present paper, we adopted Jin and Wang's (2018) extended facets modeling approach and constructed a centrality continuum, ranging from raters…
Descriptors: Performance Based Assessment, Evaluators, Scoring, Sample Size
Makiko Kato – Journal of Education and Learning, 2025
This study aims to examine whether differences exist in the factors influencing the difficulty of scoring English summaries and determining scores based on the raters' attributes, and to collect candid opinions, considerations, and tentative suggestions for future improvements to the analytic rubric of summary writing for English learners. In this…
Descriptors: Writing Evaluation, Scoring, Writing Skills, English (Second Language)
Lestari, Santi B.; Brunfaut, Tineke – Language Testing, 2023
Assessing integrated reading-into-writing task performances is known to be challenging, and analytic rating scales have been found to better facilitate the scoring of these performances than other common types of rating scales. However, little is known about how specific operationalizations of the reading-into-writing construct in analytic rating…
Descriptors: Reading Writing Relationship, Writing Tests, Rating Scales, Writing Processes
Chan, Kinnie Kin Yee; Bond, Trevor; Yan, Zi – Language Testing, 2023
We investigated the relationship between the scores assigned by an Automated Essay Scoring (AES) system, the Intelligent Essay Assessor (IEA), and grades allocated by trained, professional human raters to English essay writing by instigating two procedures novel to written-language assessment: the logistic transformation of AES raw scores into…
Descriptors: Computer Assisted Testing, Essays, Scoring, Scores
Moylan, Laura A.; Johnson, Evelyn S.; Zheng, Yuzhu – Reading & Writing Quarterly, 2022
This study describes the development of a special education teacher observation protocol detailing the elements of effective decoding instruction. The psychometric properties of the protocol were investigated through many-facet Rasch measurement (MFRM). Video observations of classroom decoding instruction from 20 special education teachers across…
Descriptors: Decoding (Reading), Special Education Teachers, Psychometrics, Video Technology
Lyness, Scott A.; Peterson, Kent; Yates, Kenneth – Education Sciences, 2021
The Performance Assessment for California Teachers (PACT) is a high stakes summative assessment that was designed to measure pre-service teacher readiness. We examined the inter-rater reliability (IRR) of trained PACT evaluators who rated 19 candidates. As measured by Cohen's weighted kappa, the overall IRR estimate was 0.17 (poor strength of…
Descriptors: High Stakes Tests, Performance Based Assessment, Teacher Effectiveness, Academic Language
Friedberg, Solomon; Shanahan, Tim; Fennell, Francis; Fisher, Douglas; Howe, Roger – Thomas B. Fordham Institute, 2020
A decade ago, states across the nation adopted the Common Core State Standards (CCSS) in an effort to raise the academic bar for their students. This has provoked countless political battles since then--including an especially intense one in Florida. That fight culminated in Governor Ron DeSantis vowing to eliminate and replace the Common Core,…
Descriptors: Common Core State Standards, Academic Achievement, Benchmarking, Academic Standards
Kuiken, Folkert; Vedder, Ineke – Language Testing, 2014
This study investigates the relationship in L2 writing between raters' judgments of communicative adequacy and linguistic complexity by means of six-point Likert scales, and general measures of linguistic performance. The participants were 39 learners of Italian and 32 of Dutch, who wrote two short argumentative essays. The same writing tasks…
Descriptors: Writing Evaluation, Second Language Learning, Evaluators, Native Language

Mills, Craig N.; And Others – Educational Measurement: Issues and Practice, 1991
An approach is presented to the definition of minimal competence for judges to use in standard setting. Panelists in standard setting must receive training to ensure that differences in rating result from differences in perceptions of item difficulty, not in differences of opinion about the definition of minimal competence. (SLD)
Descriptors: Cutting Scores, Decision Making, Definitions, Difficulty Level

Reid, Jerry B. – Educational Measurement: Issues and Practice, 1991
Training judges to generate item ratings in standard setting once the reference group has been defined is discussed. It is proposed that sensitivity to the factors that determine difficulty can be improved through training. Three criteria for determining when training is sufficient are offered. (SLD)
Descriptors: Computer Assisted Instruction, Difficulty Level, Evaluators, Interrater Reliability

Plake, Barbara S.; Melican, Gerald J. – Educational and Psychological Measurement, 1989
The impact of overall test length and difficulty on the expert judgments of item performance by the Nedelsky method were studied. Five university-level instructors predicting the performance of minimally competent candidates on a mathematics examination were fairly consistent in their assessments regardless of length or difficulty of the test.…
Descriptors: Difficulty Level, Estimation (Mathematics), Evaluators, Higher Education
Wheeler, Patricia – 1991
The appropriateness of the Angoff method (W. H. Angoff, 1971) for setting standards on tests was studied. Evaluators (judges) from California school districts and teacher training institutions reviewed 15 NTE (National Teacher Examinations) Program Specialty Area Tests published by the Educational Testing Service for their appropriateness in…
Descriptors: Art Education, Biology, Difficulty Level, Elementary Secondary Education