Wanzer, Dana Linnell – American Journal of Evaluation, 2021
Given the lack of consensus within the field about what evaluation is, it is difficult to communicate to nonevaluators what evaluation is and how it differs from research. To understand how evaluation is defined, both evaluators and researchers were asked how they defined evaluation and, if at all, differentiated evaluation…
Descriptors: Evaluation, Research, Differences, Definitions
Nadas, Rita; Suto, Irenka; Grayson, Rebecca – Educational Research, 2021
Background: Secondary school teachers sometimes teach and assess material outside their specialisms, for reasons including staff shortages and the growing popularity of interdisciplinary courses. We hypothesised that teacher-assessors with different subject specialisms may differ in their interpretations of frequently used words in teaching and…
Descriptors: Secondary School Teachers, Language Usage, Definitions, Evaluators
Wendler, Cathy; Glazer, Nancy; Bridgeman, Brent – Applied Measurement in Education, 2020
Efficient constructed response (CR) scoring requires both accuracy and speed from human raters. This study was designed to determine whether setting scoring rate expectations would encourage raters to score at a faster pace and, if so, whether there would be differential effects on scoring accuracy for raters who score at different rates. Three rater groups…
Descriptors: Scoring, Expectation, Accuracy, Time
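Studies like this one turn on two simple per-rater quantities: scoring rate and scoring accuracy. A minimal sketch of how such quantities might be computed follows; the data layout, names, and numbers are hypothetical, not the study's own.

```python
# Illustrative sketch only: per-rater scoring rate and accuracy against
# adjudicated "true" scores. Data layout and numbers are hypothetical.
from statistics import mean

# rater -> list of (response_id, rater_score, true_score, seconds_spent)
ratings = {
    "rater_A": [(1, 3, 3, 40), (2, 2, 3, 35), (3, 4, 4, 50)],
    "rater_B": [(1, 3, 3, 90), (2, 3, 3, 80), (3, 4, 4, 70)],
}

for rater, rows in ratings.items():
    accuracy = mean(score == truth for _, score, truth, _ in rows)
    rate = 3600 / mean(secs for *_, secs in rows)  # responses per hour
    print(f"{rater}: accuracy={accuracy:.2f}, rate={rate:.1f}/hr")
```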
Ping-Lin Chuang – Language Testing, 2025
This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…
Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources
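Several entries in this list report interrater reliability. Two common, simple indices are exact agreement and Cohen's kappa; the sketch below computes both for two hypothetical raters, as a generic illustration rather than the analysis used in the paper.

```python
# Generic illustration (not the study's analysis): exact agreement and
# Cohen's kappa for two raters scoring the same responses.
from collections import Counter

rater1 = [3, 4, 2, 5, 3, 3, 4]
rater2 = [3, 4, 3, 5, 2, 3, 4]
n = len(rater1)

observed = sum(a == b for a, b in zip(rater1, rater2)) / n

# Chance agreement from each rater's marginal score distribution.
c1, c2 = Counter(rater1), Counter(rater2)
expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)

kappa = (observed - expected) / (1 - expected)
print(f"exact agreement = {observed:.2f}, kappa = {kappa:.2f}")
```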
Jun Liu; Meng Sun; Zile Liu; Yanhua Xu – SAGE Open, 2023
Innovation capability has become a necessary requirement for qualified teachers in the context of informatization. However, the validity and objectivity of existing assessments are unclear. Therefore, this study selected nine researchers to evaluate the instructional innovation capabilities of 60 pre-service teachers from a Chinese normal…
Descriptors: Preservice Teachers, Instructional Innovation, Competence, Item Response Theory
Vitello, Sylvia; Crisp, Victoria; Ireland, Jo – Research Matters, 2023
Assessment materials must be checked for errors before they are presented to candidates. Any errors have the potential to reduce validity. For example, in the most extreme cases, an error may turn an otherwise well-designed exam question into one that is impossible to answer. At Cambridge University Press & Assessment, assessment materials are…
Descriptors: Check Lists, Test Validity, Error Correction, Test Construction
Sunde, Eleah; Briggs, Adam M.; Mitteer, Daniel R. – Journal of Applied Behavior Analysis, 2022
Prior research has evaluated the reliability and validity of structured visual inspection (SVI) criteria for interpreting functional analysis (FA) outcomes (Hagopian et al., 1997; Roane et al., 2013). We adapted these criteria to meet the unique needs of interpreting latency-based FA outcomes and examined the reliability and validity of applying…
Descriptors: Reliability, Validity, Visual Perception, Evaluation Criteria
Ceh, Simon Majed; Edelmann, Carina; Hofer, Gabriela; Benedek, Mathias – Journal of Creative Behavior, 2022
Creativity research crucially relies on creativity evaluations by external raters, but it is not clear what properties characterize good raters. In the present study, we investigated whether rater personality and rater creativity are related to discernment (i.e., the ability to distinguish creative from uncreative responses) when evaluating…
Descriptors: Novices, Evaluators, Creativity, Personality Traits
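One way to make "discernment" concrete, purely as an illustration (the paper's own operationalization may differ), is to correlate a rater's creativity scores with a consensus benchmark:

```python
# Purely illustrative: one possible operationalization of "discernment" as
# the correlation between a rater's scores and a consensus benchmark.
# The paper's actual measure may be defined differently.
from statistics import correlation  # Python 3.10+

benchmark = [4.2, 1.8, 3.5, 2.0, 4.8]  # consensus creativity of 5 responses
rater = [4, 2, 3, 2, 5]                # one rater's scores

print(f"discernment r = {correlation(rater, benchmark):.2f}")
```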
Jessica Thomas – ProQuest LLC, 2024
The purpose of this quantitative correlational study was to determine if, and to what extent, a relationship existed between teachers' perceptions of their evaluator's instructional leadership and the teachers' evaluation scores at the middle and high school levels in one Florida school district. Ozcan's theory of teacher motivation states that…
Descriptors: Secondary School Teachers, Teacher Attitudes, Teacher Evaluation, Evaluators
Pablo Bezem; Anne Piezunka; Rebecca Jacobsen – Leadership and Policy in Schools, 2024
In an era of test-based accountability, school inspections can offer a more nuanced understanding of why schools fail. Yet, we have limited knowledge of how inspectors arrive at their decisions on school quality. Analyzing inspectors' decision-making can reveal the underlying views regarding school accountability and open opportunities for school…
Descriptors: Inspection, Decision Making, Accountability, Institutional Evaluation
Casabianca, Jodi M.; Donoghue, John R.; Shin, Hyo Jeong; Chao, Szu-Fu; Choi, Ikkyu – Journal of Educational Measurement, 2023
Using item-response theory to model rater effects provides an alternative solution for rater monitoring and diagnosis, compared to using standard performance metrics. To fit such models, the ratings data must be sufficiently connected for rater effects to be estimable. Due to popular rating designs used in large-scale testing scenarios,…
Descriptors: Item Response Theory, Alternative Assessment, Evaluators, Research Problems
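The connectedness requirement mentioned above can be checked mechanically: a design is connected when every rater is linked to every other rater through commonly scored responses; otherwise rater effects cannot be placed on a common scale. The sketch below is an assumption-labeled illustration (not the authors' procedure) using breadth-first search over a bipartite rater-response graph.

```python
# Assumption-labeled sketch, not the authors' procedure: check whether a
# rating design is "connected" via breadth-first search over a bipartite
# rater-response graph.
from collections import defaultdict, deque

# (rater, response) pairs recording who scored what (hypothetical design)
design = [("r1", "e1"), ("r1", "e2"), ("r2", "e2"), ("r3", "e3")]

graph = defaultdict(set)
for rater, resp in design:
    graph[("rater", rater)].add(("resp", resp))
    graph[("resp", resp)].add(("rater", rater))

start = next(iter(graph))
seen, queue = {start}, deque([start])
while queue:
    for neighbor in graph[queue.popleft()]:
        if neighbor not in seen:
            seen.add(neighbor)
            queue.append(neighbor)

# r3 shares no responses with r1/r2, so this design prints "disconnected".
print("connected" if len(seen) == len(graph) else "disconnected")
```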
Ensuring Comparability of Qualifications through Moderation: Implications for Australia's VET Sector
Gillis, Shelley – Journal of Vocational Education and Training, 2023
Maintaining the quality and comparability of qualifications has become a major priority for vocational education and training (VET) systems world-wide, especially for those countries which have decentralised assessment and reporting systems. In the absence of external examinations, one policy solution to align the assessment standards of different…
Descriptors: Foreign Countries, Vocational Education, Qualifications, Quality Control
Mascadri, Julia; Spina, Nerida; Spooner-Lane, Rebecca; Briant, Elizabeth – Assessment & Evaluation in Higher Education, 2023
Australia has recently implemented Teaching Performance Assessments (TPAs) as a national accreditation requirement to assess final year preservice teachers' classroom readiness. In 2019, an Australian university developed a TPA to meet this requirement, comprising three written components and one oral component. This exploratory study investigated…
Descriptors: Foreign Countries, Evaluators, Oral Language, Performance Based Assessment
Ishikawa, Shin'ichiro – LEARN Journal: Language Education and Acquisition Research Network, 2023
TESOL practitioners, especially in Asia, tend to believe that reliable assessment of students' L2 English speech can be done solely by L1 English native speakers with sufficient teaching and assessment experience. Such a belief, however, may need to be reconsidered from a new perspective of "diversity and inclusivity." This study used…
Descriptors: English (Second Language), Evaluators, Teaching Experience, Speech
Huiying Cai; Xun Yan – Language Testing, 2024
Rater comments are typically analyzed qualitatively to show how raters apply rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…
Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation
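As a hedged sketch of the general idea (the paper's actual NLP pipeline is not reproduced here), free-text rater comments can be turned into simple quantitative features, such as length and evaluative-term counts, that could then be set alongside MFRM severity estimates. The word lists below are invented for illustration.

```python
# Hedged sketch; the paper's NLP pipeline is not reproduced here. Turns
# free-text rater comments into simple quantitative features (length,
# evaluative-term counts). Word lists are invented for illustration.
import re

NEGATIVE = {"weak", "unclear", "irrelevant", "unsupported"}
POSITIVE = {"clear", "strong", "coherent", "well-developed"}

def comment_features(comment: str) -> dict:
    tokens = re.findall(r"[a-z-]+", comment.lower())
    return {
        "n_tokens": len(tokens),
        "n_negative": sum(t in NEGATIVE for t in tokens),
        "n_positive": sum(t in POSITIVE for t in tokens),
    }

print(comment_features("Clear thesis but unsupported claims and unclear sourcing."))
```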