Conti, Gary J. – Journal of Education and Learning, 2023
The use of personality inventories has been limited because of their cost and length. To overcome these limitations, this study created the Personality Identity Estimator (PIE), an easy-to-use inventory to estimate personality types that can be used at no cost. PIE is a categorical inventory containing 12 items with 3 items for each of the 4…
Descriptors: Personality Measures, Personality Traits, Validity, Reliability
Doewes, Afrizal; Pechenizkiy, Mykola – International Educational Data Mining Society, 2021
Scoring essays is generally an exhausting and time-consuming task for teachers. Automated Essay Scoring (AES) makes the scoring process faster and more consistent. The most logical way to assess the performance of an automated scorer is by measuring the score agreement with the human raters. However, we provide empirical evidence that…
Descriptors: Man Machine Systems, Automation, Computer Assisted Testing, Scoring
Alyson Burnett; Katlyn Lee Milless; Michelle Bennett; Whitney Kozakowski; Sonia Alves; Christine Ross – Regional Educational Laboratory Mid-Atlantic, 2024
This study analyzed Pennsylvania School Climate Survey data from students and staff in the 2021/22 school year to assess the validity and reliability of the elementary school student version of the survey; approaches to scoring the survey in individual schools at all grade levels; and perceptions of school climate across student, staff, and school…
Descriptors: Educational Environment, Decision Making, Surveys, Validity
Davies, Ben; Alcock, Lara; Jones, Ian – Educational Studies in Mathematics, 2020
Proof is central to mathematics and has drawn substantial attention from the mathematics education community. Yet, valid and reliable measures of proof comprehension remain rare. In this article, we present a study investigating proof comprehension via students' summaries of a given proof. These summaries were evaluated by expert judges making…
Descriptors: Mathematical Logic, Mathematics Skills, Comprehension, Reliability
Paul Deane; Duanli Yan; Katherine Castellano; Yigal Attali; Michelle Lamar; Mo Zhang; Ian Blood; James V. Bruno; Chen Li; Wenju Cui; Chunyi Ruan; Colleen Appel; Kofi James; Rodolfo Long; Farah Qureshi – ETS Research Report Series, 2024
This paper presents a multidimensional model of variation in writing quality, register, and genre in student essays, trained and tested via confirmatory factor analysis of 1.37 million essay submissions to ETS' digital writing service, Criterion®. The model was also validated with several other corpora, which indicated that it provides a…
Descriptors: Writing (Composition), Essays, Models, Elementary School Students
Curran, Patrick J.; Georgeson, A. R.; Bauer, Daniel J.; Hussong, Andrea M. – International Journal of Behavioral Development, 2021
Conducting valid and reliable empirical research in the prevention sciences is an inherently difficult and challenging task. Chief among these is the need to obtain numerical scores of underlying theoretical constructs for use in subsequent analysis. This challenge is further exacerbated by the increasingly common need to consider multiple…
Descriptors: Psychometrics, Scoring, Prevention, Scores
Evaluating an Explicit Instruction Teacher Observation Protocol through a Validity Argument Approach
Johnson, Evelyn S.; Zheng, Yuzhu; Crawford, Angela R.; Moylan, Laura A. – Journal of Experimental Education, 2022
In this study, we examined the scoring and generalizability assumptions of an explicit instruction (EI) special education teacher observation protocol using many-faceted Rasch measurement (MFRM). Video observations of classroom instruction from 48 special education teachers across four states were collected. External raters (n = 20) were trained…
Descriptors: Direct Instruction, Teacher Education, Classroom Observation Techniques, Validity
Romig, John Elwood; Olsen, Amanda A. – Reading & Writing Quarterly, 2021
Compared to other content areas, there is a dearth of research examining curriculum-based measurement of writing (CBM-W). This study conducted a conceptual replication examining the reliability, stability, and sensitivity to growth of slopes produced from CBM-W. Eighty-nine (N = 89) eighth-grade students responded to one CBM-W probe weekly for 11…
Descriptors: Curriculum Based Assessment, Writing Evaluation, Middle School Students, Grade 8
Lane, Suzanne – Journal of Educational Measurement, 2019
Rater-mediated assessments require the evaluation of the accuracy and consistency of the inferences made by the raters to ensure the validity of score interpretations and uses. Modeling rater response processes allows for a better understanding of how raters map their representations of the examinee performance to their representation of the…
Descriptors: Responses, Accuracy, Validity, Interrater Reliability
Regional Educational Laboratory Mid-Atlantic, 2024
These are the appendixes for the report, "Strengthening the Pennsylvania School Climate Survey to Inform School Decisionmaking." This study analyzed Pennsylvania School Climate Survey data from students and staff in the 2021/22 school year to assess the validity and reliability of the elementary school student version of the survey;…
Descriptors: Educational Environment, Surveys, Decision Making, School Personnel
Kocakulah, Aysel – Participatory Educational Research, 2022
The aim of this study is to develop and apply a rubric for evaluating the solutions that second-year university pre-service teachers propose to questions about electromagnetic induction. In this study, which has a pretest-posttest quasi-experimental design with a control group, teaching of the topic of electromagnetic induction was applied to…
Descriptors: Scoring Rubrics, Student Evaluation, Undergraduate Students, Problem Solving
Evaluating an Explicit Instruction Teacher Observation Protocol through a Validity Argument Approach
Johnson, Evelyn S.; Zheng, Yuzhu; Crawford, Angela R.; Moylan, Laura A. – Grantee Submission, 2020
In this study, we examined the scoring and generalizability assumptions of an Explicit Instruction (EI) special education teacher observation protocol using many-faceted Rasch measurement (MFRM). Video observations of classroom instruction from 48 special education teachers across four states were collected. External raters (n = 20) were trained…
Descriptors: Direct Instruction, Teacher Evaluation, Classroom Observation Techniques, Validity
Nelson, Nickola Wolf; Plante, Elena – Language, Speech, and Hearing Services in Schools, 2022
Purpose: This study evaluated the equivalence of the Test of Integrated Language and Literacy Skills (TILLS) when administered via telepractice (Tele-TILLS) and face-to-face methods. Method: Participants were 51 children and adolescents in three age bands, ages 6-7 years (n = 9), 8-11 years (n = 21), and 12-18 years (n = 21). Data were gathered…
Descriptors: Telecommunications, Standardized Tests, Language Skills, Literacy
Olson, Daniel J. – Language Testing, 2023
Measuring language dominance, broadly defined as the relative strength of each of a bilingual's two languages, remains a crucial methodological issue in bilingualism research. While various methods have been proposed, the Bilingual Language Profile (BLP) has been one of the most widely used tools for measuring language dominance. While previous…
Descriptors: Bilingualism, Language Dominance, Native Language, Second Language Learning
Beaty, Roger E.; Johnson, Dan R.; Zeitlen, Daniel C.; Forthmann, Boris – Creativity Research Journal, 2022
Semantic distance is increasingly used for automated scoring of originality on divergent thinking tasks, such as the Alternate Uses Task (AUT). Despite some psychometric support for semantic distance -- including positive correlations with human creativity ratings -- additional work is needed to optimize its reliability and validity, including…
Descriptors: Semantics, Scoring, Creative Thinking, Creativity