Publication Date
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 1 |
| Since 2017 (last 10 years) | 1 |
| Since 2007 (last 20 years) | 10 |
Descriptor
| Evaluation Methods | 11 |
| Robustness (Statistics) | 11 |
| Test Reliability | 11 |
| Test Validity | 7 |
| Evaluation Research | 5 |
| Item Analysis | 4 |
| Evaluation Problems | 3 |
| Achievement Rating | 2 |
| Educational Change | 2 |
| Educational Indicators | 2 |
| Educational Research | 2 |
Author
| Bang Quan Zheng | 1 |
| Booker, Kevin | 1 |
| Camilli, Gregory | 1 |
| Campbell, Shanyce L. | 1 |
| Dillon, Amanda | 1 |
| English, Taylor | 1 |
| Erceg-Hurn, David M. | 1 |
| Gill, Brian | 1 |
| Henslee, Amber M. | 1 |
| Ho, Andrew D. | 1 |
| Hollin, Clive R. | 1 |
Publication Type
| Journal Articles | 11 |
| Reports - Evaluative | 6 |
| Reports - Research | 4 |
| Reports - Descriptive | 1 |
Education Level
| Higher Education | 5 |
| Elementary Secondary Education | 3 |
| Postsecondary Education | 2 |
Location
| Finland (Helsinki) | 1 |
| Tennessee | 1 |
| Texas | 1 |
Laws, Policies, & Programs
| No Child Left Behind Act 2001 | 1 |
Bang Quan Zheng; Peter M. Bentler – Structural Equation Modeling: A Multidisciplinary Journal, 2025
This paper aims to advocate for a balanced approach to model fit evaluation in structural equation modeling (SEM). The ongoing debate surrounding chi-square test statistics and fit indices has been characterized by ambiguity and controversy. Despite the acknowledged limitations of relying solely on the chi-square test, its careful application can…
Descriptors: Monte Carlo Methods, Structural Equation Models, Goodness of Fit, Robustness (Statistics)
Ronfeldt, Matthew; Campbell, Shanyce L. – Educational Evaluation and Policy Analysis, 2016
Despite growing calls for more accountability of teacher education programs (TEPs), there is little consensus about how to evaluate them. This study investigates the potential for using observational ratings of program completers to evaluate TEPs. Drawing on statewide data on almost 9,500 program completers, representing 44 providers (183…
Descriptors: Teacher Education Programs, Program Effectiveness, Program Evaluation, Observation
Ho, Andrew D. – Teachers College Record, 2014
Background/Context: The target of assessment validation is not an assessment but the use of an assessment for a purpose. Although the validation literature often provides examples of assessment purposes, comprehensive reviews of these purposes are rare. Additionally, assessment purposes posed for validation are generally described as discrete and…
Descriptors: Elementary Secondary Education, Standardized Tests, Measurement Objectives, Educational Change
Rantanen, Pekka – Assessment & Evaluation in Higher Education, 2013
A multilevel analysis approach was used to analyse students' evaluation of teaching (SET). The low value of inter-rater reliability stresses that any solid conclusions on teaching cannot be made on the basis of single feedbacks. To assess a teacher's general teaching effectiveness, one needs to evaluate four randomly chosen course implementations.…
Descriptors: Test Reliability, Feedback (Response), Generalizability Theory, Student Evaluation of Teacher Performance
Lincove, Jane Arnold; Osborne, Cynthia; Dillon, Amanda; Mills, Nicholas – Journal of Teacher Education, 2014
Despite questions about validity and reliability, the use of value-added estimation methods has moved beyond academic research into state accountability systems for teachers, schools, and teacher preparation programs (TPPs). Prior studies of value-added measurement for TPPs test the validity of researcher-designed models and find that measuring…
Descriptors: Teacher Education Programs, Accountability, Politics of Education, School Statistics
Zimmer, Ron; Gill, Brian; Booker, Kevin; Lavertu, Stephane; Witte, John – Economics of Education Review, 2012
Since their inception, charter schools have been a lightning rod for controversy, with much of the debate revolving around their effectiveness in improving student achievement. Previous research has shown mixed results for student achievement; this could be the consequence of different policy environments or varying methodological approaches with…
Descriptors: Charter Schools, Academic Achievement, School Effectiveness, Educational Improvement
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format
Keeley, Jared W.; English, Taylor; Irons, Jessica; Henslee, Amber M. – Educational and Psychological Measurement, 2013
Many measurement biases affect student evaluations of instruction (SEIs). However, two have been relatively understudied: halo effects and ceiling/floor effects. This study examined these effects in two ways. To examine the halo effect, using a videotaped lecture, we manipulated specific teacher behaviors to be "good" or "bad"…
Descriptors: Robustness (Statistics), Test Bias, Course Evaluation, Student Evaluation of Teacher Performance
Wright, Robert E. – College Student Journal, 2010
The use of standardized tests for outcome assessment has grown dramatically in recent years. Two driving factors have been the No Child Left Behind legislation, and the increase in outcome assessment measures by accrediting agencies such as AACSB, the international accrediting body for business schools. Despite the growth in usage, little effort…
Descriptors: College Outcomes Assessment, Educational Testing, Standardized Tests, Accreditation (Institutions)
Erceg-Hurn, David M.; Mirosevich, Vikki M. – American Psychologist, 2008
Classic parametric statistical significance tests, such as analysis of variance and least squares regression, are widely used by researchers in many disciplines, including psychology. For classic parametric tests to produce accurate results, the assumptions underlying them (e.g., normality and homoscedasticity) must be satisfied. These assumptions…
Descriptors: Statistical Significance, Least Squares Statistics, Effect Size, Statistical Studies
Palmer, Emma J.; Hollin, Clive R. – Journal of Adolescence, 1996 (Peer reviewed)
Offers practitioners and researchers an overview of two inventories used in the study of antisocial and delinquent behavior. Evidence suggests that the inventories are related to behavioral indices associated with antisocial and delinquent behavior. Concludes that both instruments are robust and can yield valuable results. (RJM)
Descriptors: Adolescents, Delinquency, Evaluation Methods, Measures (Individuals)

