Kupffer, Rebekka; Frick, Susanne; Wetzel, Eunike – Educational and Psychological Measurement, 2024
The multidimensional forced-choice (MFC) format is an alternative to rating scales in which participants rank items according to how well the items describe them. Currently, little is known about how to detect careless responding in MFC data. The aim of this study was to adapt a number of indices used for rating scales to the MFC format and…
Descriptors: Measurement Techniques, Alternative Assessment, Rating Scales, Questionnaires
Little, Todd D.; Bontempo, Daniel; Rioux, Charlie; Tracy, Allison – International Journal of Research & Method in Education, 2022
Multilevel modelling (MLM) is the most frequently used approach for evaluating interventions with clustered data. MLM, however, has some limitations that are associated with numerous obstacles to model estimation and valid inferences. Longitudinal multiple-group (LMG) modelling is a longstanding approach for testing intervention effects using…
Descriptors: Longitudinal Studies, Hierarchical Linear Modeling, Alternative Assessment, Intervention
Martinková, Patrícia; Bartoš, František; Brabec, Marek – Journal of Educational and Behavioral Statistics, 2023
Inter-rater reliability (IRR), which is a prerequisite of high-quality ratings and assessments, may be affected by contextual variables, such as the rater's or ratee's gender, major, or experience. Identification of such heterogeneity sources in IRR is important for the implementation of policies with the potential to decrease measurement error…
Descriptors: Interrater Reliability, Bayesian Statistics, Statistical Inference, Hierarchical Linear Modeling
Meyer, Robert; Hu, Sara; Christian, Michael – Society for Research on Educational Effectiveness, 2023
Background: This paper develops a new method to estimate quasi-experimental evaluation models when it is necessary to control for measurement error in predictors and individual assignment to the treatment group is based on these same fallible variables. A major methodological finding of the study is that standard methods of estimating models that…
Descriptors: Error of Measurement, Measurement Techniques, Elementary Secondary Education, Report Cards
Bardhoshi, Gerta; Erford, Bradley T. – Measurement and Evaluation in Counseling and Development, 2017
Precision is a key facet of test development, with score reliability determined primarily according to the types of error one wants to approximate and demonstrate. This article identifies and discusses several primary forms of reliability estimation: internal consistency (i.e., split-half, KR-20, α), test-retest, alternate forms, interscorer, and…
Descriptors: Scores, Test Reliability, Accuracy, Pretests Posttests
Gottlieb, Derek; Moroye, Christy M. – Journal of Curriculum and Pedagogy, 2016
We examine the reliance on rubrics for educational evaluation and explore whether such tools fulfill their promise. Following Wittgensteinian critical strategies, we explore what "the application of the [rubric] picture looks like" and then evaluate (a) whether those benefits are attributable to rubric use at all, and (b) whether any of…
Descriptors: Scoring Rubrics, Educational Assessment, Student Evaluation, Educational Benefits
Castellano, Katherine E.; McCaffrey, Daniel F. – Educational Measurement: Issues and Practice, 2017
Mean or median student growth percentiles (MGPs) are a popular measure of educator performance, but they lack rigorous evaluation. This study investigates the error in MGP due to test score measurement error (ME). Using analytic derivations, we find that errors in the commonly used MGP are correlated with average prior latent achievement: Teachers…
Descriptors: Teacher Evaluation, Teacher Effectiveness, Value Added Models, Achievement Gains
McNeish, Daniel – Review of Educational Research, 2017
In education research, small samples are common because of financial limitations, logistical challenges, or exploratory studies. With small samples, statistical principles on which researchers rely do not hold, leading to trust issues with model estimates and possible replication issues when scaling up. Researchers are generally aware of such…
Descriptors: Models, Statistical Analysis, Sampling, Sample Size
Furtak, Erin Marie; Ruiz-Primo, Maria Araceli; Bakeman, Roger – Educational Measurement: Issues and Practice, 2017
Formative assessment is a classroom practice that has received much attention in recent years for its established potential at increasing student learning. A frequent analytic approach for determining the quality of formative assessment practices is to develop a coding scheme and determine frequencies with which the codes are observed; however,…
Descriptors: Sequential Approach, Formative Evaluation, Alternative Assessment, Incidence
Davin, Kristin J. – Modern Language Journal, 2016
This article explores the implementation of dynamic assessment (DA) in an elementary school foreign language classroom by considering its theoretical basis and its applicability to second language (L2) teaching, learning, and development. In existing applications of L2 classroom DA, errors serve as a window into learners' instructional needs and…
Descriptors: Alternative Assessment, Elementary School Students, Second Language Learning, Second Language Instruction
Scott-Clayton, Judith; Crosta, Peter M.; Belfield, Clive R. – Educational Evaluation and Policy Analysis, 2014
Remediation is one of the largest single interventions intended to improve outcomes for underprepared college students, yet little is known about the remedial screening process. Using administrative data and a rich predictive model, we find that severe mis-assignments are common using current test-score-cutoff-based policies, with…
Descriptors: Remedial Instruction, Remedial Programs, College Students, Screening Tests
Taylor, Melinda Ann; Pastor, Dena A. – Applied Measurement in Education, 2013
Although federal regulations require testing students with severe cognitive disabilities, there is little guidance regarding how technical quality should be established. It is known that challenges exist with documentation of the reliability of scores for alternate assessments. Typical measures of reliability do little in modeling multiple sources…
Descriptors: Generalizability Theory, Alternative Assessment, Test Reliability, Scores
Elizalde-Utnick, Graciela – Communique, 2008
There is great controversy in the field of learning disabilities (LD) regarding the establishment of criteria for LD identification. The traditional approach to LD identification is to use the IQ-discrepancy. Lyon and colleagues (2001) point out the numerous problems with such an approach, including faulty assumptions about the adequacy of an IQ…
Descriptors: Intervention, Learning Disabilities, Second Language Learning, Intelligence Quotient
Ferrao, Maria – Assessment & Evaluation in Higher Education, 2010
The Bologna Declaration brought reforms into higher education that imply changes in teaching methods, didactic materials and textbooks, infrastructures and laboratories, etc. Statistics and mathematics are disciplines that traditionally have the worst success rates, particularly in non-mathematics core curricula courses. This research project,…
Descriptors: Foreign Countries, Computer Assisted Testing, Educational Technology, Educational Assessment

Hanushek, Eric A.; Taylor, Lori L. – Journal of Human Resources, 1990
Commonly employed measures of school quality can lead to very misleading results. Especially at the state level, nonrepresentative data such as aggregate Scholastic Aptitude Test scores provide very biased measures of school performance. Far superior are direct estimates of achievement growth. (SK)
Descriptors: Academic Achievement, Alternative Assessment, Educational Assessment, Educational Quality