Wind, Stefanie A.; Guo, Wenjing – Educational and Psychological Measurement, 2019
Rater effects, or raters' tendencies to assign ratings to performances that are different from the ratings that the performances warranted, are well documented in rater-mediated assessments across a variety of disciplines. In many real-data studies of rater effects, researchers have reported that raters exhibit more than one effect, such as a…
Descriptors: Evaluators, Bias, Scoring, Data Collection
Jin, Kuan-Yu; Eckes, Thomas – Educational and Psychological Measurement, 2022
Performance assessments heavily rely on human ratings. These ratings are typically subject to various forms of error and bias, threatening the assessment outcomes' validity and fairness. Differential rater functioning (DRF) is a special kind of threat to fairness manifesting itself in unwanted interactions between raters and performance- or…
Descriptors: Performance Based Assessment, Rating Scales, Test Bias, Student Evaluation
Chen, Michelle Y.; Liu, Yan; Zumbo, Bruno D. – Educational and Psychological Measurement, 2020
This study introduces a novel differential item functioning (DIF) method based on propensity score matching that tackles two challenges in analyzing performance assessment data, that is, continuous task scores and lack of a reliable internal variable as a proxy for ability or aptitude. The proposed DIF method consists of two main stages. First,…
Descriptors: Probability, Scores, Evaluation Methods, Test Items
Wang, Jue; Engelhard, George, Jr.; Wolfe, Edward W. – Educational and Psychological Measurement, 2016
The number of performance assessments continues to increase around the world, and it is important to explore new methods for evaluating the quality of ratings obtained from raters. This study describes an unfolding model for examining rater accuracy. Accuracy is defined as the difference between observed and expert ratings. Dichotomous accuracy…
Descriptors: Evaluators, Accuracy, Performance Based Assessment, Models
Martínez, José Felipe; Kloser, Matt; Srinivasan, Jayashri; Stecher, Brian; Edelman, Amanda – Educational and Psychological Measurement, 2022
Adoption of new instructional standards in science demands high-quality information about classroom practice. Teacher portfolios can be used to assess instructional practice and support teacher self-reflection anchored in authentic evidence from classrooms. This study investigated a new type of electronic portfolio tool that allows efficient…
Descriptors: Science Instruction, Academic Standards, Instructional Innovation, Electronic Publishing
Zhu, Xiaowen; Stone, Clement A. – Educational and Psychological Measurement, 2012
This study examined the relative effectiveness of Bayesian model comparison methods in selecting an appropriate graded response (GR) model for performance assessment applications. Three popular methods were considered: deviance information criterion (DIC), conditional predictive ordinate (CPO), and posterior predictive model checking (PPMC). Using…
Descriptors: Bayesian Statistics, Item Response Theory, Comparative Analysis, Models
Engelhard, George, Jr. – Educational and Psychological Measurement, 2011
The purpose of this study is to describe a new approach for evaluating the judgments of standard-setting panelists within the context of the bookmark procedure. The bookmark procedure is widely used for setting performance standards on high-stakes assessments. A many-faceted Rasch (MFR) model is proposed for evaluating the bookmark judgments of…
Descriptors: Educational Assessment, Performance Based Assessment, Grade 3, Evaluation Methods
Huang, Chiungjung – Educational and Psychological Measurement, 2009
This study examined the percentage of task-sampling variability in performance assessment via a meta-analysis. In total, 50 studies containing 130 independent data sets were analyzed. Overall results indicate that the percentage of variance for (a) differential difficulty of task was roughly 12% and (b) examinee's differential performance of the…
Descriptors: Test Bias, Research Design, Performance Based Assessment, Performance Tests

Lunz, Mary E.; And Others – Educational and Psychological Measurement, 1994
In a study involving eight judges, analysis with the FACETS model provides evidence that judges grade differently, whether or not scores correlate well. This outcome suggests that adjustments for differences among judges should be made before student measures are estimated to produce reproducible decisions. (SLD)
Descriptors: Correlation, Decision Making, Evaluation Methods, Evaluators

Burry-Stock, Judith A.; And Others – Educational and Psychological Measurement, 1996
It is argued that interrater agreement is a psychometric property which is theoretically different from classic reliability. Formulas are presented to illustrate a set of algebraically equivalent rater agreement indices that are intended to provide educational and psychological researchers with a practical way to establish a measure of rater…
Descriptors: Algebra, Educational Research, Interrater Reliability, Measures (Individuals)

Plake, Barbara S.; And Others – Educational and Psychological Measurement, 1997
The dominant profile judgment method, designed for use with profiles of polytomous scores on exercises in a performance-based assessment, is presented as a standard-setting method. The approach guides standard-setting panelists in articulating their standard-setting policies and allows for complex policy statements. (SLD)
Descriptors: Educational Policy, Field Tests, Performance Based Assessment, Policy Formation

Cronbach, Lee J.; And Others – Educational and Psychological Measurement, 1997
Through the standard error, rather than a reliability coefficient, generalizability theory provides an indicator of the uncertainty attached to school and individual scores on performance assessments. Recommendations are made to apply generalizability theory to current performance assessments, emphasizing practices that differ from usual…
Descriptors: Academic Achievement, Error of Measurement, Generalizability Theory, Performance Based Assessment

Fan, Xitao; Yin, Ping – Educational and Psychological Measurement, 2002
Examined the effects of 2 examinee sample characteristics (group heterogeneity and performance level) on score reliability of optimal performance measurement using extant data sets of more than 50,000 high school students and 10,000 sixth graders. Results suggest that both characteristics affect score reliability, and measurement error tends to be…
Descriptors: Elementary School Students, High Achievement, High School Students, High Schools

Mone, Mark A.; And Others – Educational and Psychological Measurement, 1995
Relationships among self-efficacy, self-esteem, personal goals, and performance over multiple performance trials were examined for 215 college students. Self-efficacy was significantly predictive of personal goals and performance, but self-esteem was not. Results indicate that the more task-specific the measure, the better the prediction. (SLD)
Descriptors: Academic Achievement, College Students, Higher Education, Measurement Techniques

Chiu, Chi-Kwan; Alliger, George M. – Educational and Psychological Measurement, 1990
A method is proposed to combine a relative approach and an absolute approach to performance appraisal by combining graphic rating and ranking. Two studies involving 196 college undergraduates who rated their instructors illustrate the promise offered by the proposed Qualitative Ranking Scale. (SLD)
Descriptors: Evaluation Methods, Graphs, Higher Education, Performance