Goldhammer, Frank; Kroehne, Ulf; Hahnel, Carolin; Naumann, Johannes; De Boeck, Paul – Journal of Educational Measurement, 2024
The efficiency of cognitive component skills is typically assessed with speeded performance tests. Interpreting effective ability or effective speed alone as efficiency can be misleading because of the within-person dependency between the two variables (the speed-ability tradeoff, SAT). The present study measures efficiency as effective ability…
Descriptors: Timed Tests, Efficiency, Scores, Test Interpretation
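
The SAT issue lends itself to a quick illustration. Below is a minimal Python sketch, not the authors' model: the tradeoff slope, the noise levels, and the device of evaluating ability at a common reference speed are all invented, but it shows why effective ability alone is a confounded measure of efficiency when persons adopt different speeds.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000

efficiency = rng.normal(0.0, 1.0, n)   # latent efficiency (target of inference)
speed = rng.normal(0.0, 1.0, n)        # pace each person happens to adopt

# Within-person speed-ability tradeoff: working faster lowers effective
# ability; higher efficiency shifts the whole tradeoff curve upward.
ability = efficiency - 0.8 * speed + rng.normal(0.0, 0.3, n)

print(np.corrcoef(ability, efficiency)[0, 1])   # confounded by speed choice

# Evaluating ability at a common reference speed removes the tradeoff
# and recovers the efficiency ordering much more cleanly.
slope, intercept = np.polyfit(speed, ability, 1)
at_reference = ability - slope * speed
print(np.corrcoef(at_reference, efficiency)[0, 1])
```
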
Fay, Derek M.; Levy, Roy; Mehta, Vandhana – Journal of Educational Measurement, 2018
A common practice in educational assessment is to construct multiple forms of an assessment that consist of tasks with similar psychometric properties. This study uses a Bayesian multilevel item response model and descriptive graphical representations to evaluate the psychometric similarity of variations of the same task. These approaches for…
Descriptors: Psychometrics, Performance Based Assessment, Bayesian Statistics, Item Response Theory
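
As a rough illustration of the descriptive-graphical side of this approach (the Bayesian multilevel estimation itself is not shown), the sketch below overlays the item characteristic curves implied by hypothetical 2PL parameters for two variants of the same task; every parameter value here is invented.

```python
import numpy as np
import matplotlib.pyplot as plt

def icc(theta, a, b):
    """2PL item characteristic curve: P(correct | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 200)

# Hypothetical (posterior-mean) parameters for two variants of one task.
variants = {"Variant A": (1.2, -0.1), "Variant B": (1.1, 0.4)}

for label, (a, b) in variants.items():
    plt.plot(theta, icc(theta, a, b), label=label)

plt.xlabel("theta")
plt.ylabel("P(correct)")
plt.title("Are two variants of the same task psychometrically similar?")
plt.legend()
plt.show()
```
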
Peabody, Michael R.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard-setting panels should have the proper qualifications to make the judgments asked…
Descriptors: Standard Setting, Decision Making, Performance Based Assessment, Evaluators
Wesolowski, Brian C. – Journal of Educational Measurement, 2019
The purpose of this study was to build a Random Forest supervised machine learning model to predict musical rater-type classifications based on a Rasch analysis of raters' differential severity/leniency related to item use. Raw scores (N = 1,704) from 142 raters across nine high school solo and ensemble festivals (grades 9-12) were…
Descriptors: Item Response Theory, Prediction, Classification, Artificial Intelligence
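
A minimal sketch of this kind of pipeline with scikit-learn, using synthetic data in place of the study's Rasch severity estimates and rater-type labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-ins: one Rasch severity/leniency estimate per item for
# each of 142 raters, plus invented binary rater-type labels.
n_raters, n_items = 142, 10
X = rng.normal(0.0, 1.0, (n_raters, n_items))
y = (X[:, :3].mean(axis=1) > 0).astype(int)   # synthetic "rater type"

clf = RandomForestClassifier(n_estimators=500, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())   # out-of-sample accuracy

clf.fit(X, y)
print(clf.feature_importances_)   # which items drive the classification
```
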
Halpin, Peter F.; von Davier, Alina A.; Hao, Jiangang; Liu, Lei – Journal of Educational Measurement, 2017
This article addresses performance assessments that involve collaboration among students. We apply the Hawkes process to infer whether the actions of one student are associated with increased probability of further actions by his/her partner(s) in the near future. This leads to an intuitive notion of engagement among collaborators, and we consider…
Descriptors: Performance Based Assessment, Student Evaluation, Cooperative Learning, Inferences
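
The core quantity in a Hawkes model is the conditional intensity: a student's baseline action rate plus a kernel-weighted sum over the partner's past events. A minimal sketch with an exponential kernel; mu, alpha, beta, and the event times are invented values, not estimates from the study.

```python
import numpy as np

def intensity(t, partner_events, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of one student's actions at time t, excited
    by the partner's past events via an exponential kernel."""
    past = partner_events[partner_events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

# Hypothetical timestamps (minutes) of the partner's actions.
partner = np.array([1.0, 1.2, 5.0, 5.1, 5.3])

for t in (1.5, 3.0, 5.5):
    print(t, round(intensity(t, partner), 3))
```

The intensity is elevated right after a burst of partner activity and decays back toward the baseline mu, which is the intuition behind the engagement notion described above.
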
Raymond, Mark R.; Swygert, Kimberly A.; Kahraman, Nilufer – Journal of Educational Measurement, 2012
Although a few studies report sizable score gains for examinees who repeat performance-based assessments, research has not yet addressed the reliability and validity of inferences based on ratings of repeat examinees on such tests. This study analyzed scores for 8,457 single-take examinees and 4,030 repeat examinees who completed a 6-hour clinical…
Descriptors: Physicians, Licensing Examinations (Professions), Performance Based Assessment, Repetition
Harik, Polina; Clauser, Brian E.; Grabovsky, Irina; Nungester, Ronald J.; Swanson, David B.; Nandakumar, Ratna – Journal of Educational Measurement, 2009
The present study examined the long-term usefulness of estimated parameters used to adjust the scores from a performance assessment to account for differences in rater stringency. Ratings from four components of the USMLE® Step 2 Clinical Skills Examination were analyzed. A generalizability-theory framework was used to examine the extent to…
Descriptors: Generalizability Theory, Performance Based Assessment, Performance Tests, Clinical Experience
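
A toy version of the adjustment idea, assuming random assignment of examinees to raters so that each rater's mean deviation from the grand mean estimates stringency. The data, effect sizes, and estimator here are all invented; the study's generalizability-theory machinery is not shown.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n_examinees, n_raters = 300, 12

true = rng.normal(70, 8, n_examinees)            # examinee proficiency
stringency = rng.normal(0, 3, n_raters)          # positive = severe (lowers scores)
rater = rng.integers(0, n_raters, n_examinees)   # random assignment
score = true - stringency[rater] + rng.normal(0, 4, n_examinees)

df = pd.DataFrame({"rater": rater, "score": score})

# Each rater's mean deviation from the grand mean estimates the rater effect
# (valid here only because assignment to raters is random).
effect = df.groupby("rater")["score"].mean() - score.mean()
adjusted = score - effect.loc[rater].to_numpy()

print(np.corrcoef(score, true)[0, 1])     # raw scores vs. proficiency
print(np.corrcoef(adjusted, true)[0, 1])  # adjusted scores track it better
```
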
Kim, Seock-Ho; Cohen, Allan S.; Alagoz, Cigdem; Kim, Sukwoo – Journal of Educational Measurement, 2007
Data from a large-scale performance assessment (N = 105,731) were analyzed with five differential item functioning (DIF) detection methods for polytomous items to examine the congruence among the DIF detection methods. Two different versions of the item response theory (IRT) model-based likelihood ratio test, the logistic regression likelihood…
Descriptors: Performance Based Assessment, Performance Tests, Item Response Theory, Test Bias
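
One method in this family, the logistic regression likelihood ratio test, is easy to sketch. The toy below dichotomizes the response for brevity (the study's items are polytomous) and plants an invented uniform DIF effect so the test has something to find.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 4000
group = rng.integers(0, 2, n)        # 0 = reference, 1 = focal
total = rng.normal(0, 1, n)          # matching variable (e.g., rest score)

# Simulated item response with a small uniform DIF effect for the focal group.
p = 1.0 / (1.0 + np.exp(-(1.2 * total - 0.2 + 0.3 * group)))
y = (rng.random(n) < p).astype(float)

X0 = sm.add_constant(total)                            # matching only
X1 = sm.add_constant(np.column_stack([total, group]))  # + group term

m0 = sm.Logit(y, X0).fit(disp=0)
m1 = sm.Logit(y, X1).fit(disp=0)

lr = 2.0 * (m1.llf - m0.llf)        # likelihood ratio statistic, df = 1
print(lr, chi2.sf(lr, df=1))        # small p-value -> flag the item for DIF
```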

Shavelson, Richard J.; Ruiz-Primo, Maria Araceli; Wiley, Edward W. – Journal of Educational Measurement, 1999
Reports a reanalysis of data collected in a person x task x occasion rater or method G-study design (M. Ruiz-Primo and others, 1993), and brings this reanalysis to bear on the interpretation of task-sampling variability and the convergence of different performance-assessment methods. (SLD)
Descriptors: Performance Based Assessment, Sampling, Sciences
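
The task-sampling question comes down to variance components. A minimal sketch of a crossed person x task G-study on synthetic data, estimating components from mean squares; the design is simplified to a single occasion with no rater facet, unlike the study's.

```python
import numpy as np

rng = np.random.default_rng(11)
n_p, n_t = 60, 6   # persons x tasks, fully crossed, single occasion

# Synthetic scores: person effect + task effect + residual (pt + error).
X = (3.0 + rng.normal(0, 1.0, (n_p, 1))      # person
         + rng.normal(0, 0.5, (1, n_t))      # task
         + rng.normal(0, 0.8, (n_p, n_t)))   # residual

grand = X.mean()
ms_p = n_t * ((X.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
ms_t = n_p * ((X.mean(axis=0) - grand) ** 2).sum() / (n_t - 1)
resid = X - X.mean(axis=1, keepdims=True) - X.mean(axis=0, keepdims=True) + grand
ms_res = (resid ** 2).sum() / ((n_p - 1) * (n_t - 1))

var_p = (ms_p - ms_res) / n_t    # universe-score (person) variance
var_t = (ms_t - ms_res) / n_p    # task-sampling variance
var_res = ms_res                 # interaction + error

# Generalizability coefficient for an average over k tasks:
k = 5
print(var_p / (var_p + var_res / k))
```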

Engelhard, George, Jr. – Journal of Educational Measurement, 1996
A new method for evaluating rater accuracy within the context of performance assessments is described. The method, based on an extended Rasch measurement model implemented in the FACETS program, is illustrated with 373 benchmark papers from the Georgia High School Graduation Writing Test rated by 20 operational raters and an expert panel. (SLD)
Descriptors: Essay Tests, Evaluation Methods, Evaluators, Performance Based Assessment
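
FACETS implements a many-facet Rasch model in which a rater-severity term enters the logit alongside person ability. A minimal numpy sketch with invented thresholds, showing how a severe rater shifts the expected rating for the same paper:

```python
import numpy as np

def mfrm_probs(theta, severity, thresholds):
    """Rating category probabilities under a many-facet Rasch
    (rating scale) model with person and rater-severity facets."""
    steps = theta - severity - np.asarray(thresholds)
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    p = np.exp(logits - logits.max())
    return p / p.sum()

thresholds = [-1.5, -0.5, 0.5, 1.5]   # hypothetical category steps (scores 0-4)

# The same paper (theta = 0.3) rated by a lenient, neutral, and severe rater:
for severity in (-0.5, 0.0, 0.8):
    p = mfrm_probs(0.3, severity, thresholds)
    print(severity, np.round(p, 3), "expected rating:", round(float(np.arange(5) @ p), 2))
```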

Kane, Michael T. – Journal of Educational Measurement, 2001
Provides a brief historical review of construct validity and discusses the current state of validity theory, emphasizing the role of arguments in validation. Examines the application of an argument-based approach with regard to the distinction between performance-based and theory-based interpretations and the role of consequences in validation.…
Descriptors: Construct Validity, Educational Testing, Performance Based Assessment, Theories
Skaggs, Gary – Journal of Educational Measurement, 2005
This study investigated the effectiveness of equating with very small samples using the random groups design. Of particular interest was equating accuracy at specific scores where performance standards might be set. Two sets of simulations were carried out, one in which the two forms were identical and one in which they differed by a tenth of a…
Descriptors: Equated Scores, Simulation, Performance Based Assessment, Evaluation Methods
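
Under the random groups design, equipercentile equating maps a Form X score to the Form Y score with the same percentile rank. A bare-bones sketch with tiny samples and no presmoothing; the score distributions and sample sizes are invented, not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(5)

def equipercentile(x_scores, y_scores, x):
    """Map a Form X score to the Form Y scale by matching percentile ranks
    (random groups design; linear interpolation, no presmoothing)."""
    pr = (x_scores < x).mean() + 0.5 * (x_scores == x).mean()
    return float(np.quantile(y_scores, pr))

# Two small random groups, each taking one form of a 40-item test.
form_x = rng.binomial(40, 0.60, size=25)
form_y = rng.binomial(40, 0.58, size=25)

for cut in (20, 24, 28):   # candidate cut scores on Form X
    print(cut, "->", round(equipercentile(form_x, form_y, cut), 2))
```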

Fitzpatrick, Anne R.; And Others – Journal of Educational Measurement, 1996
One-parameter (1PPC) and two-parameter partial credit (2PPC) models were compared using real and simulated data containing constructed-response items. Results suggest that the combination of the three-parameter logistic model with the more flexible 2PPC model produces better model fit than the combination of the one-parameter logistic and 1PPC models. (SLD)
Descriptors: Comparative Analysis, Constructed Response, Goodness of Fit, Performance Based Assessment
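
The two models differ in whether the slope is fixed. A minimal sketch of partial credit category probabilities, where a = 1 corresponds to the 1PPC model and a free slope to the 2PPC model; the step difficulties below are invented.

```python
import numpy as np

def pc_probs(theta, a, deltas):
    """Category probabilities for a partial credit item: a = 1 gives the
    one-parameter (1PPC) model; a free slope gives the 2PPC model."""
    steps = a * (theta - np.asarray(deltas))
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    p = np.exp(logits - logits.max())
    return p / p.sum()

deltas = [-1.0, 0.0, 1.2]   # hypothetical step difficulties (scores 0-3)

for theta in (-1.0, 0.0, 1.0):
    print(theta,
          np.round(pc_probs(theta, 1.0, deltas), 2),   # 1PPC
          np.round(pc_probs(theta, 1.7, deltas), 2))   # 2PPC, steeper slope
```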

Clauser, Brian E.; Clyman, Stephen G.; Swanson, David B. – Journal of Educational Measurement, 1999
Two studies focused on aspects of the rating process in performance assessment. The first, which involved 15 raters and about 400 medical students, made the "committee" facet of raters working in groups explicit, and the second, which involved about 200 medical students and four raters, made the "rating-occasion" facet…
Descriptors: Error Patterns, Evaluation Methods, Evaluators, Higher Education

Brennan, Robert L. – Journal of Educational Measurement, 1995
Generalizability theory is used to show that the assumption that reliability for groups is greater than that for persons (and that error variance for groups is less than that for persons) is not necessarily true. Examples are provided from course evaluation and performance test literature. (SLD)
Descriptors: Course Evaluation, Decision Making, Equations (Mathematics), Generalizability Theory
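
Brennan's point can be reproduced with a few invented variance components for persons nested in groups, crossed with items: when between-group variance is small, the generalizability coefficient for group means falls below the one for persons. All numbers below are hypothetical.

```python
import numpy as np

# Hypothetical variance components for persons nested in groups, crossed
# with items (p:g x i design): group, person-within-group, residual.
var_g, var_pg, var_res = 0.02, 1.00, 0.50
n_p, n_i = 25, 10   # persons per group, items

# Generalizability coefficient for individual persons' item averages:
person_universe = var_g + var_pg
g_person = person_universe / (person_universe + var_res / n_i)

# For group means, person-to-person variation becomes error:
group_error = var_pg / n_p + var_res / (n_p * n_i)
g_group = var_g / (var_g + group_error)

print(round(g_person, 3), round(g_group, 3))   # group reliability is far lower
```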