Goldhammer, Frank; Kroehne, Ulf; Hahnel, Carolin; Naumann, Johannes; De Boeck, Paul – Journal of Educational Measurement, 2024
The efficiency of cognitive component skills is typically assessed with speeded performance tests. Interpreting effective ability or effective speed alone as efficiency can be misleading because of the within-person dependency between the two variables (the speed-ability tradeoff, SAT). The present study measures efficiency as effective ability…
Descriptors: Timed Tests, Efficiency, Scores, Test Interpretation
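
The SAT issue lends itself to a quick illustration. Below is a minimal Python sketch, not the authors' model: the tradeoff slope, the noise levels, and the device of evaluating ability at a common reference speed are all invented, but it shows why effective ability alone is a confounded measure of efficiency when persons adopt different speeds.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000

efficiency = rng.normal(0.0, 1.0, n)   # latent efficiency (target of inference)
speed = rng.normal(0.0, 1.0, n)        # pace each person happens to adopt

# Within-person speed-ability tradeoff: working faster lowers effective
# ability; higher efficiency shifts the whole tradeoff curve upward.
ability = efficiency - 0.8 * speed + rng.normal(0.0, 0.3, n)

print(np.corrcoef(ability, efficiency)[0, 1])   # confounded by speed choice

# Evaluating ability at a common reference speed removes the tradeoff
# and recovers the efficiency ordering much more cleanly.
slope, intercept = np.polyfit(speed, ability, 1)
at_reference = ability - slope * speed
print(np.corrcoef(at_reference, efficiency)[0, 1])
```
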
Fay, Derek M.; Levy, Roy; Mehta, Vandhana – Journal of Educational Measurement, 2018
A common practice in educational assessment is to construct multiple forms of an assessment that consist of tasks with similar psychometric properties. This study uses a Bayesian multilevel item response model and descriptive graphical representations to evaluate the psychometric similarity of variations of the same task. These approaches for…
Descriptors: Psychometrics, Performance Based Assessment, Bayesian Statistics, Item Response Theory
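
As a rough illustration of the descriptive-graphical side of this approach (the Bayesian multilevel estimation itself is not shown), the sketch below overlays the item characteristic curves implied by hypothetical 2PL parameters for two variants of the same task; every parameter value here is invented.

```python
import numpy as np
import matplotlib.pyplot as plt

def icc(theta, a, b):
    """2PL item characteristic curve: P(correct | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 200)

# Hypothetical (posterior-mean) parameters for two variants of one task.
variants = {"Variant A": (1.2, -0.1), "Variant B": (1.1, 0.4)}

for label, (a, b) in variants.items():
    plt.plot(theta, icc(theta, a, b), label=label)

plt.xlabel("theta")
plt.ylabel("P(correct)")
plt.title("Are two variants of the same task psychometrically similar?")
plt.legend()
plt.show()
```
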
Peabody, Michael R.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard-setting panels should have the proper qualifications to make the judgments asked…
Descriptors: Standard Setting, Decision Making, Performance Based Assessment, Evaluators
Wesolowski, Brian C. – Journal of Educational Measurement, 2019
The purpose of this study was to build a Random Forest supervised machine learning model to predict musical rater-type classifications based on a Rasch analysis of raters' differential severity/leniency related to item use. Raw scores (N = 1,704) from 142 raters across nine high school solo and ensemble festivals (grades 9-12) were…
Descriptors: Item Response Theory, Prediction, Classification, Artificial Intelligence
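
A minimal sketch of this kind of pipeline with scikit-learn, using synthetic data in place of the study's Rasch severity estimates and rater-type labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-ins: one Rasch severity/leniency estimate per item for
# each of 142 raters, plus invented binary rater-type labels.
n_raters, n_items = 142, 10
X = rng.normal(0.0, 1.0, (n_raters, n_items))
y = (X[:, :3].mean(axis=1) > 0).astype(int)   # synthetic "rater type"

clf = RandomForestClassifier(n_estimators=500, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())   # out-of-sample accuracy

clf.fit(X, y)
print(clf.feature_importances_)   # which items drive the classification
```
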
Halpin, Peter F.; von Davier, Alina A.; Hao, Jiangang; Liu, Lei – Journal of Educational Measurement, 2017
This article addresses performance assessments that involve collaboration among students. We apply the Hawkes process to infer whether the actions of one student are associated with increased probability of further actions by his/her partner(s) in the near future. This leads to an intuitive notion of engagement among collaborators, and we consider…
Descriptors: Performance Based Assessment, Student Evaluation, Cooperative Learning, Inferences
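
The core quantity in a Hawkes model is the conditional intensity: a student's baseline action rate plus a kernel-weighted sum over the partner's past events. A minimal sketch with an exponential kernel; mu, alpha, beta, and the event times are invented values, not estimates from the study.

```python
import numpy as np

def intensity(t, partner_events, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of one student's actions at time t, excited
    by the partner's past events via an exponential kernel."""
    past = partner_events[partner_events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

# Hypothetical timestamps (minutes) of the partner's actions.
partner = np.array([1.0, 1.2, 5.0, 5.1, 5.3])

for t in (1.5, 3.0, 5.5):
    print(t, round(intensity(t, partner), 3))
```

The intensity is elevated right after a burst of partner activity and decays back toward the baseline mu, which is the intuition behind the engagement notion described above.
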
Raymond, Mark R.; Swygert, Kimberly A.; Kahraman, Nilufer – Journal of Educational Measurement, 2012
Although a few studies report sizable score gains for examinees who repeat performance-based assessments, research has not yet addressed the reliability and validity of inferences based on ratings of repeat examinees on such tests. This study analyzed scores for 8,457 single-take examinees and 4,030 repeat examinees who completed a 6-hour clinical…
Descriptors: Physicians, Licensing Examinations (Professions), Performance Based Assessment, Repetition
Harik, Polina; Clauser, Brian E.; Grabovsky, Irina; Nungester, Ronald J.; Swanson, David B.; Nandakumar, Ratna – Journal of Educational Measurement, 2009
The present study examined the long-term usefulness of estimated parameters used to adjust the scores from a performance assessment to account for differences in rater stringency. Ratings from four components of the USMLE® Step 2 Clinical Skills Examination were analyzed. A generalizability-theory framework was used to examine the extent to…
Descriptors: Generalizability Theory, Performance Based Assessment, Performance Tests, Clinical Experience
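
A toy version of the adjustment idea, assuming random assignment of examinees to raters so that each rater's mean deviation from the grand mean estimates stringency. The data, effect sizes, and estimator here are all invented; the study's generalizability-theory machinery is not shown.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n_examinees, n_raters = 300, 12

true = rng.normal(70, 8, n_examinees)            # examinee proficiency
stringency = rng.normal(0, 3, n_raters)          # positive = severe (lowers scores)
rater = rng.integers(0, n_raters, n_examinees)   # random assignment
score = true - stringency[rater] + rng.normal(0, 4, n_examinees)

df = pd.DataFrame({"rater": rater, "score": score})

# Each rater's mean deviation from the grand mean estimates the rater effect
# (valid here only because assignment to raters is random).
effect = df.groupby("rater")["score"].mean() - score.mean()
adjusted = score - effect.loc[rater].to_numpy()

print(np.corrcoef(score, true)[0, 1])     # raw scores vs. proficiency
print(np.corrcoef(adjusted, true)[0, 1])  # adjusted scores track it better
```
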
Kim, Seock-Ho; Cohen, Allan S.; Alagoz, Cigdem; Kim, Sukwoo – Journal of Educational Measurement, 2007
Data from a large-scale performance assessment (N = 105,731) were analyzed with five differential item functioning (DIF) detection methods for polytomous items to examine the congruence among the DIF detection methods. Two different versions of the item response theory (IRT) model-based likelihood ratio test, the logistic regression likelihood…
Descriptors: Performance Based Assessment, Performance Tests, Item Response Theory, Test Bias
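
One method in this family, the logistic regression likelihood ratio test, is easy to sketch. The toy below dichotomizes the response for brevity (the study's items are polytomous) and plants an invented uniform DIF effect so the test has something to find.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 4000
group = rng.integers(0, 2, n)        # 0 = reference, 1 = focal
total = rng.normal(0, 1, n)          # matching variable (e.g., rest score)

# Simulated item response with a small uniform DIF effect for the focal group.
p = 1.0 / (1.0 + np.exp(-(1.2 * total - 0.2 + 0.3 * group)))
y = (rng.random(n) < p).astype(float)

X0 = sm.add_constant(total)                            # matching only
X1 = sm.add_constant(np.column_stack([total, group]))  # + group term

m0 = sm.Logit(y, X0).fit(disp=0)
m1 = sm.Logit(y, X1).fit(disp=0)

lr = 2.0 * (m1.llf - m0.llf)        # likelihood ratio statistic, df = 1
print(lr, chi2.sf(lr, df=1))        # small p-value -> flag the item for DIF
```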

Shavelson, Richard J.; Ruiz-Primo, Maria Araceli; Wiley, Edward W. – Journal of Educational Measurement, 1999
Reports a reanalysis of data collected in a person x task x occasion rater or method G-study design (M. Ruiz-Primo and others, 1993), and brings this reanalysis to bear on the interpretation of task-sampling variability and the convergence of different performance-assessment methods. (SLD)
Descriptors: Performance Based Assessment, Sampling, Sciences
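
The task-sampling question comes down to variance components. A minimal sketch of a crossed person x task G-study on synthetic data, estimating components from mean squares; the design is simplified to a single occasion with no rater facet, unlike the study's.

```python
import numpy as np

rng = np.random.default_rng(11)
n_p, n_t = 60, 6   # persons x tasks, fully crossed, single occasion

# Synthetic scores: person effect + task effect + residual (pt + error).
X = (3.0 + rng.normal(0, 1.0, (n_p, 1))      # person
         + rng.normal(0, 0.5, (1, n_t))      # task
         + rng.normal(0, 0.8, (n_p, n_t)))   # residual

grand = X.mean()
ms_p = n_t * ((X.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
ms_t = n_p * ((X.mean(axis=0) - grand) ** 2).sum() / (n_t - 1)
resid = X - X.mean(axis=1, keepdims=True) - X.mean(axis=0, keepdims=True) + grand
ms_res = (resid ** 2).sum() / ((n_p - 1) * (n_t - 1))

var_p = (ms_p - ms_res) / n_t    # universe-score (person) variance
var_t = (ms_t - ms_res) / n_p    # task-sampling variance
var_res = ms_res                 # interaction + error

# Generalizability coefficient for an average over k tasks:
k = 5
print(var_p / (var_p + var_res / k))
```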

Engelhard, George, Jr. – Journal of Educational Measurement, 1996
A new method for evaluating rater accuracy within the context of performance assessments is described. The method, based on an extended Rasch measurement model implemented in the FACETS program, is illustrated with 373 benchmark papers from the Georgia High School Graduation Writing Test rated by 20 operational raters and an expert panel. (SLD)
Descriptors: Essay Tests, Evaluation Methods, Evaluators, Performance Based Assessment
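
FACETS implements a many-facet Rasch model in which a rater-severity term enters the logit alongside person ability. A minimal numpy sketch with invented thresholds, showing how a severe rater shifts the expected rating for the same paper:

```python
import numpy as np

def mfrm_probs(theta, severity, thresholds):
    """Rating category probabilities under a many-facet Rasch
    (rating scale) model with person and rater-severity facets."""
    steps = theta - severity - np.asarray(thresholds)
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    p = np.exp(logits - logits.max())
    return p / p.sum()

thresholds = [-1.5, -0.5, 0.5, 1.5]   # hypothetical category steps (scores 0-4)

# The same paper (theta = 0.3) rated by a lenient, neutral, and severe rater:
for severity in (-0.5, 0.0, 0.8):
    p = mfrm_probs(0.3, severity, thresholds)
    print(severity, np.round(p, 3), "expected rating:", round(float(np.arange(5) @ p), 2))
```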

Kane, Michael T. – Journal of Educational Measurement, 2001
Provides a brief historical review of construct validity and discusses the current state of validity theory, emphasizing the role of arguments in validation. Examines the application of an argument-based approach with regard to the distinction between performance-based and theory-based interpretations and the role of consequences in validation.…
Descriptors: Construct Validity, Educational Testing, Performance Based Assessment, Theories
Skaggs, Gary – Journal of Educational Measurement, 2005
This study investigated the effectiveness of equating with very small samples using the random groups design. Of particular interest was equating accuracy at specific scores where performance standards might be set. Two sets of simulations were carried out, one in which the two forms were identical and one in which they differed by a tenth of a…
Descriptors: Equated Scores, Simulation, Performance Based Assessment, Evaluation Methods
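
Under the random groups design, equipercentile equating maps a Form X score to the Form Y score with the same percentile rank. A bare-bones sketch with tiny samples and no presmoothing; the score distributions and sample sizes are invented, not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(5)

def equipercentile(x_scores, y_scores, x):
    """Map a Form X score to the Form Y scale by matching percentile ranks
    (random groups design; linear interpolation, no presmoothing)."""
    pr = (x_scores < x).mean() + 0.5 * (x_scores == x).mean()
    return float(np.quantile(y_scores, pr))

# Two small random groups, each taking one form of a 40-item test.
form_x = rng.binomial(40, 0.60, size=25)
form_y = rng.binomial(40, 0.58, size=25)

for cut in (20, 24, 28):   # candidate cut scores on Form X
    print(cut, "->", round(equipercentile(form_x, form_y, cut), 2))
```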

Fitzpatrick, Anne R.; And Others – Journal of Educational Measurement, 1996
One-parameter (1PPC) and two-parameter partial credit (2PPC) models were compared using real and simulated data containing constructed-response items. Results suggest that the combination of the three-parameter logistic model with the more flexible 2PPC model produces better model fit than the combination of the one-parameter logistic and 1PPC models. (SLD)
Descriptors: Comparative Analysis, Constructed Response, Goodness of Fit, Performance Based Assessment
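
The two models differ in whether the slope is fixed. A minimal sketch of partial credit category probabilities, where a = 1 corresponds to the 1PPC model and a free slope to the 2PPC model; the step difficulties below are invented.

```python
import numpy as np

def pc_probs(theta, a, deltas):
    """Category probabilities for a partial credit item: a = 1 gives the
    one-parameter (1PPC) model; a free slope gives the 2PPC model."""
    steps = a * (theta - np.asarray(deltas))
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    p = np.exp(logits - logits.max())
    return p / p.sum()

deltas = [-1.0, 0.0, 1.2]   # hypothetical step difficulties (scores 0-3)

for theta in (-1.0, 0.0, 1.0):
    print(theta,
          np.round(pc_probs(theta, 1.0, deltas), 2),   # 1PPC
          np.round(pc_probs(theta, 1.7, deltas), 2))   # 2PPC, steeper slope
```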

Clauser, Brian E.; Clyman, Stephen G.; Swanson, David B. – Journal of Educational Measurement, 1999
Two studies focused on aspects of the rating process in performance assessment. The first, which involved 15 raters and about 400 medical students, made the "committee" facet of raters working in groups explicit, and the second, which involved about 200 medical students and four raters, made the "rating-occasion" facet…
Descriptors: Error Patterns, Evaluation Methods, Evaluators, Higher Education

Brennan, Robert L. – Journal of Educational Measurement, 1995
Generalizability theory is used to show that the assumption that reliability for groups is greater than that for persons (and that error variance for groups is less than that for persons) is not necessarily true. Examples are provided from course evaluation and performance test literature. (SLD)
Descriptors: Course Evaluation, Decision Making, Equations (Mathematics), Generalizability Theory
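
Brennan's point can be reproduced with a few invented variance components for persons nested in groups, crossed with items: when between-group variance is small, the generalizability coefficient for group means falls below the one for persons. All numbers below are hypothetical.

```python
import numpy as np

# Hypothetical variance components for persons nested in groups, crossed
# with items (p:g x i design): group, person-within-group, residual.
var_g, var_pg, var_res = 0.02, 1.00, 0.50
n_p, n_i = 25, 10   # persons per group, items

# Generalizability coefficient for individual persons' item averages:
person_universe = var_g + var_pg
g_person = person_universe / (person_universe + var_res / n_i)

# For group means, person-to-person variation becomes error:
group_error = var_pg / n_p + var_res / (n_p * n_i)
g_group = var_g / (var_g + group_error)

print(round(g_person, 3), round(g_group, 3))   # group reliability is far lower
```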