ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	8
Since 2006 (last 20 years)	11

Descriptor

Performance Based Assessment	53
Educational Assessment	24
Test Use	16
Test Construction	15
Elementary Secondary Education	14
Scoring	11
Testing Programs	11
Student Evaluation	9
Test Interpretation	8
Validity	8
Accountability	7
Evaluation Methods	7
Measurement Techniques	7
Scores	7
Academic Achievement	6
Achievement Tests	6
Educational Change	6
Test Items	6
Testing Problems	6
Portfolios (Background…	5
Standards	5
State Programs	5
Comparative Analysis	4
Educational Testing	4
Foreign Countries	4
More ▼

Source

Educational Measurement:…

Publication Type

Journal Articles	53
Reports - Evaluative	26
Reports - Research	13
Speeches/Meeting Papers	10
Reports - Descriptive	7
Opinion Papers	4
Guides - Non-Classroom	3
Tests/Questionnaires	2
Book/Product Reviews	1

Education Level

Elementary Secondary Education

Audience

Teachers

Location

United Kingdom	2
California	1
China	1
Indiana	1
Israel	1
Michigan	1
Netherlands	1
New Hampshire	1
Pennsylvania	1
Singapore	1
Sweden	1
United States	1
Vermont	1
More ▼

Laws, Policies, & Programs

No Child Left Behind Act 2001

Assessments and Surveys

Comprehensive Tests of Basic…

What Works Clearinghouse Rating

Showing 1 to 15 of 53 results Save | Export

Expected Classification Accuracy for Categorical Growth Models

Peer reviewed

Direct link

Daniel Murphy; Sarah Quesen; Matthew Brunetti; Quintin Love – Educational Measurement: Issues and Practice, 2024

Categorical growth models describe examinee growth in terms of performance-level category transitions, which implies that some percentage of examinees will be misclassified. This paper introduces a new procedure for estimating the classification accuracy of categorical growth models, based on Rudner's classification accuracy index for item…

Descriptors: Classification, Growth Models, Accuracy, Performance Based Assessment

Exploring the Impact of Rater Effects on Person Fit in Rater-Mediated Assessments

Peer reviewed

Direct link

Wind, Stefanie A. – Educational Measurement: Issues and Practice, 2020

Researchers have documented the impact of rater effects, or raters' tendencies to give different ratings than would be expected given examinee achievement levels, in performance assessments. However, the degree to which rater effects influence person fit, or the reasonableness of test-takers' achievement estimates given their response patterns,…

Descriptors: Performance Based Assessment, Evaluators, Achievement, Influences

Embedded Standard Setting: Aligning Standard-Setting Methodology with Contemporary Assessment Design Principles

Peer reviewed

Direct link

Lewis, Daniel; Cook, Robert – Educational Measurement: Issues and Practice, 2020

In this paper we assert that the practice of principled assessment design renders traditional standard-setting methodology redundant at best and contradictory at worst. We describe the rationale for, and methodological details of, Embedded Standard Setting (ESS; previously, Engineered Cut Scores. Lewis, 2016), an approach to establish performance…

Descriptors: Standard Setting, Evaluation, Cutting Scores, Performance Based Assessment

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Direct link

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Impact of Both Local Item Dependencies and Cut-Point Locations on Examinee Classifications

Peer reviewed

Direct link

Rubright, Jonathan D. – Educational Measurement: Issues and Practice, 2018

Performance assessments, scenario-based tasks, and other groups of items carry a risk of violating the local item independence assumption made by unidimensional item response theory (IRT) models. Previous studies have identified negative impacts of ignoring such violations, most notably inflated reliability estimates. Still, the influence of this…

Descriptors: Performance Based Assessment, Item Response Theory, Models, Test Reliability

Detecting Measurement Disturbances in Rater-Mediated Assessments

Peer reviewed

Direct link

Wind, Stefanie A.; Schumacker, Randall E. – Educational Measurement: Issues and Practice, 2017

The term measurement disturbance has been used to describe systematic conditions that affect a measurement process, resulting in a compromised interpretation of person or item estimates. Measurement disturbances have been discussed in relation to systematic response patterns associated with items and persons, such as start-up, plodding, boredom,…

Descriptors: Measurement, Testing Problems, Writing Tests, Performance Based Assessment

Comparability in Balanced Assessment Systems for State Accountability

Peer reviewed

Direct link

Evans, Carla M.; Lyons, Susan – Educational Measurement: Issues and Practice, 2017

The purpose of this study was to test methods that strengthen the comparability claims about annual determinations of student proficiency in English language arts, math, and science (Grades 3-12) in the New Hampshire Performance Assessment of Competency Education (NH PACE) pilot project. First, we examined the literature in order to define…

Descriptors: Academic Achievement, Language Arts, Mathematics Achievement, Science Achievement

Validating English Language Proficiency Assessment Uses for English Learners: Academic Language Proficiency and Content Assessment Performance

Peer reviewed

Direct link

Wolf, Mikyung Kim; Faulkner-Bond, Molly – Educational Measurement: Issues and Practice, 2016

States use standards-based English language proficiency (ELP) assessments to inform relatively high-stakes decisions for English learner (EL) students. Results from these assessments are one of the primary criteria used to determine EL students' level of ELP and readiness for reclassification. The results are also used to evaluate the…

Descriptors: High Stakes Tests, Language Proficiency, Hierarchical Linear Modeling, Scores

An Investigation of Rater Cognition in the Assessment of Projects

Peer reviewed

Direct link

Crisp, Victoria – Educational Measurement: Issues and Practice, 2012

In the United Kingdom, the majority of national assessments involve human raters. The processes by which raters determine the scores to award are central to the assessment process and affect the extent to which valid inferences can be made from assessment outcomes. Thus, understanding rater cognition has become a growing area of research in the…

Descriptors: Foreign Countries, Scores, Protocol Analysis, Social Influences

Judgment-Based Scoring by Teachers as Professional Development: Distinguishing Promises from Proof

Peer reviewed

Direct link

Goldberg, Gail Lynn – Educational Measurement: Issues and Practice, 2012

The engagement of teachers as raters to score constructed response items on assessments of student learning is widely claimed to be a valuable vehicle for professional development. This paper examines the evidence behind those claims from several sources, including research and reports over the past two decades, information from a dozen state…

Descriptors: Academic Achievement, Performance Based Assessment, Scoring, Professional Development

Building and Supporting a Validity Argument for a Standards-Based Classroom Assessment of English Proficiency Based on Teacher Judgments

Peer reviewed

Direct link

Llosa, Lorena – Educational Measurement: Issues and Practice, 2008

Using an argument-based approach to validation, this study examines the quality of teacher judgments in the context of a standards-based classroom assessment of English proficiency. Using Bachman's (2005) assessment use argument (AUA) as a framework for the investigation, this paper first articulates the claims, warrants, rebuttals, and backing…

Descriptors: Protocol Analysis, Multitrait Multimethod Techniques, Validity, Scoring

Validating Measures of Performance.

Peer reviewed

Kane, Michael; Crooks, Terence; Cohen, Allan – Educational Measurement: Issues and Practice, 1999

Analyzes the three major inferences involved in interpretation of performance assessments: (1) scoring of the observed performances; (2) generalization to a domain of assessment performances like those included in the assessment; and (3) extrapolation to the large performance domain of interest. Suggests ways to improve the validity of performance…

Descriptors: Performance Based Assessment, Performance Factors, Scoring, Test Interpretation

Alternative Interpretations of Alternative Assessments: Some Validity Issues in Educational Performance.

Peer reviewed

Bachman, Lyle F. – Educational Measurement: Issues and Practice, 2002

Describes an approach to addressing issues of validity of inferences and the extrapolation of inferences to target domains beyond the assessment for alternative assessments. Makes the case that in both language testing and educational assessment the roles of language and content knowledge must be considered, and that the design and development of…

Descriptors: Alternative Assessment, Educational Assessment, Inferences, Performance Based Assessment

Assessing Differential Item Functioning in Performance Assessment: Review and Recommendations.

Peer reviewed

Penfield, Randall D.; Lam, Tony C. M. – Educational Measurement: Issues and Practice, 2000

Discusses extending research into differential item functioning (DIF) to performance assessment and considers some of the best options available at this time. The most effective strategy for assessing DIF is to use a combination of several methods, preferably one from each of the observed score nonparametric, latent score nonparametric, and…

Descriptors: Evaluation Methods, Item Bias, Nonparametric Statistics, Performance Based Assessment

Where Did All the Data Go? Internet Security for Web-based Assessments.

Peer reviewed

Shermis, Mark D.; Averitt, Jason – Educational Measurement: Issues and Practice, 2002

Outlines a series of security steps that might be taken by researchers or organizations that are contemplating Web-based tests and performance assessments. Focuses on what can be done to avoid the loss, compromising, or modification of data collected by or stored through the Internet. (SLD)

Descriptors: Computer Assisted Testing, Data Collection, Performance Based Assessment, Test Construction

Previous Page | Next Page »

Pages: 1 | 2 | 3 | 4

Lane, Suzanne	3
Wind, Stefanie A.	3
Brennan, Robert L.	2
Glaser, Robert	2
Mehrens, William A.	2
Popham, W. James	2
Arter, Judith A.	1
Arter, Judy	1
Averitt, Jason	1
Bachman, Lyle F.	1
Baxter, Gail P.	1
Berk, Ronald A.	1
Bond, Lloyd	1
Brookhart, Susan M.	1
Brown, Dianne C.	1
Burger, Donald L.	1
Burger, Susan E.	1
Burton, Elizabeth	1
Cai, Jinfa	1
Camara, Wayne J.	1
Cizek, Gregory J.	1
Cohen, Allan	1
Cook, Robert	1
Crisp, Victoria	1
More ▼