Myers, Aaron J.; Ames, Allison J.; Leventhal, Brian C.; Holzman, Madison A. – Applied Measurement in Education, 2020
When rating performance assessments, raters may assign different scores to the same performance when their application of the rubric does not align with the intended application of the scoring criteria. Given that performance assessment score interpretation assumes raters apply rubrics as the rubric developers intended, misalignment between raters' scoring processes…
Descriptors: Scoring Rubrics, Validity, Item Response Theory, Interrater Reliability
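Interrater agreement of the kind at issue here is often summarized with a chance-corrected statistic before any IRT modeling of rater effects. A minimal sketch, with invented scores and not the authors' actual model, computing Cohen's kappa for two raters:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters on the same performances."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Agreement expected if each rater scored independently at their own base rates.
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n ** 2
    return (observed - expected) / (1 - expected)

# Invented rubric scores (0-3) from two raters on ten performances.
rater_a = [0, 1, 2, 2, 3, 1, 0, 2, 3, 1]
rater_b = [0, 1, 2, 3, 3, 1, 1, 2, 3, 2]
print(f"Cohen's kappa = {cohens_kappa(rater_a, rater_b):.2f}")
```

A kappa well below 1 despite high raw agreement is the signature of the misalignment the authors describe: raters can agree often by chance while applying the rubric differently.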
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
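Scoring fidelity of AES systems is conventionally appraised by comparing machine scores against human scores, commonly with quadratically weighted kappa. A sketch under that assumption; the metric is standard, but the scores and scale below are invented:

```python
import numpy as np

def quadratic_weighted_kappa(human, machine, n_cats):
    """Human-machine agreement that penalizes large score gaps quadratically."""
    human, machine = np.asarray(human), np.asarray(machine)
    observed = np.zeros((n_cats, n_cats))
    for h, m in zip(human, machine):
        observed[h, m] += 1
    observed /= observed.sum()
    # Expected joint distribution if the two score sets were independent.
    expected = np.outer(np.bincount(human, minlength=n_cats),
                        np.bincount(machine, minlength=n_cats)) / len(human) ** 2
    i, j = np.indices((n_cats, n_cats))
    weights = (i - j) ** 2 / (n_cats - 1) ** 2
    return 1 - (weights * observed).sum() / (weights * expected).sum()

# Invented essay scores on a 0-4 scale.
human_scores   = [0, 1, 2, 4, 3, 2, 1, 3, 4, 2]
machine_scores = [0, 1, 3, 4, 3, 2, 2, 3, 3, 2]
print(f"QWK = {quadratic_weighted_kappa(human_scores, machine_scores, 5):.2f}")
```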
Ercikan, Kadriye; Oliveri, María Elena – Applied Measurement in Education, 2016
Assessing complex constructs, such as those discussed under the umbrella of 21st-century constructs, highlights the need for a principled assessment design and validation approach. In our discussion, we make a case for three considerations: (a) taking construct complexity into account across various stages of assessment development such as the…
Descriptors: Evaluation Methods, Test Construction, Design, Scaling
McGrane, Joshua Aaron; Humphry, Stephen Mark; Heldsinger, Sandra – Applied Measurement in Education, 2018
National standardized assessment programs have increasingly included extended written performances, amplifying the need for reliable, valid, and efficient methods of assessment. This article examines a two-stage method using comparative judgments and calibrated exemplars as a complement and alternative to existing methods of assessing writing.…
Descriptors: Standardized Tests, Foreign Countries, Writing Tests, Writing Evaluation
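Comparative-judgment data of this kind are typically scaled with a Bradley-Terry model. A minimal sketch, assuming judges produce paired "winner beats loser" records; the judgment data are invented and the MM fitting routine is the textbook algorithm, not necessarily the authors' implementation:

```python
# Invented pairwise judgments: (winner, loser) indices for four scripts,
# as produced when judges repeatedly choose the better of two essays.
judgments = [(0, 1), (0, 2), (1, 2), (0, 3), (2, 3), (1, 3), (1, 0), (3, 2)]

def bradley_terry(judgments, n_items, iters=100):
    """Estimate script quality parameters from paired wins (MM algorithm)."""
    wins = [0] * n_items
    met = [[0] * n_items for _ in range(n_items)]   # times i was compared with j
    for w, l in judgments:
        wins[w] += 1
        met[w][l] += 1
        met[l][w] += 1
    p = [1.0] * n_items
    for _ in range(iters):
        p = [wins[i] / sum(met[i][j] / (p[i] + p[j])
                           for j in range(n_items) if met[i][j])
             for i in range(n_items)]
        total = sum(p)
        p = [v * n_items / total for v in p]        # fix the scale
    return p

quality = bradley_terry(judgments, 4)
print([round(q, 2) for q in quality])
```

Calibrated exemplars enter a two-stage method like the authors' by anchoring these relative quality estimates to a reporting scale.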
Lee, Hee-Sun; Liu, Ou Lydia; Linn, Marcia C. – Applied Measurement in Education, 2011
This study explores measurement of a construct called knowledge integration in science using multiple-choice and explanation items. We use construct and instructional validity evidence to examine the role multiple-choice and explanation items play in measuring students' knowledge integration ability. For construct validity, we analyze item…
Descriptors: Knowledge Level, Construct Validity, Validity, Scaffolding (Teaching Technique)
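Item-level construct evidence of the sort alluded to often begins with discrimination statistics such as the point-biserial correlation. A sketch with invented response data, not the authors' analysis:

```python
import math

def point_biserial(item, totals):
    """Discrimination: correlation between a 0/1 item and the total score."""
    n = len(item)
    mean_t = sum(totals) / n
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in totals) / n)
    p = sum(item) / n                       # proportion answering correctly
    mean_correct = sum(t for i, t in zip(item, totals) if i) / sum(item)
    return (mean_correct - mean_t) / sd_t * math.sqrt(p / (1 - p))

# Invented data: item responses (1 = correct) and totals on a 10-item test.
item   = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
totals = [9, 4, 7, 8, 3, 6, 5, 9, 7, 2]
print(f"point-biserial = {point_biserial(item, totals):.2f}")
```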
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially large, unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
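The underestimation described here follows directly from the standard cluster-sampling design effect, DEFF = 1 + (m - 1)ρ, where m is the cluster size and ρ the intraclass correlation. A worked sketch with invented values:

```python
import math

# Hypothetical two-stage sample: 100 schools, 20 students tested per school,
# intraclass correlation 0.15 (all values invented for illustration).
n_clusters, m, rho = 100, 20, 0.15
n = n_clusters * m

deff = 1 + (m - 1) * rho          # design effect for cluster sampling
n_eff = n / deff                  # effective sample size

srs_se = 1 / math.sqrt(n)         # SE of a standardized mean under simple random sampling
true_se = srs_se * math.sqrt(deff)

print(f"DEFF = {deff:.2f}, effective n = {n_eff:.0f} of {n}")
print(f"Naive SE = {srs_se:.4f}, design-corrected SE = {true_se:.4f}")
```

With these numbers the 2,000 tested students carry the information of roughly 520 independent ones, so a naive analysis understates the standard error by nearly half.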
Sireci, Stephen G.; Hauger, Jeffrey B.; Wells, Craig S.; Shea, Christine; Zenisky, April L. – Applied Measurement in Education, 2009
The National Assessment Governing Board used a new method to set achievement level standards on the 2005 Grade 12 NAEP Math test. In this article, we summarize our independent evaluation of the process used to set these standards. The evaluation data included observations of the standard-setting meeting, observations of advisory committee meetings…
Descriptors: Advisory Committees, Mathematics Tests, Standard Setting, National Competency Tests
Zhang, Bo; Ohland, Matthew W. – Applied Measurement in Education, 2009
One major challenge in using group projects to assess student learning is accounting for the differences of contribution among group members so that the mark assigned to each individual actually reflects their performance. This research addresses the validity of grading group projects by evaluating different methods that derive individualized…
Descriptors: Monte Carlo Methods, Validity, Student Evaluation, Evaluation Methods
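One common way to derive individualized marks is to scale the group mark by a peer-rating weighting factor; a Monte Carlo check of that scheme's sensitivity to rating noise might look like the sketch below. The weighting rule and all values are illustrative, not necessarily the methods evaluated in the article:

```python
import random

def individualized_marks(group_mark, peer_ratings):
    """Scale the group mark by each member's share of the mean peer rating
    (a common 'weighting factor' scheme; capped at 100)."""
    mean_rating = sum(peer_ratings) / len(peer_ratings)
    return [min(100.0, group_mark * r / mean_rating) for r in peer_ratings]

random.seed(1)
distortions = []
for _ in range(1000):
    true_effort = [random.uniform(0.5, 1.0) for _ in range(4)]
    # Peer ratings = true effort plus rating noise (all values invented).
    ratings = [e + random.gauss(0, 0.05) for e in true_effort]
    noisy = individualized_marks(75, ratings)
    ideal = individualized_marks(75, true_effort)
    distortions.append(sum(abs(a - b) for a, b in zip(noisy, ideal)) / 4)

print(f"mean |mark distortion| from rating noise: {sum(distortions)/len(distortions):.1f}")
```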

Byrne, Barbara M. – Applied Measurement in Education, 1990
Methodological procedures used in validating the theoretical structure of academic self-concept and validating associated measurement instruments are reviewed. Substantive findings from research related to modes of inquiry are summarized, and recommendations for future research are outlined. (TJH)
Descriptors: Classification, Construct Validity, Evaluation Methods, Literature Reviews

Bart, William M.; Williams-Morris, Ruth – Applied Measurement in Education, 1990
Refined item digraph analysis (RIDA) is a way of studying diagnostic and prescriptive testing. It permits assessment of a test item's diagnostic value by examining the extent to which the item has properties of ideal items. RIDA is illustrated with the Orange Juice Test, which assesses the proportionality concept. (TJH)
Descriptors: Diagnostic Tests, Evaluation Methods, Item Analysis, Mathematical Models
Williamson, David M.; Bejar, Isaac I.; Sax, Anne – Applied Measurement in Education, 2004
As automated scoring of complex constructed-response examinations reaches operational status, the process of evaluating the quality of resultant scores, particularly in contrast to scores of expert human graders, becomes as complex as the data itself. Using a vignette from the Architectural Registration Examination (ARE), this article explores the…
Descriptors: Validity, Scoring, Scores, Evaluation Methods
Penfield, Randall D.; Miller, Jeffrey M. – Applied Measurement in Education, 2004
Descriptors: Student Evaluation, Evaluation Methods, Content Validity, Scoring
Johnson, Robert L.; Penny, Jim; Fisher, Steve; Kuhs, Therese – Applied Measurement in Education, 2003
When raters assign different scores to a performance task, a method for resolving rating differences is required to report a single score to the examinee. Recent studies indicate that decisions about examinees, such as pass/fail decisions, differ across resolution methods. Previous studies also investigated the interrater reliability of…
Descriptors: Test Reliability, Test Validity, Scores, Interrater Reliability
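The core finding, that decisions such as pass/fail differ across resolution methods, is easy to demonstrate. A sketch comparing three common resolution rules on invented rater-score pairs:

```python
# Invented pairs of rater scores on a 0-6 scale; the cut score for passing is 4.
score_pairs = [(3, 4), (4, 4), (2, 5), (5, 6), (3, 3), (4, 5), (2, 3), (6, 3)]
CUT = 4

resolution_rules = {
    "average":      lambda a, b: (a + b) / 2,   # mean of the two raters
    "higher score": lambda a, b: max(a, b),     # benefit of the doubt
    "adjudicator":  lambda a, b: b,             # rater B as stand-in expert adjudicator
}

for name, rule in resolution_rules.items():
    n_pass = sum(rule(a, b) >= CUT for a, b in score_pairs)
    print(f"{name:>12}: {n_pass}/{len(score_pairs)} examinees pass")
```

On these eight invented examinees the three rules pass four, six, and five candidates respectively, which is exactly the kind of divergence the studies cited here report.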

Ackerman, Terry A. – Applied Measurement in Education, 1994
When item response data do not satisfy the unidimensionality assumption, multidimensional item response theory (MIRT) should be used to model the item-examinee interaction. This article presents and discusses MIRT analyses designed to give better insight into what individual items are measuring. (SLD)
Descriptors: Evaluation Methods, Item Response Theory, Measurement Techniques, Models
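The compensatory MIRT model referenced here gives each item a vector of discriminations, so high ability on one dimension can offset low ability on another. A sketch of the item response function with illustrative parameters:

```python
import math

def mirt_prob(theta, a, d):
    """Compensatory MIRT item response function: P = logistic(a . theta + d)."""
    logit = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return 1 / (1 + math.exp(-logit))

# Invented item: discriminates mainly on the first of two dimensions.
a, d = (1.4, 0.3), -0.5
for theta in [(-1.0, 1.0), (0.0, 0.0), (1.0, -1.0), (1.0, 1.0)]:
    print(theta, f"P(correct) = {mirt_prob(theta, a, d):.2f}")
```

Note how the examinee at (1.0, -1.0) outscores the one at (-1.0, 1.0): ability on the heavily weighted dimension compensates for weakness on the other, which is the kind of insight into what an item measures that MIRT analyses provide.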

Crocker, Linda – Applied Measurement in Education, 1997
The experience of the National Board for Professional Teaching Standards illustrates how issues of assessing the content representativeness of performance assessment can be addressed to ensure validity for certification procedures. The article explores the challenges of collecting validation evidence when expert judgments of content are used. (SLD)
Descriptors: Content Validity, Credentials, Data Collection, Evaluation Methods
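Expert content judgments of this kind are often summarized with a content validity index, the share of experts rating a task as relevant. This is a common convention, not necessarily the Board's procedure; the ratings below are invented:

```python
# Hypothetical relevance ratings from six experts (1-4 scale) for three tasks.
ratings = {
    "task_1": [4, 4, 3, 4, 3, 4],
    "task_2": [2, 3, 2, 1, 3, 2],
    "task_3": [4, 3, 4, 4, 4, 3],
}

def item_cvi(item_ratings):
    """Item-level content validity index: share of experts rating 3 or 4."""
    return sum(r >= 3 for r in item_ratings) / len(item_ratings)

for task, r in ratings.items():
    print(f"{task}: I-CVI = {item_cvi(r):.2f}")
```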