ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	1
Since 2006 (last 20 years)	5

Descriptor

Evaluation Methods	9
Multidimensional Scaling	4
Test Items	4
Content Validity	3
Test Validity	3
Achievement Tests	2
Construct Validity	2
Cutting Scores	2
Research Methodology	2
Scores	2
Validity	2
Academic Achievement	1
Adaptive Testing	1
Advisory Committees	1
Alignment (Education)	1
Cluster Analysis	1
Cross Cultural Studies	1
Curriculum Development	1
Data Analysis	1
Data Interpretation	1
Educational Assessment	1
Educational Research	1
English (Second Language)	1
Evaluation Research	1
Grade 12	1
More ▼

Source

Educational Assessment	2
Educational Measurement:…	2
Applied Measurement in…	1
Applied Psychological…	1
Review of Educational Research	1

Author

Sireci, Stephen G.	9
Wells, Craig S.	2
Zenisky, April L.	2
Bastari, B.	1
Geisinger, Kurt F.	1
Han, Kyung T.	1
Hauger, Jeffrey B.	1
Lewis, Jennifer	1
Lim, Hwanggyu	1
Lu, Ying	1
Martone, Andrea	1
Padellaro, Frank	1
Shea, Christine	1
More ▼

Publication Type

Journal Articles	7
Reports - Evaluative	5
Speeches/Meeting Papers	3
Reports - Descriptive	2
Reports - Research	2

Education Level

Adult Education	1
Grade 12	1
High Schools	1

Audience

Location

Laws, Policies, & Programs

Assessments and Surveys

What Works Clearinghouse Rating

Showing all 9 results Save | Export

Setting and Validating Multiple Standards on a Multistage-Adaptive Test

Peer reviewed

Direct link

Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022

Setting cut scores on (MSTs) is difficult, particularly when the test spans several grade levels, and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…

Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis

Validity Issues in Test Speededness

Peer reviewed

Direct link

Lu, Ying; Sireci, Stephen G. – Educational Measurement: Issues and Practice, 2007

Speededness refers to the situation where the time limits on a standardized test do not allow substantial numbers of examinees to fully consider all test items. When tests are not intended to measure speed of responding, speededness introduces a severe threat to the validity of interpretations based on test scores. In this article, we describe…

Descriptors: Test Items, Timed Tests, Standardized Tests, Test Validity

Evaluation of the Standard Setting on the 2005 Grade 12 National Assessment of Educational Progress Mathematics Test

Peer reviewed

Direct link

Sireci, Stephen G.; Hauger, Jeffrey B.; Wells, Craig S.; Shea, Christine; Zenisky, April L. – Applied Measurement in Education, 2009

The National Assessment Governing Board used a new method to set achievement level standards on the 2005 Grade 12 NAEP Math test. In this article, we summarize our independent evaluation of the process used to set these standards. The evaluation data included observations of the standard-setting meeting, observations of advisory committee meetings…

Descriptors: Advisory Committees, Mathematics Tests, Standard Setting, National Competency Tests

Evaluating Alignment between Curriculum, Assessment, and Instruction

Peer reviewed

Direct link

Martone, Andrea; Sireci, Stephen G. – Review of Educational Research, 2009

The authors (a) discuss the importance of alignment for facilitating proper assessment and instruction, (b) describe the three most common methods for evaluating the alignment between state content standards and assessments, (c) discuss the relative strengths and limitations of these methods, and (d) discuss examples of applications of each…

Descriptors: Teaching Methods, Alignment (Education), Student Evaluation, Curriculum Development

Methods for Evaluating the Validity of Test Scores for English Language Learners

Peer reviewed

Direct link

Sireci, Stephen G.; Han, Kyung T.; Wells, Craig S. – Educational Assessment, 2008

In the United States, when English language learners (ELLs) are tested, they are usually tested in English and their limited English proficiency is a potential cause of construct-irrelevant variance. When such irrelevancies affect test scores, inaccurate interpretations of ELLs' knowledge, skills, and abilities may occur. In this article, we…

Descriptors: Test Use, Educational Assessment, Psychological Testing, Validity

Using Subject-Matter Experts to Assess Content Representation: An MDS Analysis.

Peer reviewed

Sireci, Stephen G.; Geisinger, Kurt F. – Applied Psychological Measurement, 1995

An expanded version of the method of content evaluation proposed by S. G. Sireci and K. F. Giesinger (1992) was evaluated with respect to a national licensure examination and a nationally standardized social studies achievement test. Two groups of 15 subject-matter experts rated the similarity and content relevance of the items. (SLD)

Descriptors: Achievement Tests, Cluster Analysis, Construct Validity, Content Validity

Evaluating Content Validity Using Multidimensional Scaling.

Download full text

Sireci, Stephen G. – 1998

Multidimensional scaling (MDS) is a versatile technique for understanding the structure of multivariate data. Recent studies have applied MDS to the problem of evaluating content validity. This paper describes the importance of evaluating test content and the logic of using MDS to analyze data gathered from subject matter experts employed in…

Descriptors: Content Validity, Evaluation Methods, Multidimensional Scaling, Research Methodology

Gathering and Analyzing Content Validity Data.

Peer reviewed

Sireci, Stephen G. – Educational Assessment, 1998

Describes content-validity theory and illustrates new and traditional approaches for conducting content-validity studies. Newer approaches are based on multidimensional scaling analysis of item-similarity ratings, while traditional approaches are based on ratings of item-objective congruence and relevance. (Author/SLD)

Descriptors: Content Validity, Data Analysis, Evaluation Methods, Multidimensional Scaling

Evaluating Construct Equivalence across Adapted Tests.

Download full text

Sireci, Stephen G.; Bastari, B. – 1998

In many cross-cultural research studies, assessment instruments are translated or adapted for use in multiple languages. However, it cannot be assumed that different language versions of an assessment are equivalent across languages. A fundamental issue to be addressed is the comparability or equivalence of the construct measured by each language…

Descriptors: Construct Validity, Cross Cultural Studies, Evaluation Methods, Multidimensional Scaling