Showing 1 to 15 of 28 results
Peer reviewed
Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022
Setting cut scores on multistage tests (MSTs) is difficult, particularly when the test spans several grade levels and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…
Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis
Noble, Tracy; Sireci, Stephen G.; Wells, Craig S.; Kachchaf, Rachel R.; Rosebery, Ann S.; Wang, Yang Caroline – American Educational Research Journal, 2020
In this experimental study, 20 multiple-choice test items from the Massachusetts Grade 5 science test were linguistically simplified, and original and simplified test items were administered to 310 English learners (ELs) and 1,580 non-ELs in four Massachusetts school districts. This study tested the hypothesis that specific linguistic features of…
Descriptors: Science Tests, Language Usage, English Language Learners, School Districts
Peer reviewed
Benítez, Isabel; Padilla, José-Luis; Hidalgo Montesinos, María Dolores; Sireci, Stephen G. – Applied Measurement in Education, 2016
Analysis of differential item functioning (DIF) is often used to determine if cross-lingual assessments are equivalent across languages. However, evidence on the causes of cross-lingual DIF remains elusive. Expert appraisal is a qualitative method useful for obtaining detailed information about problematic elements in the different linguistic…
Descriptors: Test Bias, Mixed Methods Research, Questionnaires, International Assessment
Peer reviewed
Gökçe, Semirhan; Berberoglu, Giray; Wells, Craig S.; Sireci, Stephen G. – Journal of Psychoeducational Assessment, 2021
The 2015 Trends in International Mathematics and Science Study (TIMSS) involved 57 countries and 43 different languages to assess students' achievement in mathematics and science. The purpose of this study is to evaluate whether items and test scores are affected as the differences between language families and cultures increase. Using…
Descriptors: Language Classification, Elementary Secondary Education, Mathematics Achievement, Mathematics Tests
Peer reviewed
Han, Kyung T.; Wells, Craig S.; Sireci, Stephen G. – Applied Measurement in Education, 2012
Item parameter drift (IPD) occurs when item parameter values change from their original value over time. IPD may pose a serious threat to the fairness and validity of test score interpretations, especially when the goal of the assessment is to measure growth or improvement. In this study, we examined the effect of multidirectional IPD (i.e., some…
Descriptors: Item Response Theory, Test Items, Scaling, Methods
Peer reviewed
Li, Xueming; Sireci, Stephen G. – Educational and Psychological Measurement, 2013
Validity evidence based on test content is of essential importance in educational testing. One source for such evidence is an alignment study, which helps evaluate the congruence between tested objectives and those specified in the curriculum. However, the results of an alignment study do not always sufficiently capture the degree to which a test…
Descriptors: Content Validity, Multidimensional Scaling, Data Analysis, Educational Testing
Peer reviewed
Crotts, Katrina; Sireci, Stephen G.; Zenisky, April – Journal of Applied Testing Technology, 2012
Validity evidence based on test content is important for educational tests to demonstrate the degree to which they fulfill their purposes. Most content validity studies involve subject matter experts (SMEs) who rate items that comprise a test form. In computerized-adaptive testing, examinees take different sets of items and test "forms"…
Descriptors: Computer Assisted Testing, Adaptive Testing, Content Validity, Test Content
Peer reviewed
Wells, Craig S.; Baldwin, Su; Hambleton, Ronald K.; Sireci, Stephen G.; Karatonis, Ana; Jirka, Stephen – Applied Measurement in Education, 2009
Score equity assessment is an important analysis for ensuring that inferences drawn from test scores are comparable across subgroups of examinees. The purpose of the present evaluation was to assess the extent to which the Grade 8 NAEP Math and Reading assessments for 2005 were equivalent across selected states. More specifically, the present study…
Descriptors: National Competency Tests, Test Bias, Equated Scores, Grade 8
Peer reviewed
Lu, Ying; Sireci, Stephen G. – Educational Measurement: Issues and Practice, 2007
Speededness refers to the situation where the time limits on a standardized test do not allow substantial numbers of examinees to fully consider all test items. When tests are not intended to measure speed of responding, speededness introduces a severe threat to the validity of interpretations based on test scores. In this article, we describe…
Descriptors: Test Items, Timed Tests, Standardized Tests, Test Validity
Peer reviewed
Sireci, Stephen G. – Educational Researcher, 2007
Lissitz and Samuelsen (2007) propose a new framework for conceptualizing test validity that separates analysis of test properties from analysis of the construct measured. In response, the author of this article reviews fundamental characteristics of test validity, drawing largely from seminal writings as well as from the accepted standards. He…
Descriptors: Test Content, Test Validity, Guidelines, Test Items
Peer reviewed
Meara, Kevin; Robin, Frederic; Sireci, Stephen G. – Multivariate Behavioral Research, 2000
Investigated the usefulness of multidimensional scaling (MDS) for assessing the dimensionality of dichotomous test data. Focused on two MDS proximity measures: one based on the PC statistic (T. Chen and M. Davidson, 1996) and the other on interitem Euclidean distances. Simulation results show that both MDS procedures correctly identify…
Descriptors: Correlation, Multidimensional Scaling, Simulation, Test Items
Peer reviewed
Keller, Lisa A.; Swaminathan, Hariharan; Sireci, Stephen G. – Applied Measurement in Education, 2003
Evaluated two strategies for scoring context-dependent test items: ignoring the dependence and scoring the items dichotomously, or modeling the dependence through polytomous scoring. Results for data from 38,965 examinees taking a professional examination show that dichotomous scoring may overestimate test information, but polytomous scoring may underestimate…
Descriptors: Adults, Licensing Examinations (Professions), Scoring, Test Items
Peer reviewed
Karantonis, Ana; Sireci, Stephen G. – Educational Measurement: Issues and Practice, 2006
The Bookmark method for setting standards on educational tests is currently one of the most popular standard-setting methods. However, research to support the method is scarce. In this report, we review the published and unpublished literature on this method as well as some seminal work in the area of evaluating standard-setting studies. Our…
Descriptors: Academic Standards, Educational Testing, Literature Reviews, Validity
Egan, Karla L.; Sireci, Stephen G.; Swaminathan, Hariharan; Sweeney, Kevin P. – 1998
The primary purpose of this study was to assess the effect of item bundling on multidimensional data. A second purpose was to compare three methods for assessing dimensionality. Eight multidimensional data sets consisting of 100 items and 1,000 examinees were simulated, varying in dimensionality, inter-dimensional correlation, and number…
Descriptors: Certified Public Accountants, Evaluation Methods, Licensing Examinations (Professions), Simulation
Sireci, Stephen G.; Wiley, Andrew; Keller, Lisa A. – 1998
Seven specific guidelines included in the taxonomy proposed by T. Haladyna and S. Downing (1998) for writing multiple-choice test items were evaluated. These specific guidelines are: (1) avoid the complex multiple-choice, K-type format; (2) state the stem in question format; (3) word the stem positively; (4) avoid the phrase "all of the…
Descriptors: Certified Public Accountants, Licensing Examinations (Professions), Multiple Choice Tests, Test Construction