Showing 1 to 15 of 28 results
Peer reviewed
Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022
Setting cut scores on multistage tests (MSTs) is difficult, particularly when the test spans several grade levels and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…
Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis
Noble, Tracy; Sireci, Stephen G.; Wells, Craig S.; Kachchaf, Rachel R.; Rosebery, Ann S.; Wang, Yang Caroline – American Educational Research Journal, 2020
In this experimental study, 20 multiple-choice test items from the Massachusetts Grade 5 science test were linguistically simplified, and original and simplified test items were administered to 310 English learners (ELs) and 1,580 non-ELs in four Massachusetts school districts. This study tested the hypothesis that specific linguistic features of…
Descriptors: Science Tests, Language Usage, English Language Learners, School Districts
Peer reviewed
Benítez, Isabel; Padilla, José-Luis; Hidalgo Montesinos, María Dolores; Sireci, Stephen G. – Applied Measurement in Education, 2016
Analysis of differential item functioning (DIF) is often used to determine if cross-lingual assessments are equivalent across languages. However, evidence on the causes of cross-lingual DIF remains elusive. Expert appraisal is a qualitative method useful for obtaining detailed information about problematic elements in the different linguistic…
Descriptors: Test Bias, Mixed Methods Research, Questionnaires, International Assessment
Peer reviewed
Gökçe, Semirhan; Berberoglu, Giray; Wells, Craig S.; Sireci, Stephen G. – Journal of Psychoeducational Assessment, 2021
The 2015 Trends in International Mathematics and Science Study (TIMSS) involved 57 countries and 43 different languages to assess students' achievement in mathematics and science. The purpose of this study is to evaluate whether items and test scores are affected as the differences between language families and cultures increase. Using…
Descriptors: Language Classification, Elementary Secondary Education, Mathematics Achievement, Mathematics Tests
Peer reviewed
Han, Kyung T.; Wells, Craig S.; Sireci, Stephen G. – Applied Measurement in Education, 2012
Item parameter drift (IPD) occurs when item parameter values change from their original value over time. IPD may pose a serious threat to the fairness and validity of test score interpretations, especially when the goal of the assessment is to measure growth or improvement. In this study, we examined the effect of multidirectional IPD (i.e., some…
Descriptors: Item Response Theory, Test Items, Scaling, Methods
Peer reviewed
Li, Xueming; Sireci, Stephen G. – Educational and Psychological Measurement, 2013
Validity evidence based on test content is of essential importance in educational testing. One source for such evidence is an alignment study, which helps evaluate the congruence between tested objectives and those specified in the curriculum. However, the results of an alignment study do not always sufficiently capture the degree to which a test…
Descriptors: Content Validity, Multidimensional Scaling, Data Analysis, Educational Testing
Peer reviewed
Crotts, Katrina; Sireci, Stephen G.; Zenisky, April – Journal of Applied Testing Technology, 2012
Validity evidence based on test content is important for educational tests to demonstrate the degree to which they fulfill their purposes. Most content validity studies involve subject matter experts (SMEs) who rate items that comprise a test form. In computerized-adaptive testing, examinees take different sets of items and test "forms"…
Descriptors: Computer Assisted Testing, Adaptive Testing, Content Validity, Test Content
Peer reviewed
Wells, Craig S.; Baldwin, Su; Hambleton, Ronald K.; Sireci, Stephen G.; Karatonis, Ana; Jirka, Stephen – Applied Measurement in Education, 2009
Score equity assessment is an important analysis for ensuring that inferences drawn from test scores are comparable across subgroups of examinees. The purpose of the present evaluation was to assess the extent to which the Grade 8 NAEP Math and Reading assessments for 2005 were equivalent across selected states. More specifically, the present study…
Descriptors: National Competency Tests, Test Bias, Equated Scores, Grade 8
Peer reviewed
Lu, Ying; Sireci, Stephen G. – Educational Measurement: Issues and Practice, 2007
Speededness refers to the situation where the time limits on a standardized test do not allow substantial numbers of examinees to fully consider all test items. When tests are not intended to measure speed of responding, speededness introduces a severe threat to the validity of interpretations based on test scores. In this article, we describe…
Descriptors: Test Items, Timed Tests, Standardized Tests, Test Validity
Peer reviewed
Sireci, Stephen G. – Educational Researcher, 2007
Lissitz and Samuelsen (2007) propose a new framework for conceptualizing test validity that separates analysis of test properties from analysis of the construct measured. In response, the author of this article reviews fundamental characteristics of test validity, drawing largely from seminal writings as well as from the accepted standards. He…
Descriptors: Test Content, Test Validity, Guidelines, Test Items
Peer reviewed
Meara, Kevin; Robin, Frederic; Sireci, Stephen G. – Multivariate Behavioral Research, 2000
Investigated the usefulness of multidimensional scaling (MDS) for assessing the dimensionality of dichotomous test data. Focused on two MDS proximity measures: one based on the PC statistic (T. Chen and M. Davidson, 1996) and the other on interitem Euclidean distances. Simulation results show that both MDS procedures correctly identify…
Descriptors: Correlation, Multidimensional Scaling, Simulation, Test Items
Peer reviewed
Keller, Lisa A.; Swaminathan, Hariharan; Sireci, Stephen G. – Applied Measurement in Education, 2003
Evaluated two strategies for scoring context-dependent test items: ignoring the dependence and scoring the items dichotomously, or modeling the dependence through polytomous scoring. Results for data from 38,965 examinees taking a professional examination show that dichotomous scoring may overestimate test information, but polytomous scoring may underestimate…
Descriptors: Adults, Licensing Examinations (Professions), Scoring, Test Items
Peer reviewed
Karantonis, Ana; Sireci, Stephen G. – Educational Measurement: Issues and Practice, 2006
The Bookmark method for setting standards on educational tests is currently one of the most popular standard-setting methods. However, research to support the method is scarce. In this report, we review the published and unpublished literature on this method as well as some seminal work in the area of evaluating standard-setting studies. Our…
Descriptors: Academic Standards, Educational Testing, Literature Reviews, Validity
Egan, Karla L.; Sireci, Stephen G.; Swaminathan, Hariharan; Sweeney, Kevin P. – 1998
The primary purpose of this study was to assess the effect of item bundling on multidimensional data. A second purpose was to compare three methods for assessing dimensionality. Eight multidimensional data sets consisting of 100 items and 1,000 examinees were simulated, varying in dimensionality, inter-dimensional correlation, and number…
Descriptors: Certified Public Accountants, Evaluation Methods, Licensing Examinations (Professions), Simulation
Sireci, Stephen G.; Wiley, Andrew; Keller, Lisa A. – 1998
Seven specific guidelines included in the taxonomy proposed by T. Haladyna and S. Downing (1998) for writing multiple-choice test items were evaluated. These specific guidelines are: (1) avoid the complex multiple-choice, K-type format; (2) state the stem in question format; (3) word the stem positively; (4) avoid the phrase "all of the…
Descriptors: Certified Public Accountants, Licensing Examinations (Professions), Multiple Choice Tests, Test Construction