Publication Date
In 2025 | 4 |
Since 2024 | 9 |
Since 2021 (last 5 years) | 58 |
Since 2016 (last 10 years) | 147 |
Since 2006 (last 20 years) | 496 |
Descriptor
Equated Scores | 1113 |
Test Items | 298 |
Item Response Theory | 297 |
Comparative Analysis | 247 |
Statistical Analysis | 233 |
Test Construction | 165 |
Error of Measurement | 143 |
Test Format | 135 |
Scaling | 129 |
College Entrance Examinations | 124 |
Difficulty Level | 119 |
Author
Bianchini, John C. | 35 |
von Davier, Alina A. | 34 |
Dorans, Neil J. | 33 |
Kolen, Michael J. | 31 |
Loret, Peter G. | 31 |
Kim, Sooyeon | 26 |
Moses, Tim | 24 |
Livingston, Samuel A. | 22 |
Holland, Paul W. | 20 |
Puhan, Gautam | 20 |
Liu, Jinghua | 19 |
Location
Canada | 9 |
Australia | 8 |
Florida | 8 |
United Kingdom (England) | 8 |
Netherlands | 7 |
New York | 7 |
United States | 7 |
Israel | 6 |
Turkey | 6 |
United Kingdom | 6 |
California | 5 |
Laws, Policies, & Programs
Elementary and Secondary… | 12 |
No Child Left Behind Act 2001 | 5 |
Education Consolidation… | 3 |
Hawkins Stafford Act 1988 | 1 |
Race to the Top | 1 |
What Works Clearinghouse Rating
Meets WWC Standards without Reservations | 1 |
Meets WWC Standards with or without Reservations | 1 |
Wells, Craig S.; Hambleton, Ronald K.; Kirkpatrick, Robert; Meng, Yu – Applied Measurement in Education, 2014
The purpose of the present study was to develop and evaluate two procedures for flagging consequential item parameter drift (IPD) in an operational testing program. The first procedure was based on flagging items that exhibit a meaningful magnitude of IPD using a critical value that was defined to represent barely tolerable IPD. The second procedure…
Descriptors: Test Items, Test Bias, Equated Scores, Item Response Theory
Wolkowitz, Amanda; Davis-Becker, Susan – Practical Assessment, Research & Evaluation, 2015
This study evaluates the impact of common item characteristics on the outcome of equating in credentialing examinations when traditionally recommended representation is not possible. This research used real data sets from several credentialing exams to test the impact of content representation, item statistics, and number of common items on…
Descriptors: Test Items, Equated Scores, Licensing Examinations (Professions), Test Content
Koretz, Daniel – Measurement: Interdisciplinary Research and Perspectives, 2015
Accountability has become a primary function of large-scale testing in the United States. The pressure on educators to raise scores is vastly greater than it was several decades ago. Research has shown that high-stakes testing can generate behavioral responses that inflate scores, often severely. I argue that because of these responses, using…
Descriptors: Accountability, Educational Testing, Test Construction, Test Validity
Longford, Nicholas T. – Journal of Educational and Behavioral Statistics, 2015
An equating procedure for a testing program with an evolving distribution of examinee profiles is developed. No anchor is available because the original scoring scheme was based on expert judgment of the item difficulties. Pairs of examinees from two administrations are formed by matching on coarsened propensity scores derived from a set of…
Descriptors: Equated Scores, Testing Programs, College Entrance Examinations, Scoring
Huggins-Manley, Anne Corinne – Educational and Psychological Measurement, 2017
This study defines subpopulation item parameter drift (SIPD) as a change in item parameters over time that is dependent on subpopulations of examinees, and hypothesizes that the presence of SIPD in anchor items is associated with bias and/or lack of invariance in three psychometric outcomes. Results show that SIPD in anchor items is associated…
Descriptors: Psychometrics, Test Items, Item Response Theory, Hypothesis Testing
Liu, Jinghua; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2014
Maintaining score interchangeability and scale consistency is crucial for any testing program that administers multiple forms across years. The use of a multiple linking design, which involves equating a new form to multiple old forms and averaging the conversions, has been proposed to control scale drift. However, the use of multiple linking…
Descriptors: Comparative Analysis, Reliability, Test Construction, Equated Scores
Wiberg, Marie; van der Linden, Wim J.; von Davier, Alina A. – Journal of Educational Measurement, 2014
Three local observed-score kernel equating methods that integrate methods from the local equating and kernel equating frameworks are proposed. The new methods were compared with their earlier counterparts with respect to such measures as bias, as defined by Lord's criterion of equity, and percent relative error. The local kernel item response…
Descriptors: Measurement Techniques, Evaluation Methods, Item Response Theory, Equated Scores
Powers, Sonya; Kolen, Michael J. – Journal of Educational Measurement, 2014
Accurate equating results are essential when comparing examinee scores across exam forms. Previous research indicates that equating results may not be accurate when group differences are large. This study compared the equating results of frequency estimation, chained equipercentile, item response theory (IRT) true-score, and IRT observed-score…
Descriptors: Accuracy, Equated Scores, Differences, Groups
Keast, Dan; Tapper, Larke – Journal of Educators Online, 2016
The researchers of this study investigated the participants' (N = 177) use of a self-evaluation tool employed at the end of an online undergraduate music course that fulfilled the Texas general education requirement for the creative arts. Participants' use of the two aspects of the tool correlated at r = 0.5548, interpreted as a high positive…
Descriptors: Music Education, Self Evaluation (Individuals), Majors (Students), Online Courses
Lee, Guemin; Lee, Won-Chan – Applied Measurement in Education, 2016
The main purposes of this study were to develop bi-factor multidimensional item response theory (BF-MIRT) observed-score equating procedures for mixed-format tests and to investigate relative appropriateness of the proposed procedures. Using data from a large-scale testing program, three types of pseudo data sets were formulated: matched samples,…
Descriptors: Test Format, Multidimensional Scaling, Item Response Theory, Equated Scores
Preston, Kathleen Suzanne Johnson; Parral, Skye N.; Gottfried, Allen W.; Oliver, Pamella H.; Gottfried, Adele Eskeles; Ibrahim, Sirena M.; Delany, Danielle – Educational and Psychological Measurement, 2015
A psychometric analysis was conducted using the nominal response model under the item response theory framework to construct the Positive Family Relationships scale. Using data from the Fullerton Longitudinal Study, this scale was constructed within a long-term longitudinal framework spanning middle childhood through adolescence. Items tapping…
Descriptors: Family Relationship, Measures (Individuals), Psychometrics, Models
Han, Kyung T.; Wells, Craig S.; Hambleton, Ronald K. – Practical Assessment, Research & Evaluation, 2015
In item response theory test scaling/equating with the three-parameter model, the scaling coefficients A and B have no impact on the c-parameter estimates of the test items since the c-parameter estimates are not adjusted in the scaling/equating procedure. The main research question in this study concerned how serious the consequences would be if…
Descriptors: Item Response Theory, Monte Carlo Methods, Scaling, Test Items
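As a rough illustration of the point in the abstract above, the sketch below applies a linear scale transformation to three-parameter logistic (3PL) item parameters. It assumes the common convention theta* = A * theta + B, which the abstract does not spell out; the function names `rescale_3pl` and `p_3pl` and the scaling constant D = 1.7 are illustrative choices, not taken from the study. Under that assumption the a- and b-parameters are rescaled while the c-parameter passes through unchanged.

```python
import math


def rescale_3pl(a, b, c, A, B):
    """Move 3PL item parameters to a new scale where theta* = A * theta + B.

    Assumed convention: a* = a / A, b* = A * b + B, and c* = c
    (the lower asymptote is not adjusted by the scaling coefficients).
    """
    return a / A, A * b + B, c


def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical item parameters and scaling coefficients, for illustration only.
a, b, c = 1.2, 0.5, 0.2
A, B = 1.1, -0.3
a_new, b_new, c_new = rescale_3pl(a, b, c, A, B)

# The response probability is the same before and after the transformation
# when theta is moved to the new scale along with the item parameters.
theta = 0.8
p_old = p_3pl(theta, a, b, c)
p_new = p_3pl(A * theta + B, a_new, b_new, c_new)
```

Because a* (theta* - b*) equals a (theta - b) under this transformation, the two probabilities agree up to floating-point error, and any error in the c-parameter estimates is carried over untouched.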
Barr, Christopher D.; Reutebuch, Colleen K.; Carlson, Coleen D.; Vaughn, Sharon; Francis, David J. – Journal of Research on Educational Effectiveness, 2019
Beginning in 2002, researchers developed, implemented, and evaluated the efficacy of an English reading intervention for first-grade English learners using multiple randomized control trials (RCTs). As a result of this efficacy work, researchers successfully competed for an IES Goal 4 effectiveness study using the same intervention. Unlike the…
Descriptors: Intervention, English Language Learners, Grade 1, Elementary School Students
Winters, Marcus A. – Manhattan Institute for Policy Research, 2017
Critics of charter schools in New York City, America's largest school district, often allege that charters score better on standardized tests, on average, than traditional public schools because charters "cream-skim" (i.e., attract) the brightest, most motivated students. Yet this accusation neglects the fact that not all traditional…
Descriptors: Charter Schools, Public Schools, School Effectiveness, Success
Walstad, William B.; Miller, Laurie A. – Journal of Economic Education, 2016
Survey results from a national sample of economics instructors describe the grading policies and practices in principles of economics courses. The survey results provide insights about absolute and relative grading systems used by instructors, the course components and their weights that determine grades, and the type of assessment items used for…
Descriptors: Grades (Scholastic), Grading, Economics Education, Educational Policy