Showing all 14 results
Peer reviewed
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
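The item-by-item comparisons described above are usually made with an agreement statistic such as quadratic weighted kappa (QWK), the de facto standard in automated scoring evaluations. The abstract does not specify the paper's ranking procedure, so the sketch below illustrates only the underlying agreement metric; the function name and 0-based score scale are illustrative.

```python
import numpy as np

def quadratic_weighted_kappa(human, machine, n_categories):
    """Quadratic weighted kappa between two raters' integer scores (0-based)."""
    O = np.zeros((n_categories, n_categories))
    for h, m in zip(human, machine):
        O[h, m] += 1
    O /= O.sum()
    # expected joint distribution if the two raters were independent
    E = np.outer(O.sum(axis=1), O.sum(axis=0))
    # quadratic disagreement weights
    i, j = np.indices((n_categories, n_categories))
    W = (i - j) ** 2 / (n_categories - 1) ** 2
    return 1.0 - (W * O).sum() / (W * E).sum()

# e.g., quadratic_weighted_kappa([0, 1, 2, 2], [0, 2, 2, 1], n_categories=3)
```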
Peer reviewed
Finch, W. Holmes – Applied Measurement in Education, 2016
Differential item functioning (DIF) assessment is a crucial component in test construction, serving as the primary way in which instrument developers ensure that measures perform in the same way for multiple groups within the population. When such is not the case, scores may not accurately reflect the trait of interest for all individuals in the…
Descriptors: Test Bias, Monte Carlo Methods, Comparative Analysis, Population Groups
Peer reviewed
Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua – Applied Measurement in Education, 2017
Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. Three scaling procedures are considered: (a) concurrent…
Descriptors: Item Response Theory, Accuracy, Educational Assessment, Test Items
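The abstract is cut off before naming all three procedures, but placing separately calibrated field-test items on an existing metric is commonly done with a linear transformation of the item parameters. A minimal sketch of mean/sigma linking, assuming Rasch-type difficulties estimated for a set of anchor items in both calibrations (all names illustrative):

```python
import numpy as np

def mean_sigma_link(b_anchor_new, b_anchor_ref, b_new_pool):
    """Place newly calibrated item difficulties on the reference metric,
    using anchor items whose difficulties were estimated in both runs."""
    b_new = np.asarray(b_anchor_new, dtype=float)
    b_ref = np.asarray(b_anchor_ref, dtype=float)
    A = b_ref.std(ddof=1) / b_new.std(ddof=1)   # slope of the linear link
    B = b_ref.mean() - A * b_new.mean()         # intercept
    return A * np.asarray(b_new_pool, dtype=float) + B
```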
Peer reviewed
Lee, Wooyeol; Cho, Sun-Joo – Applied Measurement in Education, 2017
Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and the percentage of time the 95% confidence interval covered…
Descriptors: Item Response Theory, Test Items, Bias, Computation
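Bias, RMSE, and confidence-interval coverage can be computed directly from the replications. A minimal sketch of these standard recovery summaries (the Monte Carlo design itself is the paper's; this helper is illustrative):

```python
import numpy as np

def recovery_summary(estimates, std_errors, true_value, z=1.96):
    """Bias, RMSE, and 95% CI coverage for one parameter across replications."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    bias = (est - true_value).mean()
    rmse = np.sqrt(((est - true_value) ** 2).mean())
    coverage = (np.abs(est - true_value) <= z * se).mean()
    return bias, rmse, coverage
```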
Peer reviewed
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
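Comparative judgment studies typically scale the pairwise decisions with a Bradley-Terry-type model; whether this paper uses exactly that model is not stated in the excerpt. The sketch below shows one standard way to turn judge decisions into essay scores, via the MM algorithm, assuming every essay wins and loses at least once:

```python
import numpy as np

def bradley_terry_scores(wins, n_iter=500):
    """MM algorithm for the Bradley-Terry model.
    wins[i, j] = number of judgments in which essay i beat essay j."""
    n = wins.shape[0]
    s = np.ones(n)                      # essay 'strength' parameters
    pair = wins + wins.T                # comparisons made between each pair
    total_wins = wins.sum(axis=1)
    for _ in range(n_iter):
        denom = (pair / (s[:, None] + s[None, :])).sum(axis=1)
        s = total_wins / denom
        s /= s.sum()                    # fix the scale
    return np.log(s)                    # logit-scale essay scores
```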
Peer reviewed
Dogan, Enis; Ogut, Burhan; Kim, Young Yee – Applied Measurement in Education, 2015
The relationship between reading skills in earlier grades and achieving "Proficiency" on the National Assessment of Educational Progress (NAEP) grade 8 reading assessment was examined by establishing a statistical link between NAEP and the Early Childhood Longitudinal Study (ECLS) grade 8 reading assessments using data from a common…
Descriptors: Reading Skills, National Competency Tests, Reading Tests, Grade 8
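The abstract is truncated before describing the linking method. One common way to establish such a statistical link when both assessments are taken by a common sample is equipercentile linking, sketched below purely as an illustration (not necessarily the authors' method):

```python
import numpy as np

def equipercentile_link(scores_x, scores_y, x_points):
    """Map each score in x_points to the Y score with the same percentile rank."""
    scores_x = np.asarray(scores_x, dtype=float)
    ranks = np.array([np.mean(scores_x <= v) for v in x_points])
    return np.quantile(scores_y, ranks)
```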
Peer reviewed
Davis, Laurie Laughlin; Kong, Xiaojing; McBride, Yuanyuan; Morrison, Kristin M. – Applied Measurement in Education, 2017
The definition of what it means to take a test online continues to evolve with the inclusion of a broader range of item types and a wide array of devices used by students to access test content. To assure the validity and reliability of test scores for all students, device comparability research should be conducted to evaluate the impact of…
Descriptors: Educational Technology, Technology Uses in Education, High School Students, Tests
Peer reviewed
Oliveri, Maria Elena; Lawless, Rene; Robin, Frederic; Bridgeman, Brent – Applied Measurement in Education, 2018
We analyzed a pool of items from an admissions test for differential item functioning (DIF) for groups based on age, socioeconomic status, citizenship, or English language status using Mantel-Haenszel and item response theory. DIF items were systematically examined to identify their possible sources by item type, content, and wording. DIF was…
Descriptors: Test Bias, Comparative Analysis, Item Banks, Item Response Theory
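For each item, the Mantel-Haenszel procedure named above reduces to a common odds ratio across ability strata, usually reported on the ETS delta scale. A minimal sketch (array names are illustrative):

```python
import numpy as np

def mh_ddif(ref_right, ref_wrong, foc_right, foc_wrong):
    """Mantel-Haenszel D-DIF for one item. Each array holds, per total-score
    stratum k, correct/incorrect counts in the reference and focal groups."""
    A = np.asarray(ref_right, dtype=float)   # reference correct
    B = np.asarray(ref_wrong, dtype=float)   # reference incorrect
    C = np.asarray(foc_right, dtype=float)   # focal correct
    D = np.asarray(foc_wrong, dtype=float)   # focal incorrect
    N = A + B + C + D
    alpha_mh = (A * D / N).sum() / (B * C / N).sum()  # common odds ratio
    # ETS delta metric; |D-DIF| of roughly 1.5+ is conventionally large DIF
    return -2.35 * np.log(alpha_mh)
```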
Peer reviewed
Kong, Xiaojing; Davis, Laurie Laughlin; McBride, Yuanyuan; Morrison, Kristin – Applied Measurement in Education, 2018
Item response time data were used in investigating the differences in student test-taking behavior between two device conditions: computer and tablet. Analyses were conducted to address the questions of whether or not the device condition had a differential impact on rapid guessing and solution behaviors (with response time effort used as an…
Descriptors: Educational Technology, Technology Uses in Education, Computers, Handheld Devices
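Response time effort (RTE), the index the abstract cites, is the proportion of a student's responses classified as solution behavior rather than rapid guessing, following Wise and Kong. A minimal sketch, assuming per-item rapid-guessing time thresholds have already been set:

```python
import numpy as np

def response_time_effort(response_times, thresholds):
    """RTE index for one examinee: the proportion of items answered with
    solution behavior, i.e., response time above the item's
    rapid-guessing threshold."""
    rt = np.asarray(response_times, dtype=float)
    th = np.asarray(thresholds, dtype=float)
    return (rt > th).mean()
```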
Peer reviewed
Buzick, Heather; Oliveri, Maria Elena; Attali, Yigal; Flor, Michael – Applied Measurement in Education, 2016
Automated essay scoring is a developing technology that can provide efficient scoring of large numbers of written responses. Its use in higher education admissions testing provides an opportunity to collect validity and fairness evidence to support current uses and inform its emergence in other areas such as K-12 large-scale assessment. In this…
Descriptors: Essays, Learning Disabilities, Attention Deficit Hyperactivity Disorder, Scoring
Peer reviewed
Briggs, Derek C. – Applied Measurement in Education, 2008
This article illustrates the use of an explanatory item response modeling (EIRM) approach in the context of measuring group differences in science achievement. The distinction between item response models and EIRMs, recently elaborated by De Boeck and Wilson (2004), is presented within the statistical framework of generalized linear mixed models.…
Descriptors: Science Achievement, Science Tests, Measurement, Error of Measurement
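In the GLMM formulation referenced here, the Rasch model becomes a logistic mixed model in which group differences enter as person-level fixed effects. A schematic latent regression Rasch model (notation mine, not necessarily the article's):

```latex
% Latent regression Rasch model written as a GLMM:
%   y_{pi} = response of person p to item i, G_p = group indicator,
%   b_i    = item difficulty (fixed effect), eps_p = person random effect.
\operatorname{logit} P(y_{pi} = 1 \mid \theta_p) = \theta_p - b_i,
\qquad \theta_p = \gamma G_p + \varepsilon_p,
\qquad \varepsilon_p \sim N(0, \sigma^2)
```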
Peer reviewed
Wollack, James A. – Applied Measurement in Education, 2006
Many of the currently available statistical indexes to detect answer copying lack sufficient power at small α levels or when the amount of copying is relatively small. Furthermore, no single index is uniformly best: depending on the type or amount of copying, certain indexes are better than others. The purpose of this article was…
Descriptors: Statistical Analysis, Item Analysis, Test Length, Sample Size
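Most copying indexes compared in this literature share the same skeleton: count the matching responses between a suspected copier and a source, then standardize against the matches expected under independent responding from an IRT model. A generic sketch of that skeleton (Wollack's ω derives the probabilities from the nominal response model; this version takes them as given):

```python
import numpy as np

def match_z(observed_matches, match_probs):
    """Generic answer-copying statistic: standardize the observed number of
    identical responses between a suspected copier and a source against the
    matches expected under independent responding.
    match_probs[i] = model-based probability that the copier independently
    gives the source's answer on item i."""
    p = np.asarray(match_probs, dtype=float)
    expected = p.sum()
    sd = np.sqrt((p * (1 - p)).sum())
    return (observed_matches - expected) / sd
```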
Peer reviewed
Porter, Andrew C.; Smithson, John; Blank, Rolf; Zeidner, Timothy – Applied Measurement in Education, 2007
With the exception of the procedures developed by Porter and colleagues (Porter, 2002), methods of defining and measuring alignment are essentially limited to alignment between tests and standards. Porter's procedures have been generalized to investigating the alignment between content standards, tests, textbooks, and even classroom…
Descriptors: Teaching Methods, Computer Uses in Education, Instructional Innovation, Guidance Programs
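Porter's (2002) alignment index itself is a simple overlap statistic on two content-emphasis distributions defined over the same topic-by-cognitive-demand cells. A minimal sketch:

```python
import numpy as np

def porter_alignment(x, y):
    """Porter's (2002) alignment index between two content distributions.
    x and y are proportions over the same topic-by-cognitive-demand cells;
    each must sum to 1. Returns 1 for identical emphasis, 0 for disjoint."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 - np.abs(x - y).sum() / 2.0
```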
Peer reviewed
Plake, Barbara S. – Applied Measurement in Education, 1995
This article provides a framework for the rest of the articles in this special issue comparing the utility of three standard-setting methods with complex performance assessments. The context of the standard setting study is described, and the methods are outlined. (SLD)
Descriptors: Comparative Analysis, Criteria, Decision Making, Educational Assessment