Peer reviewed
DeCarlo, Lawrence T.; Zhou, Xiaoliang – Journal of Educational Measurement, 2021
In signal detection rater models for constructed response (CR) scoring, it is assumed that raters discriminate equally well between different latent classes defined by the scoring rubric. An extended model that relaxes this assumption is introduced; the model recognizes that a rater may not discriminate equally well between some of the scoring…
Descriptors: Scoring, Models, Bias, Perception
Peer reviewed
Jin, Kuan-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2018
The Rasch facets model was developed to account for facet data, such as student essays graded by raters, but it accounts for only one kind of rater effect (severity). In practice, raters may exhibit various tendencies such as using middle or extreme scores in their ratings, which is referred to as the rater centrality/extremity response style. To…
Descriptors: Scoring, Models, Interrater Reliability, Computation
Peer reviewed
Nieto, Ricardo; Casabianca, Jodi M. – Journal of Educational Measurement, 2019
Many large-scale assessments are designed to yield two or more scores for an individual by administering multiple sections measuring different but related skills. Multidimensional tests, or more specifically, simple structured tests, such as these rely on multiple multiple-choice and/or constructed-response sections of items to generate multiple…
Descriptors: Tests, Scoring, Responses, Test Items
Peer reviewed
Leckie, George; Baird, Jo-Anne – Journal of Educational Measurement, 2011
This study examined rater effects on essay scoring in an operational monitoring system from England's 2008 national curriculum English writing test for 14-year-olds. We fitted two multilevel models and analyzed: (1) drift in rater severity effects over time; (2) rater central tendency effects; and (3) differences in rater severity and central…
Descriptors: Scoring, Foreign Countries, National Curriculum, Writing Tests
Peer reviewed
Suh, Youngsuk; Cho, Sun-Joo; Wollack, James A. – Journal of Educational Measurement, 2012
In the presence of test speededness, the parameter estimates of item response theory models can be poorly estimated due to conditional dependencies among items, particularly for end-of-test items (i.e., speeded items). This article conducted a systematic comparison of five item calibration procedures--a two-parameter logistic (2PL) model, a…
Descriptors: Response Style (Tests), Timed Tests, Test Items, Item Response Theory
Peer reviewed
Ackerman, Terry A. – Journal of Educational Measurement, 1992
The difference between item bias and item impact and the way they relate to item validity are discussed from a multidimensional item response theory perspective. The Mantel-Haenszel procedure and the Simultaneous Item Bias strategy are used in a Monte Carlo study to illustrate detection of item bias. (SLD)
Descriptors: Causal Models, Computer Simulation, Construct Validity, Equations (Mathematics)
Peer reviewed
Williamson, David M.; Bejar, Isaac I.; Hone, Anne S. – Journal of Educational Measurement, 1999
Contrasts "mental models" used by automated scoring for the simulation division of the computerized Architect Registration Examination with those used by experienced human graders for 3,613 candidate solutions. Discusses differences in the models used and the potential of automated scoring to enhance the validity evidence of scores. (SLD)
Descriptors: Architects, Comparative Analysis, Computer Assisted Testing, Judges
Peer reviewed
Hughes, David C.; Keeling, Brian – Journal of Educational Measurement, 1984
Several studies have shown that essays receive higher marks when preceded by poor quality scripts than when preceded by good quality scripts. This study investigated the effectiveness of providing scorers with model essays to reduce the influence of context. Context effects persisted despite the scoring procedures used. (Author/EGS)
Descriptors: Context Effect, Essay Tests, Essays, High Schools
Peer reviewed
Thissen, David; And Others – Journal of Educational Measurement, 1989
An approach to scoring reading comprehension based on the concept of the testlet is described, using models developed for items in multiple categories. The model is illustrated using data from 3,866 examinees. Application of testlet scoring to multiple category models developed for individual items is discussed. (SLD)
Descriptors: Adaptive Testing, Computer Assisted Testing, Item Response Theory, Mathematical Models
Peer reviewed
Gruijter, Dato N. M. – Journal of Educational Measurement, 1985
To improve on cutoff scores based on absolute standards which may produce an unacceptable number of failures, a compromise is suggested. The compromise draws on the information in the observed score distribution to adjust the standard. Three compromise models developed by Hofstee, Beuk, and De Gruijter are compared. (Author/GDC)
Descriptors: Academic Standards, Comparative Testing, Cutting Scores, Mastery Tests
Peer reviewed
Baskin, David – Journal of Educational Measurement, 1975
Traditional test scoring does not allow the examination of differences among subjects obtaining identical raw scores on the same test. A configuration scoring paradigm for identical raw scores, which provides for such comparisons, is developed and illustrated. (Author)
Descriptors: Elementary Secondary Education, Individual Differences, Mathematical Models, Multiple Choice Tests
Peer reviewed
Smith, Richard M. – Journal of Educational Measurement, 1987
Partial knowledge was assessed in a multiple choice vocabulary test. Test reliability and concurrent validity were compared using Rasch-based dichotomous and polychotomous scoring models. Results supported the polychotomous scoring model, and moderately supported J. O'Connor's theory of vocabulary acquisition. (Author/GDC)
Descriptors: Adults, Higher Education, Knowledge Level, Latent Trait Theory
Peer reviewed
Roberts, Dennis M. – Journal of Educational Measurement, 1987
This study examines a score-difference model for the detection of cheating based on the difference between two scores for an examinee: one based on the appropriate scoring key and another based on an alternative, inappropriate key. It argues that the score-difference method could falsely accuse students as cheaters. (Author/JAZ)
Descriptors: Answer Keys, Cheating, Mathematical Models, Multiple Choice Tests
Peer reviewed
Masters, Geoffrey N. – Journal of Educational Measurement, 1984
This paper develops and illustrates a latent trait approach to constructing an item bank when responses are scored in several ordered categories. This approach is an extension of the methodology developed by Choppin, Wright and Stone, and Wright and Bell for the construction and maintenance of banks of dichotomously scored items. (Author/PN)
Descriptors: Equated Scores, Item Banks, Latent Trait Theory, Mathematical Models
Peer reviewed
Thissen, David M. – Journal of Educational Measurement, 1976
Where estimation of abilities in the lower half of the ability distribution for the Raven Progressive Matrices is important, or an increase in accuracy of ability estimation is needed, the multiple category latent trait estimation provides a rational procedure for realizing gains in accuracy from the use of information in wrong responses.…
Descriptors: Intelligence Tests, Item Analysis, Junior High Schools, Mathematical Models