Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 0 |
Since 2006 (last 20 years) | 14 |
Descriptor
Scoring | 50 |
Item Response Theory | 23 |
Test Items | 13 |
Scores | 9 |
Higher Education | 8 |
Estimation (Mathematics) | 7 |
Simulation | 7 |
Computer Assisted Testing | 6 |
Equations (Mathematics) | 6 |
Evaluation Methods | 6 |
Goodness of Fit | 6 |
Source
Applied Psychological Measurement | 50 |
Author
de la Torre, Jimmy | 4 |
Bennett, Randy Elliot | 2 |
Drasgow, Fritz | 2 |
Hambleton, Ronald K. | 2 |
Kolen, Michael J. | 2 |
Lee, Won-Chan | 2 |
Meijer, Rob R. | 2 |
Reise, Steven P. | 2 |
Andrich, David | 1 |
Baker, Frank B. | 1 |
Baldwin, Peter | 1 |
Publication Type
Journal Articles | 46 |
Reports - Evaluative | 21 |
Reports - Research | 15 |
Reports - Descriptive | 6 |
Book/Product Reviews | 3 |
Speeches/Meeting Papers | 2 |
Information Analyses | 1 |
Education Level
Grade 8 | 1 |
Audience
Researchers | 2 |
Location
China | 1 |
Hong Kong | 1 |
Macau | 1 |
Netherlands | 1 |
Taiwan (Taipei) | 1 |
Assessments and Surveys
ACT Assessment | 1 |
Advanced Placement… | 1 |
Center for Epidemiologic Studies Depression Scale | 1 |
Defining Issues Test | 1 |
National Assessment of… | 1 |
Program for International… | 1 |
Rod and Frame Test | 1 |
United States Medical… | 1 |
Liu, Ying; Verkuilen, Jay – Applied Psychological Measurement, 2013
The Presence-Severity (P-S) format refers to a compound item structure in which a question first checks whether a particular event is present. If the respondent provides an affirmative answer, a follow-up is administered, often about the frequency, density, severity, or impact of the event. Despite the popularity of the P-S…
Descriptors: Item Response Theory, Measures (Individuals), Psychometrics, Cancer
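A compound P-S response can be represented in data as a single ordinal variable. Below is a minimal sketch, assuming a hypothetical 1-to-5 severity follow-up and the illustrative convention (not from the article) that a "no" on the presence question is scored 0:

```python
def code_ps_item(presence, severity):
    """Collapse a Presence-Severity (P-S) response pair into one ordinal score.

    presence : 0/1 answer to the gate question
    severity : 1..K rating from the follow-up (ignored when presence == 0)
    Assumption (illustrative, not from the article): a 'no' is scored 0 and a
    'yes' keeps the follow-up rating, giving a single 0..K ordinal variable.
    """
    return 0 if presence == 0 else int(severity)

# Example: three respondents; the second denies the event entirely
responses = [(1, 3), (0, None), (1, 5)]
coded = [code_ps_item(p, s if s is not None else 0) for p, s in responses]
print(coded)  # [3, 0, 5]
```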
Sun, Jianan; Xin, Tao; Zhang, Shumei; de la Torre, Jimmy – Applied Psychological Measurement, 2013
This article proposes a generalized distance discriminating method for tests with polytomous responses (GDD-P). The new method is the polytomous extension of an item response theory (IRT)-based cognitive diagnostic method, which can identify examinees' ideal response patterns (IRPs) based on a generalized distance index. The similarities between…
Descriptors: Item Response Theory, Cognitive Tests, Diagnostic Tests, Matrices
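The core idea, assigning an examinee to the nearest ideal response pattern by a distance index, can be sketched briefly. The snippet below uses a plain Euclidean distance as a stand-in for the article's generalized index; the attribute profiles and score vectors are made up for illustration:

```python
import numpy as np

def classify_by_distance(observed, ideal_patterns):
    """Assign an examinee to the ideal response pattern (IRP) closest to the
    observed polytomous response vector.

    observed       : array of shape (J,) with item scores
    ideal_patterns : dict mapping attribute profile -> expected score vector
    Euclidean distance is used here as a stand-in for the generalized index.
    """
    distances = {profile: np.linalg.norm(observed - np.asarray(irp))
                 for profile, irp in ideal_patterns.items()}
    return min(distances, key=distances.get), distances

# Toy example: two attribute profiles, three items scored 0-2
ideal = {(1, 0): [2, 1, 0], (1, 1): [2, 2, 2]}
profile, d = classify_by_distance(np.array([2, 2, 1]), ideal)
print(profile, d)
```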
van der Ark, L. Andries; van der Palm, Daniel W.; Sijtsma, Klaas – Applied Psychological Measurement, 2011
This study presents a general framework for single-administration reliability methods, such as Cronbach's alpha, Guttman's lambda-2, and method MS. This general framework was used to derive a new approach to estimating test-score reliability by means of the unrestricted latent class model. This new approach is the latent class reliability…
Descriptors: Simulation, Reliability, Measurement, Psychology
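For context, the classical single-administration coefficients named here are straightforward to compute from an item-score matrix. A minimal sketch of Cronbach's alpha and Guttman's lambda-2 (the latent class reliability coefficient itself is not reproduced):

```python
import numpy as np

def alpha_and_lambda2(X):
    """Single-administration reliability estimates from an n x J item-score matrix.

    Returns Cronbach's alpha and Guttman's lambda-2 using the classical formulas.
    """
    X = np.asarray(X, dtype=float)
    n, J = X.shape
    cov = np.cov(X, rowvar=False)            # J x J item covariance matrix
    item_var = np.diag(cov).sum()            # sum of item variances
    total_var = cov.sum()                    # variance of the total score
    alpha = (J / (J - 1)) * (1 - item_var / total_var)
    off_diag_sq = (cov ** 2).sum() - (np.diag(cov) ** 2).sum()
    lambda2 = (total_var - item_var + np.sqrt(J / (J - 1) * off_diag_sq)) / total_var
    return alpha, lambda2

# Toy data: 5 examinees, 4 dichotomous items
X = np.array([[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]])
print(alpha_and_lambda2(X))
```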
Harik, Polina; Baldwin, Peter; Clauser, Brian – Applied Psychological Measurement, 2013
Growing reliance on complex constructed response items has generated considerable interest in automated scoring solutions. Many of these solutions are described in the literature; however, relatively few studies have been published that "compare" automated scoring strategies. Here, comparisons are made among five strategies for…
Descriptors: Computer Assisted Testing, Automation, Scoring, Comparative Analysis
de la Torre, Jimmy; Song, Hao; Hong, Yuan – Applied Psychological Measurement, 2011
Lack of sufficient reliability is the primary impediment for generating and reporting subtest scores. Several current methods of subscore estimation do so either by incorporating the correlational structure among the subtest abilities or by using the examinee's performance on the overall test. This article conducted a systematic comparison of four…
Descriptors: Item Response Theory, Scoring, Methods, Comparative Analysis
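As a rough illustration of borrowing strength from the overall test, the sketch below shrinks an observed subscore toward its mean and blends it with a linear prediction from the total score. It is a generic shrinkage estimator under assumed reliability and weighting values, not any of the four methods the article compares:

```python
import numpy as np

def augmented_subscore(sub, total, rel_sub, w=0.5):
    """Blend an observed subscore with a prediction from the total test score.

    sub, total : arrays of observed subscores and total scores
    rel_sub    : assumed subscore reliability (0..1)
    w          : assumed weight given to the total-score prediction
    Illustrative shrinkage estimator only; not the article's methods.
    """
    sub = np.asarray(sub, float)
    total = np.asarray(total, float)
    slope, intercept = np.polyfit(total, sub, 1)     # predict subscore from total
    from_total = intercept + slope * total
    kelley = sub.mean() + rel_sub * (sub - sub.mean())  # regress toward the mean
    return (1 - w) * kelley + w * from_total

print(augmented_subscore([8, 5, 9], [30, 22, 35], rel_sub=0.6))
```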
Fidalgo, Angel M.; Bartram, Dave – Applied Psychological Measurement, 2010
The main objective of this study was to establish the relative efficacy of the generalized Mantel-Haenszel test (GMH) and the Mantel test for detecting large numbers of differential item functioning (DIF) patterns. To this end, this study considered a topic not dealt with in the literature to date: the possible differential effect of type of scores…
Descriptors: Test Bias, Statistics, Scoring, Comparative Analysis
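The ordinary Mantel-Haenszel machinery underlying both tests can be sketched compactly. The function below computes the MH common odds ratio and continuity-corrected chi-square from per-stratum 2x2 tables; the generalized (GMH) extension for polytomous data is not implemented, and the example tables are made up:

```python
import numpy as np

def mantel_haenszel_dif(tables):
    """Mantel-Haenszel DIF statistics for a dichotomous item.

    tables : list of 2x2 arrays per matched score stratum, laid out as
             [[ref_correct, ref_incorrect], [focal_correct, focal_incorrect]].
    Returns the common odds ratio and the continuity-corrected MH chi-square.
    """
    A = E = V = num = den = 0.0
    for t in tables:
        (a, b), (c, d) = np.asarray(t, float)
        N = a + b + c + d
        if N < 2:
            continue
        A += a
        E += (a + b) * (a + c) / N
        V += (a + b) * (c + d) * (a + c) * (b + d) / (N ** 2 * (N - 1))
        num += a * d / N
        den += b * c / N
    odds_ratio = num / den
    chi_square = (abs(A - E) - 0.5) ** 2 / V
    return odds_ratio, chi_square

strata = [[[30, 10], [25, 15]], [[40, 5], [35, 10]]]
print(mantel_haenszel_dif(strata))
```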
Bryant, Damon; Davis, Larry – Applied Psychological Measurement, 2011
This brief technical note describes how to construct item vector plots for dichotomously scored items fitting the multidimensional three-parameter logistic model (M3PLM). As multidimensional item response theory (MIRT) shows promise of being a very useful framework in the test development life cycle, graphical tools that facilitate understanding…
Descriptors: Visual Aids, Item Response Theory, Evaluation Methods, Test Preparation
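The quantities behind such plots follow directly from the compensatory MIRT parameterization. A minimal two-dimensional sketch, assuming discrimination vectors a and intercepts d (made-up values) and using matplotlib's quiver for the arrows:

```python
import numpy as np
import matplotlib.pyplot as plt

def item_vectors(a, d):
    """Item vector quantities for a compensatory two-dimensional MIRT model.

    a : (J, 2) discrimination parameters; d : (J,) intercepts
    Returns multidimensional discrimination (MDISC), difficulty (MDIFF), and
    the unit direction of steepest slope for each item.
    """
    a = np.asarray(a, float)
    d = np.asarray(d, float)
    mdisc = np.sqrt((a ** 2).sum(axis=1))
    mdiff = -d / mdisc
    direction = a / mdisc[:, None]
    return mdisc, mdiff, direction

a = np.array([[1.2, 0.3], [0.4, 1.1], [0.8, 0.8]])
d = np.array([-0.5, 0.2, 0.0])
mdisc, mdiff, u = item_vectors(a, d)

# Each arrow starts at MDIFF along the item direction; length is MDISC
tails = u * mdiff[:, None]
arrows = u * mdisc[:, None]
plt.quiver(tails[:, 0], tails[:, 1], arrows[:, 0], arrows[:, 1],
           angles='xy', scale_units='xy', scale=1)
plt.xlabel('theta 1'); plt.ylabel('theta 2'); plt.title('Item vectors')
plt.show()
```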
Finkelman, Matthew D.; Smits, Niels; Kim, Wonsuk; Riley, Barth – Applied Psychological Measurement, 2012
The Center for Epidemiologic Studies-Depression (CES-D) scale is a well-known self-report instrument that is used to measure depressive symptomatology. Respondents who take the full-length version of the CES-D are administered a total of 20 items. This article investigates the use of curtailment and stochastic curtailment (SC), two sequential…
Descriptors: Measures (Individuals), Depression (Psychology), Test Length, Computer Assisted Testing
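Deterministic curtailment amounts to stopping once the remaining items can no longer change the screening decision. A minimal sketch, assuming a fixed administration order, 0-3 scored items, and a hypothetical cutoff of 16; stochastic curtailment, which stops on a high probability rather than certainty, is not shown:

```python
def curtailed_test(responses, max_scores, cutoff):
    """Deterministic curtailment for a fixed-order questionnaire.

    Stops as soon as the classification is decided: either the running score
    already reaches the cutoff, or the cutoff cannot be reached even if every
    remaining item were answered at its maximum.
    Returns the decision and the number of items actually administered.
    """
    total, n = 0, len(responses)
    for j, x in enumerate(responses):
        total += x
        remaining_max = sum(max_scores[j + 1:])
        if total >= cutoff:
            return "positive", j + 1          # decision already guaranteed
        if total + remaining_max < cutoff:
            return "negative", j + 1          # cutoff no longer reachable
    return ("positive" if total >= cutoff else "negative"), n

# 20 items scored 0-3, screening cutoff of 16 (illustrative values)
responses = [3, 3, 3, 3, 2, 2] + [0] * 14
print(curtailed_test(responses, [3] * 20, 16))   # stops after 6 items
```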
Yang, Chongming; Nay, Sandra; Hoyle, Rick H. – Applied Psychological Measurement, 2010
Lengthy scales or testlets pose certain challenges for structural equation modeling (SEM) if all the items are included as indicators of a latent construct. Three general approaches to modeling lengthy scales in SEM (parceling, latent scoring, and shortening) have been reviewed and evaluated. A hypothetical population model is simulated containing…
Descriptors: Structural Equation Models, Measures (Individuals), Sample Size, Item Response Theory
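Of the three approaches, parceling is the simplest to illustrate. The sketch below averages consecutive item subsets into parcel scores that could serve as SEM indicators; the order-based assignment rule is an arbitrary choice for illustration, not a recommendation from the article:

```python
import numpy as np

def make_parcels(X, n_parcels=3):
    """Item parceling: average consecutive item subsets into parcel scores.

    X : n x J matrix of item scores for one latent construct
    Returns an n x n_parcels matrix of parcel scores to use as indicators.
    """
    X = np.asarray(X, float)
    groups = np.array_split(np.arange(X.shape[1]), n_parcels)
    return np.column_stack([X[:, g].mean(axis=1) for g in groups])

X = np.random.randint(1, 6, size=(100, 12))   # 12 Likert items, 100 respondents
parcels = make_parcels(X, n_parcels=3)
print(parcels.shape)                          # (100, 3)
```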
Classification Consistency and Accuracy for Complex Assessments under the Compound Multinomial Model
Lee, Won-Chan; Brennan, Robert L.; Wan, Lei – Applied Psychological Measurement, 2009
For a test that consists of dichotomously scored items, several approaches have been reported in the literature for estimating classification consistency and accuracy indices based on a single administration of a test. Classification consistency and accuracy have not been studied much, however, for "complex" assessments--for example,…
Descriptors: Classification, Reliability, Test Items, Scoring
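A much simpler single-administration approach conveys what the two indices measure. The sketch below treats each examinee's score as binomial and asks how often a parallel form would yield the same pass/fail decision; it is a deliberate simplification, not the compound multinomial model developed in the article, and the cutoff is illustrative:

```python
from math import comb

def binomial_tail(n, p, c):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(c, n + 1))

def consistency_and_accuracy(scores, n_items, cutoff):
    """Single-administration classification indices under a binomial error model.

    Each examinee's observed proportion correct stands in for the true
    proportion; consistency is P(pass)^2 + P(fail)^2 across two hypothetical
    replications, and accuracy is the chance the replication agrees with the
    decision implied by the true proportion.
    """
    cons, acc = [], []
    for x in scores:
        p = x / n_items
        p_pass = binomial_tail(n_items, p, cutoff)
        cons.append(p_pass ** 2 + (1 - p_pass) ** 2)
        acc.append(p_pass if x >= cutoff else 1 - p_pass)
    n = len(scores)
    return sum(cons) / n, sum(acc) / n

print(consistency_and_accuracy([12, 18, 25, 9], n_items=30, cutoff=18))
```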
de la Torre, Jimmy – Applied Psychological Measurement, 2009
For one reason or another, various sources of information, namely, ancillary variables and correlational structure of the latent abilities, which are usually available in most testing situations, are ignored in ability estimation. A general model that incorporates these sources of information is proposed in this article. The model has a general…
Descriptors: Scoring, Multivariate Analysis, Ability, Computation
Hurtz, Gregory M.; Jones, J. Patrick; Jones, Christian N. – Applied Psychological Measurement, 2008
This study compares the efficacy of different strategies for translating item-level, proportion-correct standard-setting judgments into a theta-metric test cutoff score for use with item response theory (IRT) scoring, using Monte Carlo methods. Simulated Angoff-type ratings, consisting of 1,000 independent 75 Item x 13 Rater matrices, were…
Descriptors: Monte Carlo Methods, Measures (Individuals), Item Response Theory, Standard Setting
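One common mapping, and not necessarily the one favored by the study, sums the mean Angoff ratings into an expected raw cut score and searches the 3PL test characteristic curve for the matching theta. A minimal sketch with made-up item parameters and ratings:

```python
import numpy as np

def theta_cutoff_from_angoff(ratings, a, b, c, grid=np.linspace(-4, 4, 801)):
    """Translate Angoff proportion-correct judgments into a theta-metric cutoff.

    ratings : raters x items matrix of judged probabilities that a minimally
              competent examinee answers each item correctly
    a, b, c : 3PL item parameters
    The mean ratings are summed into an expected raw cut score, and the test
    characteristic curve is searched on a grid for the theta whose expected
    score matches it. This is one of several possible mapping strategies.
    """
    ratings = np.asarray(ratings, float)
    cut_score = ratings.mean(axis=0).sum()           # expected raw cut score
    p = c[None, :] + (1 - c[None, :]) / (
        1 + np.exp(-1.7 * a[None, :] * (grid[:, None] - b[None, :])))
    tcc = p.sum(axis=1)                              # expected score at each theta
    return grid[np.argmin(np.abs(tcc - cut_score))]

a = np.array([1.0, 0.8, 1.3]); b = np.array([-0.5, 0.0, 0.7]); c = np.array([0.2, 0.2, 0.2])
ratings = np.array([[0.7, 0.6, 0.4], [0.8, 0.5, 0.5]])
print(theta_cutoff_from_angoff(ratings, a, b, c))
```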
Lee, Won-Chan – Applied Psychological Measurement, 2007
This article introduces a multinomial error model, which models an examinee's test scores obtained over repeated measurements of an assessment that consists of polytomously scored items. A compound multinomial error model is also introduced for situations in which items are stratified according to content categories and/or prespecified numbers of…
Descriptors: Simulation, Error of Measurement, Scoring, Test Items
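A single replication under the basic multinomial model draws an examinee's category counts and converts them to a raw score. The sketch below simulates that error distribution; the compound case with content strata would repeat the draw per stratum and sum the results. The category probabilities and point values are illustrative:

```python
import numpy as np

def simulate_replications(category_probs, n_items, n_reps, point_values, rng=None):
    """Simulate repeated raw scores under a simple multinomial error model.

    category_probs : examinee's probabilities of landing in each score category
    n_items        : number of polytomously scored items on the form
    point_values   : points attached to each category
    """
    rng = rng or np.random.default_rng(1)
    counts = rng.multinomial(n_items, category_probs, size=n_reps)
    return counts @ np.asarray(point_values)

scores = simulate_replications([0.2, 0.5, 0.3], n_items=20, n_reps=1000,
                               point_values=[0, 1, 2])
print(scores.mean(), scores.std())   # simulated error distribution of the raw score
```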

Frary, Robert B. – Applied Psychological Measurement, 1980
Six scoring methods for assigning weights to right or wrong responses according to various instructions given to test takers are analyzed with respect to expected chance scores and the effect of various levels of information and misinformation. Three of the methods provide feedback to the test taker. (Author/CTM)
Descriptors: Guessing (Tests), Knowledge Level, Multiple Choice Tests, Scores
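The most familiar of these weighting schemes is the classical correction for guessing, shown below; the article's six methods and feedback conditions are not reproduced:

```python
def formula_score(n_right, n_wrong, n_choices):
    """Classical correction-for-guessing (formula) score: R - W / (k - 1).

    Under purely random guessing on unknown items, the expected gain from
    guessing is cancelled, so the expected formula score equals the number
    of items actually known.
    """
    return n_right - n_wrong / (n_choices - 1)

# 60-item, 4-option test: 40 right, 12 wrong, 8 omitted
print(formula_score(40, 12, 4))   # 36.0
```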

McGarvey, Bill; And Others – Applied Psychological Measurement, 1977
The most consistently used scoring system for the rod-and-frame task has been the total number of degrees in error from the true vertical. Since a logical case can be made for at least four alternative scoring systems, a thorough comparison of all five systems was performed. (Author/CTM)
Descriptors: Analysis of Variance, Cognitive Style, Cognitive Tests, Elementary Education
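The conventional score and a few alternatives of the kind compared can be computed from the signed trial errors. The summaries below are illustrative choices, not the article's exact five systems:

```python
import numpy as np

def rod_and_frame_scores(errors_deg):
    """Alternative summary scores for rod-and-frame trial errors.

    errors_deg : signed deviations (degrees) of the rod setting from true
                 vertical across trials, positive meaning tilt toward the frame
    Returns the conventional total absolute error plus a few illustrative
    alternatives.
    """
    e = np.asarray(errors_deg, float)
    return {
        "total_absolute_error": np.abs(e).sum(),   # the conventional score
        "mean_signed_error": e.mean(),             # preserves direction of tilt
        "root_mean_square_error": np.sqrt((e ** 2).mean()),
        "max_error": np.abs(e).max(),
    }

print(rod_and_frame_scores([4.0, -2.5, 6.0, 1.5, -0.5]))
```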