Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 0 |
Since 2006 (last 20 years) | 14 |
Descriptor
Scoring | 50 |
Item Response Theory | 23 |
Test Items | 13 |
Scores | 9 |
Higher Education | 8 |
Estimation (Mathematics) | 7 |
Simulation | 7 |
Computer Assisted Testing | 6 |
Equations (Mathematics) | 6 |
Evaluation Methods | 6 |
Goodness of Fit | 6 |
Source
Applied Psychological Measurement | 50 |
Author
de la Torre, Jimmy | 4 |
Bennett, Randy Elliot | 2 |
Drasgow, Fritz | 2 |
Hambleton, Ronald K. | 2 |
Kolen, Michael J. | 2 |
Lee, Won-Chan | 2 |
Meijer, Rob R. | 2 |
Reise, Steven P. | 2 |
Andrich, David | 1 |
Baker, Frank B. | 1 |
Baldwin, Peter | 1 |
Publication Type
Journal Articles | 46 |
Reports - Evaluative | 21 |
Reports - Research | 15 |
Reports - Descriptive | 6 |
Book/Product Reviews | 3 |
Speeches/Meeting Papers | 2 |
Information Analyses | 1 |
Education Level
Grade 8 | 1 |
Audience
Researchers | 2 |
Location
China | 1 |
Hong Kong | 1 |
Macau | 1 |
Netherlands | 1 |
Taiwan (Taipei) | 1 |
Assessments and Surveys
ACT Assessment | 1 |
Advanced Placement… | 1 |
Center for Epidemiologic Studies Depression Scale | 1 |
Defining Issues Test | 1 |
National Assessment of… | 1 |
Program for International… | 1 |
Rod and Frame Test | 1 |
United States Medical… | 1 |
Liu, Ying; Verkuilen, Jay – Applied Psychological Measurement, 2013
The Presence-Severity (P-S) format refers to a compound item structure in which a question first checks whether a particular event is present. If the respondent provides an affirmative answer, a follow-up is administered, often about the frequency, density, severity, or impact of the event. Despite the popularity of the P-S…
Descriptors: Item Response Theory, Measures (Individuals), Psychometrics, Cancer
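A compound P-S response can be represented in data as a single ordinal variable. Below is a minimal sketch, assuming a hypothetical 1-to-5 severity follow-up and the illustrative convention (not from the article) that a "no" on the presence question is scored 0:

```python
def code_ps_item(presence, severity):
    """Collapse a Presence-Severity (P-S) response pair into one ordinal score.

    presence : 0/1 answer to the gate question
    severity : 1..K rating from the follow-up (ignored when presence == 0)
    Assumption (illustrative, not from the article): a 'no' is scored 0 and a
    'yes' keeps the follow-up rating, giving a single 0..K ordinal variable.
    """
    return 0 if presence == 0 else int(severity)

# Example: three respondents; the second denies the event entirely
responses = [(1, 3), (0, None), (1, 5)]
coded = [code_ps_item(p, s if s is not None else 0) for p, s in responses]
print(coded)  # [3, 0, 5]
```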
Sun, Jianan; Xin, Tao; Zhang, Shumei; de la Torre, Jimmy – Applied Psychological Measurement, 2013
This article proposes a generalized distance discriminating method for tests with polytomous responses (GDD-P). The new method is the polytomous extension of an item response theory (IRT)-based cognitive diagnostic method, which can identify examinees' ideal response patterns (IRPs) based on a generalized distance index. The similarities between…
Descriptors: Item Response Theory, Cognitive Tests, Diagnostic Tests, Matrices
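The core idea, assigning an examinee to the nearest ideal response pattern by a distance index, can be sketched briefly. The snippet below uses a plain Euclidean distance as a stand-in for the article's generalized index; the attribute profiles and score vectors are made up for illustration:

```python
import numpy as np

def classify_by_distance(observed, ideal_patterns):
    """Assign an examinee to the ideal response pattern (IRP) closest to the
    observed polytomous response vector.

    observed       : array of shape (J,) with item scores
    ideal_patterns : dict mapping attribute profile -> expected score vector
    Euclidean distance is used here as a stand-in for the generalized index.
    """
    distances = {profile: np.linalg.norm(observed - np.asarray(irp))
                 for profile, irp in ideal_patterns.items()}
    return min(distances, key=distances.get), distances

# Toy example: two attribute profiles, three items scored 0-2
ideal = {(1, 0): [2, 1, 0], (1, 1): [2, 2, 2]}
profile, d = classify_by_distance(np.array([2, 2, 1]), ideal)
print(profile, d)
```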
van der Ark, L. Andries; van der Palm, Daniel W.; Sijtsma, Klaas – Applied Psychological Measurement, 2011
This study presents a general framework for single-administration reliability methods, such as Cronbach's alpha, Guttman's lambda-2, and method MS. This general framework was used to derive a new approach to estimating test-score reliability by means of the unrestricted latent class model. This new approach is the latent class reliability…
Descriptors: Simulation, Reliability, Measurement, Psychology
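For context, the classical single-administration coefficients named here are straightforward to compute from an item-score matrix. A minimal sketch of Cronbach's alpha and Guttman's lambda-2 (the latent class reliability coefficient itself is not reproduced):

```python
import numpy as np

def alpha_and_lambda2(X):
    """Single-administration reliability estimates from an n x J item-score matrix.

    Returns Cronbach's alpha and Guttman's lambda-2 using the classical formulas.
    """
    X = np.asarray(X, dtype=float)
    n, J = X.shape
    cov = np.cov(X, rowvar=False)            # J x J item covariance matrix
    item_var = np.diag(cov).sum()            # sum of item variances
    total_var = cov.sum()                    # variance of the total score
    alpha = (J / (J - 1)) * (1 - item_var / total_var)
    off_diag_sq = (cov ** 2).sum() - (np.diag(cov) ** 2).sum()
    lambda2 = (total_var - item_var + np.sqrt(J / (J - 1) * off_diag_sq)) / total_var
    return alpha, lambda2

# Toy data: 5 examinees, 4 dichotomous items
X = np.array([[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]])
print(alpha_and_lambda2(X))
```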
Harik, Polina; Baldwin, Peter; Clauser, Brian – Applied Psychological Measurement, 2013
Growing reliance on complex constructed response items has generated considerable interest in automated scoring solutions. Many of these solutions are described in the literature; however, relatively few studies have been published that "compare" automated scoring strategies. Here, comparisons are made among five strategies for…
Descriptors: Computer Assisted Testing, Automation, Scoring, Comparative Analysis
de la Torre, Jimmy; Song, Hao; Hong, Yuan – Applied Psychological Measurement, 2011
Lack of sufficient reliability is the primary impediment for generating and reporting subtest scores. Several current methods of subscore estimation do so either by incorporating the correlational structure among the subtest abilities or by using the examinee's performance on the overall test. This article conducted a systematic comparison of four…
Descriptors: Item Response Theory, Scoring, Methods, Comparative Analysis
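As a rough illustration of borrowing strength from the overall test, the sketch below shrinks an observed subscore toward its mean and blends it with a linear prediction from the total score. It is a generic shrinkage estimator under assumed reliability and weighting values, not any of the four methods the article compares:

```python
import numpy as np

def augmented_subscore(sub, total, rel_sub, w=0.5):
    """Blend an observed subscore with a prediction from the total test score.

    sub, total : arrays of observed subscores and total scores
    rel_sub    : assumed subscore reliability (0..1)
    w          : assumed weight given to the total-score prediction
    Illustrative shrinkage estimator only; not the article's methods.
    """
    sub = np.asarray(sub, float)
    total = np.asarray(total, float)
    slope, intercept = np.polyfit(total, sub, 1)     # predict subscore from total
    from_total = intercept + slope * total
    kelley = sub.mean() + rel_sub * (sub - sub.mean())  # regress toward the mean
    return (1 - w) * kelley + w * from_total

print(augmented_subscore([8, 5, 9], [30, 22, 35], rel_sub=0.6))
```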
Fidalgo, Angel M.; Bartram, Dave – Applied Psychological Measurement, 2010
The main objective of this study was to establish the relative efficacy of the generalized Mantel-Haenszel test (GMH) and the Mantel test for detecting large numbers of differential item functioning (DIF) patterns. To this end, this study considered a topic not dealt with in the literature to date: the possible differential effect of type of scores…
Descriptors: Test Bias, Statistics, Scoring, Comparative Analysis
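The ordinary Mantel-Haenszel machinery underlying both tests can be sketched compactly. The function below computes the MH common odds ratio and continuity-corrected chi-square from per-stratum 2x2 tables; the generalized (GMH) extension for polytomous data is not implemented, and the example tables are made up:

```python
import numpy as np

def mantel_haenszel_dif(tables):
    """Mantel-Haenszel DIF statistics for a dichotomous item.

    tables : list of 2x2 arrays per matched score stratum, laid out as
             [[ref_correct, ref_incorrect], [focal_correct, focal_incorrect]].
    Returns the common odds ratio and the continuity-corrected MH chi-square.
    """
    A = E = V = num = den = 0.0
    for t in tables:
        (a, b), (c, d) = np.asarray(t, float)
        N = a + b + c + d
        if N < 2:
            continue
        A += a
        E += (a + b) * (a + c) / N
        V += (a + b) * (c + d) * (a + c) * (b + d) / (N ** 2 * (N - 1))
        num += a * d / N
        den += b * c / N
    odds_ratio = num / den
    chi_square = (abs(A - E) - 0.5) ** 2 / V
    return odds_ratio, chi_square

strata = [[[30, 10], [25, 15]], [[40, 5], [35, 10]]]
print(mantel_haenszel_dif(strata))
```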
Bryant, Damon; Davis, Larry – Applied Psychological Measurement, 2011
This brief technical note describes how to construct item vector plots for dichotomously scored items fitting the multidimensional three-parameter logistic model (M3PLM). As multidimensional item response theory (MIRT) shows promise of being a very useful framework in the test development life cycle, graphical tools that facilitate understanding…
Descriptors: Visual Aids, Item Response Theory, Evaluation Methods, Test Preparation
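The quantities behind such plots follow directly from the compensatory MIRT parameterization. A minimal two-dimensional sketch, assuming discrimination vectors a and intercepts d (made-up values) and using matplotlib's quiver for the arrows:

```python
import numpy as np
import matplotlib.pyplot as plt

def item_vectors(a, d):
    """Item vector quantities for a compensatory two-dimensional MIRT model.

    a : (J, 2) discrimination parameters; d : (J,) intercepts
    Returns multidimensional discrimination (MDISC), difficulty (MDIFF), and
    the unit direction of steepest slope for each item.
    """
    a = np.asarray(a, float)
    d = np.asarray(d, float)
    mdisc = np.sqrt((a ** 2).sum(axis=1))
    mdiff = -d / mdisc
    direction = a / mdisc[:, None]
    return mdisc, mdiff, direction

a = np.array([[1.2, 0.3], [0.4, 1.1], [0.8, 0.8]])
d = np.array([-0.5, 0.2, 0.0])
mdisc, mdiff, u = item_vectors(a, d)

# Each arrow starts at MDIFF along the item direction; length is MDISC
tails = u * mdiff[:, None]
arrows = u * mdisc[:, None]
plt.quiver(tails[:, 0], tails[:, 1], arrows[:, 0], arrows[:, 1],
           angles='xy', scale_units='xy', scale=1)
plt.xlabel('theta 1'); plt.ylabel('theta 2'); plt.title('Item vectors')
plt.show()
```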
Finkelman, Matthew D.; Smits, Niels; Kim, Wonsuk; Riley, Barth – Applied Psychological Measurement, 2012
The Center for Epidemiologic Studies-Depression (CES-D) scale is a well-known self-report instrument that is used to measure depressive symptomatology. Respondents who take the full-length version of the CES-D are administered a total of 20 items. This article investigates the use of curtailment and stochastic curtailment (SC), two sequential…
Descriptors: Measures (Individuals), Depression (Psychology), Test Length, Computer Assisted Testing
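Deterministic curtailment amounts to stopping once the remaining items can no longer change the screening decision. A minimal sketch, assuming a fixed administration order, 0-3 scored items, and a hypothetical cutoff of 16; stochastic curtailment, which stops on a high probability rather than certainty, is not shown:

```python
def curtailed_test(responses, max_scores, cutoff):
    """Deterministic curtailment for a fixed-order questionnaire.

    Stops as soon as the classification is decided: either the running score
    already reaches the cutoff, or the cutoff cannot be reached even if every
    remaining item were answered at its maximum.
    Returns the decision and the number of items actually administered.
    """
    total, n = 0, len(responses)
    for j, x in enumerate(responses):
        total += x
        remaining_max = sum(max_scores[j + 1:])
        if total >= cutoff:
            return "positive", j + 1          # decision already guaranteed
        if total + remaining_max < cutoff:
            return "negative", j + 1          # cutoff no longer reachable
    return ("positive" if total >= cutoff else "negative"), n

# 20 items scored 0-3, screening cutoff of 16 (illustrative values)
responses = [3, 3, 3, 3, 2, 2] + [0] * 14
print(curtailed_test(responses, [3] * 20, 16))   # stops after 6 items
```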
Yang, Chongming; Nay, Sandra; Hoyle, Rick H. – Applied Psychological Measurement, 2010
Lengthy scales or testlets pose certain challenges for structural equation modeling (SEM) if all the items are included as indicators of a latent construct. Three general approaches to modeling lengthy scales in SEM (parceling, latent scoring, and shortening) have been reviewed and evaluated. A hypothetical population model is simulated containing…
Descriptors: Structural Equation Models, Measures (Individuals), Sample Size, Item Response Theory
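Of the three approaches, parceling is the simplest to illustrate. The sketch below averages consecutive item subsets into parcel scores that could serve as SEM indicators; the order-based assignment rule is an arbitrary choice for illustration, not a recommendation from the article:

```python
import numpy as np

def make_parcels(X, n_parcels=3):
    """Item parceling: average consecutive item subsets into parcel scores.

    X : n x J matrix of item scores for one latent construct
    Returns an n x n_parcels matrix of parcel scores to use as indicators.
    """
    X = np.asarray(X, float)
    groups = np.array_split(np.arange(X.shape[1]), n_parcels)
    return np.column_stack([X[:, g].mean(axis=1) for g in groups])

X = np.random.randint(1, 6, size=(100, 12))   # 12 Likert items, 100 respondents
parcels = make_parcels(X, n_parcels=3)
print(parcels.shape)                          # (100, 3)
```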
Classification Consistency and Accuracy for Complex Assessments under the Compound Multinomial Model
Lee, Won-Chan; Brennan, Robert L.; Wan, Lei – Applied Psychological Measurement, 2009
For a test that consists of dichotomously scored items, several approaches have been reported in the literature for estimating classification consistency and accuracy indices based on a single administration of a test. Classification consistency and accuracy have not been studied much, however, for "complex" assessments--for example,…
Descriptors: Classification, Reliability, Test Items, Scoring
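A much simpler single-administration approach conveys what the two indices measure. The sketch below treats each examinee's score as binomial and asks how often a parallel form would yield the same pass/fail decision; it is a deliberate simplification, not the compound multinomial model developed in the article, and the cutoff is illustrative:

```python
from math import comb

def binomial_tail(n, p, c):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(c, n + 1))

def consistency_and_accuracy(scores, n_items, cutoff):
    """Single-administration classification indices under a binomial error model.

    Each examinee's observed proportion correct stands in for the true
    proportion; consistency is P(pass)^2 + P(fail)^2 across two hypothetical
    replications, and accuracy is the chance the replication agrees with the
    decision implied by the true proportion.
    """
    cons, acc = [], []
    for x in scores:
        p = x / n_items
        p_pass = binomial_tail(n_items, p, cutoff)
        cons.append(p_pass ** 2 + (1 - p_pass) ** 2)
        acc.append(p_pass if x >= cutoff else 1 - p_pass)
    n = len(scores)
    return sum(cons) / n, sum(acc) / n

print(consistency_and_accuracy([12, 18, 25, 9], n_items=30, cutoff=18))
```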
de la Torre, Jimmy – Applied Psychological Measurement, 2009
For one reason or another, various sources of information, namely, ancillary variables and correlational structure of the latent abilities, which are usually available in most testing situations, are ignored in ability estimation. A general model that incorporates these sources of information is proposed in this article. The model has a general…
Descriptors: Scoring, Multivariate Analysis, Ability, Computation
Hurtz, Gregory M.; Jones, J. Patrick; Jones, Christian N. – Applied Psychological Measurement, 2008
This study compares the efficacy of different strategies for translating item-level, proportion-correct standard-setting judgments into a theta-metric test cutoff score for use with item response theory (IRT) scoring, using Monte Carlo methods. Simulated Angoff-type ratings, consisting of 1,000 independent 75 Item x 13 Rater matrices, were…
Descriptors: Monte Carlo Methods, Measures (Individuals), Item Response Theory, Standard Setting
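One common mapping, and not necessarily the one favored by the study, sums the mean Angoff ratings into an expected raw cut score and searches the 3PL test characteristic curve for the matching theta. A minimal sketch with made-up item parameters and ratings:

```python
import numpy as np

def theta_cutoff_from_angoff(ratings, a, b, c, grid=np.linspace(-4, 4, 801)):
    """Translate Angoff proportion-correct judgments into a theta-metric cutoff.

    ratings : raters x items matrix of judged probabilities that a minimally
              competent examinee answers each item correctly
    a, b, c : 3PL item parameters
    The mean ratings are summed into an expected raw cut score, and the test
    characteristic curve is searched on a grid for the theta whose expected
    score matches it. This is one of several possible mapping strategies.
    """
    ratings = np.asarray(ratings, float)
    cut_score = ratings.mean(axis=0).sum()           # expected raw cut score
    p = c[None, :] + (1 - c[None, :]) / (
        1 + np.exp(-1.7 * a[None, :] * (grid[:, None] - b[None, :])))
    tcc = p.sum(axis=1)                              # expected score at each theta
    return grid[np.argmin(np.abs(tcc - cut_score))]

a = np.array([1.0, 0.8, 1.3]); b = np.array([-0.5, 0.0, 0.7]); c = np.array([0.2, 0.2, 0.2])
ratings = np.array([[0.7, 0.6, 0.4], [0.8, 0.5, 0.5]])
print(theta_cutoff_from_angoff(ratings, a, b, c))
```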
Lee, Won-Chan – Applied Psychological Measurement, 2007
This article introduces a multinomial error model, which models an examinee's test scores obtained over repeated measurements of an assessment that consists of polytomously scored items. A compound multinomial error model is also introduced for situations in which items are stratified according to content categories and/or prespecified numbers of…
Descriptors: Simulation, Error of Measurement, Scoring, Test Items
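A single replication under the basic multinomial model draws an examinee's category counts and converts them to a raw score. The sketch below simulates that error distribution; the compound case with content strata would repeat the draw per stratum and sum the results. The category probabilities and point values are illustrative:

```python
import numpy as np

def simulate_replications(category_probs, n_items, n_reps, point_values, rng=None):
    """Simulate repeated raw scores under a simple multinomial error model.

    category_probs : examinee's probabilities of landing in each score category
    n_items        : number of polytomously scored items on the form
    point_values   : points attached to each category
    """
    rng = rng or np.random.default_rng(1)
    counts = rng.multinomial(n_items, category_probs, size=n_reps)
    return counts @ np.asarray(point_values)

scores = simulate_replications([0.2, 0.5, 0.3], n_items=20, n_reps=1000,
                               point_values=[0, 1, 2])
print(scores.mean(), scores.std())   # simulated error distribution of the raw score
```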

Frary, Robert B. – Applied Psychological Measurement, 1980
Six scoring methods for assigning weights to right or wrong responses according to various instructions given to test takers are analyzed with respect to expected chance scores and the effect of various levels of information and misinformation. Three of the methods provide feedback to the test taker. (Author/CTM)
Descriptors: Guessing (Tests), Knowledge Level, Multiple Choice Tests, Scores
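The most familiar of these weighting schemes is the classical correction for guessing, shown below; the article's six methods and feedback conditions are not reproduced:

```python
def formula_score(n_right, n_wrong, n_choices):
    """Classical correction-for-guessing (formula) score: R - W / (k - 1).

    Under purely random guessing on unknown items, the expected gain from
    guessing is cancelled, so the expected formula score equals the number
    of items actually known.
    """
    return n_right - n_wrong / (n_choices - 1)

# 60-item, 4-option test: 40 right, 12 wrong, 8 omitted
print(formula_score(40, 12, 4))   # 36.0
```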

McGarvey, Bill; And Others – Applied Psychological Measurement, 1977
The most consistently used scoring system for the rod-and-frame task has been the total number of degrees in error from the true vertical. Since a logical case can be made for at least four alternative scoring systems, a thorough comparison of all five systems was performed. (Author/CTM)
Descriptors: Analysis of Variance, Cognitive Style, Cognitive Tests, Elementary Education
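The conventional score and a few alternatives of the kind compared can be computed from the signed trial errors. The summaries below are illustrative choices, not the article's exact five systems:

```python
import numpy as np

def rod_and_frame_scores(errors_deg):
    """Alternative summary scores for rod-and-frame trial errors.

    errors_deg : signed deviations (degrees) of the rod setting from true
                 vertical across trials, positive meaning tilt toward the frame
    Returns the conventional total absolute error plus a few illustrative
    alternatives.
    """
    e = np.asarray(errors_deg, float)
    return {
        "total_absolute_error": np.abs(e).sum(),   # the conventional score
        "mean_signed_error": e.mean(),             # preserves direction of tilt
        "root_mean_square_error": np.sqrt((e ** 2).mean()),
        "max_error": np.abs(e).max(),
    }

print(rod_and_frame_scores([4.0, -2.5, 6.0, 1.5, -0.5]))
```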