Tao Gong; Lan Shuai; Robert J. Mislevy – Journal of Educational Measurement, 2024
The usual interpretation of the person and task variables in between-persons measurement models such as item response theory (IRT) is as attributes of persons and tasks, respectively. They can be viewed instead as ensemble descriptors of patterns of interactions among persons and situations that arise from sociocognitive complex adaptive system…
Descriptors: Cognitive Processes, Item Response Theory, Social Cognition, Individualized Instruction
Liu, Chen-Wei; Wang, Wen-Chung – Journal of Educational Measurement, 2017
The examinee-selected-item (ESI) design, in which examinees must respond to a fixed number of items from a given set (e.g., choosing one item from a pair to answer), always yields incomplete data (i.e., only the selected items are answered; the others are missing) that are likely nonignorable. Therefore, using…
Descriptors: Item Response Theory, Models, Maximum Likelihood Statistics, Data Analysis
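As background on why the missingness here is nonignorable: writing the responses as Y = (Y_obs, Y_mis) with selection indicators M, the observed-data likelihood takes the form below (standard Little-Rubin notation, assumed here rather than taken from the article). In the ESI design an examinee's choice depends on how they would have fared on the unchosen item, so the selection factor cannot be dropped.

```latex
% Observed-data likelihood with selection indicators M
L(\theta, \phi \mid Y_{\mathrm{obs}}, M)
  = \int f(Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid \theta)\,
         f(M \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi)\, dY_{\mathrm{mis}}
% The theta and phi parts separate only when
% f(M | Y, phi) = f(M | Y_obs, phi) (missing at random);
% here M depends on Y_mis, so the selection model must be handled jointly.
```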
Wiberg, Marie; van der Linden, Wim J.; von Davier, Alina A. – Journal of Educational Measurement, 2014
Three local observed-score kernel equating methods that integrate techniques from the local equating and kernel equating frameworks are proposed. The new methods were compared with their earlier counterparts on measures such as bias (as defined by Lord's criterion of equity) and percent relative error. The local kernel item response…
Descriptors: Measurement Techniques, Evaluation Methods, Item Response Theory, Equated Scores
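For context, kernel equating continuizes each discrete score distribution with a Gaussian kernel and then equates equipercentile-style through the two smooth CDFs. The sketch below shows those two steps with illustrative function names and a fixed bandwidth; the local variants proposed in the article additionally condition these distributions on an anchor or proficiency estimate.

```python
import numpy as np
from scipy.stats import norm

def kernel_cdf(x, scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution.

    scores: NumPy array of possible raw scores x_j; probs: their
    probabilities r_j; h: bandwidth. The shrinkage factor a preserves
    the mean and variance of the discrete distribution.
    """
    mu = np.sum(probs * scores)
    var = np.sum(probs * (scores - mu) ** 2)
    a = np.sqrt(var / (var + h ** 2))
    return np.sum(probs * norm.cdf((x - a * scores - (1 - a) * mu) / (a * h)))

def kernel_equate(x, scores_x, probs_x, scores_y, probs_y, h=0.6):
    """Map a score x on form X to the Y scale through the two smooth CDFs."""
    p = kernel_cdf(x, scores_x, probs_x, h)
    grid = np.linspace(scores_y.min() - 3, scores_y.max() + 3, 4001)
    cdf_y = np.array([kernel_cdf(g, scores_y, probs_y, h) for g in grid])
    return grid[np.clip(np.searchsorted(cdf_y, p), 0, len(grid) - 1)]
```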
Dwyer, Andrew C. – Journal of Educational Measurement, 2016
This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common-item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common-item equating methodology to standard setting ratings to account for…
Descriptors: Cutting Scores, Equivalency Tests, Test Format, Academic Standards
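To make the third option concrete, rescaling a standard amounts to passing the cut score (or the judges' ratings) through the same linking function used for examinee scores. The display below shows this with a generic linear common-item link; it is an illustration of the abstract's parenthetical, not Dwyer's exact procedure.

```latex
% Linear common-item link applied to the cut score (illustrative),
% with moments estimated through the common items
x_{\mathrm{cut}} \;\longmapsto\;
  \hat{\mu}_Y + \frac{\hat{\sigma}_Y}{\hat{\sigma}_X}
  \bigl(x_{\mathrm{cut}} - \hat{\mu}_X\bigr)
```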
Puhan, Gautam; Moses, Timothy P.; Grant, Mary C.; McHale, Frederick – Journal of Educational Measurement, 2009
A single-group (SG) equating with nearly equivalent test forms (SiGNET) design was developed by Grant to equate small-volume tests. Under this design, the scored items for the operational form are divided into testlets or mini tests. An additional testlet is created but not scored for the first form. If the scored testlets are testlets 1-6 and the…
Descriptors: Equated Scores, Test Construction, Measurement, Measures (Individuals)
de La Torre, Jimmy; Karelitz, Tzur M. – Journal of Educational Measurement, 2009
Compared to unidimensional item response models (IRMs), cognitive diagnostic models (CDMs) based on latent classes represent examinees' knowledge and item requirements using discrete structures. This study systematically examines the viability of retrofitting CDMs to IRM-based data with a linear attribute structure. The study utilizes a procedure…
Descriptors: Simulation, Item Response Theory, Psychometrics, Evaluation Methods
Armstrong, Ronald D.; Shi, Min – Journal of Educational Measurement, 2009
This article demonstrates the use of a new class of model-free cumulative sum (CUSUM) statistics to detect person fit given the responses to a linear test. The fundamental statistic being accumulated is the likelihood ratio of two probabilities. The detection performance of this CUSUM scheme is compared to other model-free person-fit statistics…
Descriptors: Probability, Simulation, Models, Psychometrics
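The accumulation the abstract refers to can be sketched with the generic one-sided CUSUM recursion. In this illustration the caller supplies the two competing response probabilities; the article's statistics are model-free, so treat both those inputs and the decision threshold as placeholders.

```python
import numpy as np

def cusum_person_fit(responses, p_null, p_alt, threshold=3.0):
    """One-sided CUSUM over item-level log-likelihood ratios.

    responses: 0/1 answers for one examinee, in administration order.
    p_null, p_alt: per-item probabilities of a correct answer under the
    competing "normal" and "aberrant" hypotheses.
    threshold: illustrative decision bound for flagging the examinee.
    """
    s, path = 0.0, []
    for u, p0, p1 in zip(responses, p_null, p_alt):
        llr = np.log((p1 if u else 1 - p1) / (p0 if u else 1 - p0))
        s = max(0.0, s + llr)  # accumulate evidence, resetting at zero
        path.append(s)
    return np.array(path), bool(path and max(path) > threshold)
```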
Myford, Carol M.; Wolfe, Edward W. – Journal of Educational Measurement, 2009
In this study, we describe a framework for monitoring rater performance over time. We present several statistical indices to identify raters whose standards drift and explain how to use those indices operationally. To illustrate the use of the framework, we analyzed rating data from the 2002 Advanced Placement English Literature and Composition…
Descriptors: English Literature, Advanced Placement, Measures (Individuals), Writing (Composition)
Clauser, Brian E.; Mee, Janet; Baldwin, Su G.; Margolis, Melissa J.; Dillon, Gerard F. – Journal of Educational Measurement, 2009
Although the Angoff procedure is among the most widely used standard setting procedures for tests comprising multiple-choice items, research has shown that subject matter experts have considerable difficulty accurately making the required judgments in the absence of examinee performance data. Some authors have viewed the need to provide…
Descriptors: Standard Setting (Scoring), Program Effectiveness, Expertise, Health Personnel
Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…
Descriptors: Test Length, Simulation, Correlation, Research Methodology
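A sketch of the index is below, under the assumption (consistent with the stated range) that the HCI contrasts misfits with the total number of hierarchy comparisons, a misfit being a correctly answered item paired with an incorrectly answered item whose required attributes are a subset of its own. The Q-matrix handling is illustrative; the article's exact comparison set may differ.

```python
import numpy as np

def hci(x, q):
    """Hierarchy consistency index for one examinee (sketch).

    x: 0/1 response vector over J items.
    q: J x K binary Q-matrix (NumPy array); item g is treated as a
       prerequisite of item j when g's attributes are a subset of j's.
    Returns a value in [-1, 1]; values near -1 signal responses that
    contradict the assumed attribute hierarchy.
    """
    J = len(x)
    misfits = comparisons = 0
    for j in range(J):
        if x[j] != 1:
            continue
        for g in range(J):
            if g != j and np.all(q[g] <= q[j]) and np.any(q[g]):
                comparisons += 1
                if x[g] == 0:  # prerequisite item answered incorrectly
                    misfits += 1
    return 1.0 if comparisons == 0 else 1 - 2 * misfits / comparisons
```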
von Davier, Alina A.; Holland, Paul W.; Thayer, Dorothy T. – Journal of Educational Measurement, 2004
The non-equivalent-groups anchor test (NEAT) design has been in wide use since at least the early 1940s. It involves two populations of test takers, P and Q, and uses an anchor test to link them. Two linking methods used with NEAT designs are (a) those based on chain equating and (b) those that use the anchor test to post-stratify the…
Descriptors: Equated Scores, Evaluation Research, Comparative Testing, Population Groups
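For orientation, the two families can be written compactly: chain equating composes two single-group links through the anchor A, while post-stratification reweights the conditional score distributions to a target population T before equating. The notation below is standard kernel-equating notation, assumed here rather than quoted from the article.

```latex
% Chain equating: link X to A on P, then A to Y on Q
e_Y^{\mathrm{chain}}(x) = e_{AY;\,Q}\!\bigl(e_{XA;\,P}(x)\bigr)

% Post-stratification: reweight conditional score distributions
% to the target population T
\hat{F}_{X;\,T}(x) = \sum_{a} \hat{F}_{X \mid A=a;\,P}(x)\,\hat{P}_{T}(A=a)
```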

Fournier, Deborah M. – Journal of Educational Measurement, 1994
The "Program Evaluation Standards" supplies a useful framework for generating questions to raise about any evaluation plan or evaluation report to assess its pros and cons. It is a valuable "how-to" for graduate students and professionals. This second edition incorporates changes in the field in the last decade. (SLD)
Descriptors: Evaluation Methods, Evaluation Research, Graduate Students, Guides
Liu, Jinghua; Cahn, Miriam F.; Dorans, Neil J. – Journal of Educational Measurement, 2006
The College Board's SAT® data are used to illustrate how score equity assessment (SEA) can inform a testing program about equatability. SEA is used to examine whether the content changes to the revised new SAT result in differential linking functions across gender groups. Results of population sensitivity analyses are reported on the…
Descriptors: Aptitude Tests, Comparative Analysis, Gender Differences, Scores
Wang, Wen-Chung; Wilson, Mark; Shih, Ching-Lin – Journal of Educational Measurement, 2006
This study presents the random-effects rating scale model (RE-RSM), which accounts for randomness in the thresholds across persons by treating them as random effects and adding a random variable for each threshold in the rating scale model (RSM; Andrich, 1978). The RE-RSM turns out to be a special case of the multidimensional random…
Descriptors: Item Analysis, Rating Scales, Item Response Theory, Monte Carlo Methods
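As background, Andrich's RSM gives the probability that person n scores in category k of item i in terms of a person parameter, an item location, and thresholds shared across items; the RE-RSM, as described, lets each threshold vary over persons. The display below is a sketch in common RSM notation, not the article's exact parameterization.

```latex
% Andrich (1978) rating scale model, with tau_0 = 0 by convention
P(X_{ni}=k \mid \theta_n)
  = \frac{\exp \sum_{j=0}^{k} (\theta_n - \delta_i - \tau_j)}
         {\sum_{m=0}^{K} \exp \sum_{j=0}^{m} (\theta_n - \delta_i - \tau_j)}

% RE-RSM (as sketched here): thresholds become person-specific
\tau_{nj} = \tau_j + \varepsilon_{nj}, \qquad
\varepsilon_{nj} \sim N(0, \sigma_j^2)
```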
Monahan, Patrick O.; Ankenmann, Robert D. – Journal of Educational Measurement, 2005
Empirical studies have demonstrated Type-I error (TIE) inflation (especially for highly discriminating easy items) of the Mantel-Haenszel chi-square test for differential item functioning (DIF) when data conformed to item response theory (IRT) models more complex than the Rasch model and when IRT proficiency distributions differed only in their means. However, no…
Descriptors: Sample Size, Item Response Theory, Test Items, Test Bias
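For reference, the Mantel-Haenszel statistic whose Type-I error is at issue aggregates 2×2 (group by right/wrong) tables across total-score strata. Below is a minimal sketch of the continuity-corrected chi-square and the common odds ratio, with illustrative argument names.

```python
import numpy as np

def mantel_haenszel_dif(right_ref, wrong_ref, right_foc, wrong_foc):
    """Mantel-Haenszel chi-square for DIF, stratified by total score.

    Each argument is an array over the K score strata giving counts of
    reference/focal examinees answering the studied item right/wrong.
    Returns the continuity-corrected MH chi-square and the common
    odds-ratio estimate alpha_MH.
    """
    A, B = np.asarray(right_ref, float), np.asarray(wrong_ref, float)
    C, D = np.asarray(right_foc, float), np.asarray(wrong_foc, float)
    T = A + B + C + D                  # stratum sizes
    keep = T > 1                       # drop degenerate strata
    A, B, C, D, T = A[keep], B[keep], C[keep], D[keep], T[keep]
    expA = (A + B) * (A + C) / T       # expected reference-right counts
    varA = (A + B) * (C + D) * (A + C) * (B + D) / (T**2 * (T - 1))
    chi2 = (abs(A.sum() - expA.sum()) - 0.5) ** 2 / varA.sum()
    alpha = (A * D / T).sum() / (B * C / T).sum()
    return chi2, alpha
```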