ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	4
Since 2016 (last 10 years)	7
Since 2006 (last 20 years)	15

Descriptor

Methods	16
Test Items	8
Comparative Analysis	7
Computation	4
Computer Assisted Testing	4
Item Response Theory	4
Adaptive Testing	3
Correlation	3
Difficulty Level	3
Equated Scores	3
Scores	3
Validity	3
Accuracy	2
Cutting Scores	2
Item Analysis	2
Item Banks	2
Measurement	2
Models	2
Reading Tests	2
Sample Size	2
Scoring	2
Selection	2
Simulation	2
Standard Setting (Scoring)	2
Test Bias	2

Source

Educational and Psychological…

Publication Type

Journal Articles	16
Reports - Research	14
Reports - Descriptive	1
Reports - Evaluative	1

Education Level

Elementary Education	2
Secondary Education	2
Elementary Secondary Education	1
Grade 3	1
Grade 4	1
Grade 7	1
High Schools	1
Intermediate Grades	1
Junior High Schools	1
Middle Schools	1
Primary Education	1

Location

Germany	1
Michigan	1
Pennsylvania	1

Showing 1 to 15 of 16 results

Evaluating Equating Methods for Varying Levels of Form Difference

Peer reviewed

Ting Sun; Stella Yun Kim – Educational and Psychological Measurement, 2024

Equating is a statistical procedure used to adjust for the difference in form difficulty such that scores on those forms can be used and interpreted comparably. In practice, however, equating methods are often implemented without considering the extent to which two forms differ in difficulty. The study aims to examine the effect of the magnitude…

Descriptors: Difficulty Level, Data Interpretation, Equated Scores, High School Students
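
The equating idea in the abstract above can be made concrete with the two simplest observed-score methods. This is a minimal sketch, not one of the specific methods or conditions evaluated in the study: it assumes a random-groups design (equivalent groups take Form X and Form Y), and the function names and toy scores are hypothetical.

```python
from statistics import mean, pstdev


def mean_equate(x, scores_x, scores_y):
    """Mean equating: shift a Form X score by the difference in form means."""
    return x - mean(scores_x) + mean(scores_y)


def linear_equate(x, scores_x, scores_y):
    """Linear equating: match both the means and the standard deviations."""
    slope = pstdev(scores_y) / pstdev(scores_x)
    return slope * (x - mean(scores_x)) + mean(scores_y)


# Hypothetical toy data from two equivalent groups; Form X looks harder.
form_x = [10, 12, 14, 16, 18]
form_y = [12, 15, 18, 21, 24]

print(mean_equate(16, form_x, form_y))  # 20
print(linear_equate(16, form_x, form_y))
```

Mean equating assumes the forms differ only by a constant, so a Form X score of 16 maps to 20. Linear equating also rescales for the larger spread of Form Y and maps the same score to about 21; the two methods agree only at the mean.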

Testing for Differential Item Functioning under the "D"-Scoring Method

Peer reviewed

Dimitrov, Dimiter M.; Atanasov, Dimitar V. – Educational and Psychological Measurement, 2022

This study offers an approach to testing for differential item functioning (DIF) in a recently developed measurement framework, referred to as "D"-scoring method (DSM). Under the proposed approach, called "P-Z" method of testing for DIF, the item response functions of two groups (reference and focal) are compared by…

Descriptors: Test Bias, Methods, Test Items, Scoring

The NEAT Equating via Chaining Random Forests in the Context of Small Sample Sizes: A Machine-Learning Method

Peer reviewed

Jiang, Zhehan; Han, Yuting; Xu, Lingling; Shi, Dexin; Liu, Ren; Ouyang, Jinying; Cai, Fen – Educational and Psychological Measurement, 2023

The portion of responses that is missing by design in the nonequivalent groups with anchor test (NEAT) design can be treated as a planned missing-data scenario. In the context of small sample sizes, we present a machine learning (ML)-based imputation technique called chaining random forests (CRF) to perform equating tasks within the NEAT design. Specifically, seven…

Descriptors: Test Items, Equated Scores, Sample Size, Artificial Intelligence

The Impact and Detection of Uniform Differential Item Functioning for Continuous Item Response Models

Peer reviewed

Finch, W. Holmes – Educational and Psychological Measurement, 2023

Psychometricians have devoted much research and attention to categorical item responses, leading to the development and widespread use of item response theory for the estimation of model parameters and identification of items that do not perform in the same way for examinees from different population subgroups (e.g., differential item functioning…

Descriptors: Test Bias, Item Response Theory, Computation, Methods

An External Validity Approach for Assessing Essential Unidimensionality in Correlated-Factor Models

Peer reviewed

Ferrando, Pere Joan; Lorenzo-Seva, Urbano – Educational and Psychological Measurement, 2019

Many psychometric measures yield data that are compatible with (a) an essentially unidimensional factor analysis solution and (b) a correlated-factor solution. Deciding which of these structures is the most appropriate and useful is of considerable importance, and various procedures have been proposed to help in this decision. The only fully…

Descriptors: Validity, Models, Correlation, Factor Analysis

Item-Score Reliability in Empirical-Data Sets and Its Relationship with Other Item Indices

Peer reviewed

Zijlmans, Eva A. O.; Tijmstra, Jesper; van der Ark, L. Andries; Sijtsma, Klaas – Educational and Psychological Measurement, 2018

Reliability is usually estimated for a total score, but it can also be estimated for item scores. Item-score reliability can be useful to assess the repeatability of an individual item score in a group. Three methods to estimate item-score reliability are discussed, known as method MS, method λ₆, and method CA. The item-score…

Descriptors: Test Items, Test Reliability, Correlation, Comparative Analysis

A Simulation Study on Methods of Correcting for the Effects of Extreme Response Style

Peer reviewed

Wetzel, Eunike; Böhnke, Jan R.; Rose, Norman – Educational and Psychological Measurement, 2016

The impact of response styles such as extreme response style (ERS) on trait estimation has long been a matter of concern to researchers and practitioners. This simulation study investigated three methods that have been proposed for the correction of trait estimates for ERS effects: (a) mixed Rasch models, (b) multidimensional item response models,…

Descriptors: Response Style (Tests), Simulation, Methods, Computation

A Comparison of Four Item-Selection Methods for Severely Constrained CATs

Peer reviewed

He, Wei; Diao, Qi; Hauser, Carl – Educational and Psychological Measurement, 2014

This study compared four item-selection procedures developed for use with severely constrained computerized adaptive tests (CATs). Severely constrained CATs refer to those adaptive tests that seek to meet a complex set of constraints that are often not conducive to each other (i.e., an item may contribute to the satisfaction of several…

Descriptors: Comparative Analysis, Test Items, Selection, Computer Assisted Testing

a-Stratified Computerized Adaptive Testing in the Presence of Calibration Error

Peer reviewed

Cheng, Ying; Patton, Jeffrey M.; Shao, Can – Educational and Psychological Measurement, 2015

a-Stratified computerized adaptive testing with b-blocking (AST), as an alternative to the widely used maximum Fisher information (MFI) item selection method, can effectively balance item pool usage while providing accurate latent trait estimates in computerized adaptive testing (CAT). However, previous comparisons of these methods have treated…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Item Banks
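
To contrast the two item-selection rules named in this abstract, here is a simplified sketch assuming a 2PL model without the scaling constant D, a fixed ability estimate, and a tiny hypothetical pool. MFI takes the unused item with the largest Fisher information at the current estimate. The a-stratified rule with b-blocking, as it is commonly described, builds strata of increasing discrimination, uses the low-a stratum early in the test, and within a stratum takes the item whose difficulty is closest to the estimate. Calibration error, the focus of the study, is not modeled, and all names here are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    a: float  # discrimination
    b: float  # difficulty


def fisher_info(theta, item):
    """2PL Fisher information a^2 * P * (1 - P), without the constant D."""
    p = 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))
    return item.a ** 2 * p * (1.0 - p)


def select_mfi(theta, pool, used):
    """Maximum Fisher information: most informative unused item at theta."""
    candidates = [it for it in pool if it.name not in used]
    return max(candidates, key=lambda it: fisher_info(theta, it))


def build_strata_b_blocking(pool, n_strata):
    """Sort by b, cut into consecutive blocks of n_strata items, and send the
    k-th lowest-a item of each block to stratum k (stratum 0 = lowest a)."""
    by_b = sorted(pool, key=lambda it: it.b)
    strata = [[] for _ in range(n_strata)]
    for start in range(0, len(by_b), n_strata):
        block = sorted(by_b[start:start + n_strata], key=lambda it: it.a)
        for k, item in enumerate(block):
            strata[k].append(item)
    return strata


def select_a_stratified(theta, strata, stage, used):
    """Within this stage's stratum, pick the unused item with b closest to theta."""
    candidates = [it for it in strata[stage] if it.name not in used]
    return min(candidates, key=lambda it: abs(it.b - theta))


# Hypothetical six-item pool split into two strata.
pool = [
    Item("i1", 0.5, -1.0), Item("i2", 1.8, -0.9), Item("i3", 0.6, 0.1),
    Item("i4", 2.0, 0.0), Item("i5", 0.7, 1.0), Item("i6", 1.5, 1.1),
]
strata = build_strata_b_blocking(pool, n_strata=2)
print([it.name for it in strata[0]], [it.name for it in strata[1]])
print(select_mfi(0.0, pool, set()).name)               # i4
print(select_a_stratified(0.0, strata, 0, set()).name)  # i3
```

At θ = 0, MFI immediately spends the most discriminating item (i4), while the stratified rule starts with a low-a item and saves high-a items for later stages, when the estimate is more precise. That deferral is what spreads item usage across the pool.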

Nominal Weights Mean Equating: A Method for Very Small Samples

Peer reviewed

Babcock, Ben; Albano, Anthony; Raymond, Mark – Educational and Psychological Measurement, 2012

The authors introduced nominal weights mean equating, a simplified version of Tucker equating, as an alternative for dealing with very small samples. The authors then conducted three simulation studies to compare nominal weights mean equating to six other equating methods under the nonequivalent groups anchor test design with sample sizes of 20,…

Descriptors: Equated Scores, Methods, Sample Size, Simulation
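
The abstract describes nominal weights mean equating as a simplified version of Tucker equating. The sketch below rests on an assumption about that simplification: Tucker's synthetic-population mean formulas are kept, but the regression slopes γ₁ and γ₂ are replaced by the ratio of total-test length to anchor length. Population 1 takes Form X plus anchor V, population 2 takes Form Y plus V. The function name, default weights, and toy scores are hypothetical; check the article for its exact definitions.

```python
from statistics import mean


def nominal_weights_mean_equate(x, x_scores, v1_scores, y_scores, v2_scores,
                                n_items_x, n_items_y, n_items_v, w1=None):
    """Map a Form X score to the Form Y scale under a NEAT design.

    Synthetic-population means (Tucker form):
        mu_s(X) = mu_1(X) - w2 * gamma1 * (mu_1(V) - mu_2(V))
        mu_s(Y) = mu_2(Y) + w1 * gamma2 * (mu_1(V) - mu_2(V))
    with the assumed nominal weights gamma1 = K_X / K_V, gamma2 = K_Y / K_V.
    """
    if w1 is None:
        # Assumed default: weight each population by its sample size.
        w1 = len(x_scores) / (len(x_scores) + len(y_scores))
    w2 = 1.0 - w1
    gamma1 = n_items_x / n_items_v
    gamma2 = n_items_y / n_items_v
    anchor_gap = mean(v1_scores) - mean(v2_scores)
    mu_sx = mean(x_scores) - w2 * gamma1 * anchor_gap
    mu_sy = mean(y_scores) + w1 * gamma2 * anchor_gap
    # Mean equating: shift by the difference in synthetic-population means.
    return x - mu_sx + mu_sy


# Hypothetical toy data: 40-item forms, 10-item anchor, four examinees per group.
x_scores, v1_scores = [18, 19, 21, 22], [4, 5, 5, 6]
y_scores, v2_scores = [20, 21, 23, 24], [5, 6, 6, 7]

print(nominal_weights_mean_equate(20, x_scores, v1_scores, y_scores, v2_scores,
                                  n_items_x=40, n_items_y=40, n_items_v=10))
```

Population 2 scores one point higher on the anchor, which under these weights implies about four points on a 40-item form. Because Form Y's mean is only two points higher, Form Y comes out harder, and a Form X score of 20 maps to 18 on the Form Y scale.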

Do Adjusted Subscores Lack Validity? Don't Blame the Messenger

Peer reviewed

Sinharay, Sandip; Haberman, Shelby J.; Wainer, Howard – Educational and Psychological Measurement, 2011

There are several techniques that increase the precision of subscores by borrowing information from other parts of the test. These techniques have been criticized on validity grounds in several recent publications. In this note, the authors question the argument used in these publications and suggest both inherent limits to the validity…

Descriptors: Scores, Methods, Validity, Reliability

A Body of Work Standard-Setting Method with Construct Maps

Peer reviewed

Wyse, Adam E.; Bunch, Michael B.; Deville, Craig; Viger, Steven G. – Educational and Psychological Measurement, 2014

This article describes a novel variation of the Body of Work method that uses construct maps to overcome problems of transparency, rater inconsistency, and score gaps commonly occurring with the Body of Work method. The Body of Work method with construct maps was implemented to set cut-scores for two separate K-12 assessment programs in a large…

Descriptors: Standard Setting (Scoring), Educational Assessment, Elementary Secondary Education, Measurement

Balancing Flexible Constraints and Measurement Precision in Computerized Adaptive Testing

Peer reviewed

Moyer, Eric L.; Galindo, Jennifer L.; Dodd, Barbara G. – Educational and Psychological Measurement, 2012

Managing test specifications--both multiple nonstatistical constraints and flexibly defined constraints--has become an important part of designing item selection procedures for computerized adaptive tests (CATs) in achievement testing. This study compared the effectiveness of three procedures: constrained CAT, flexible modified constrained CAT,…

Descriptors: Adaptive Testing, Computer Assisted Testing, Test Items, Item Analysis
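
As a point of reference for the procedures compared in this study, the basic constrained CAT idea (usually credited to Kingsbury and Zara) can be sketched as follows: find the content area whose administered proportion falls furthest below its target, then give the most informative unused item from that area. The 2PL information function, the pool, and the target proportions below are hypothetical, and the flexible and modified variants the study evaluates are not shown.

```python
import math


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item (no scaling constant D)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def ccat_select(theta, pool, targets, administered):
    """Pick the next item: neediest content area first, then maximum information.

    pool: item name -> (content_area, a, b)
    targets: content_area -> target proportion of the test
    administered: list of item names already given
    """
    n = len(administered)
    counts = {area: 0 for area in targets}
    for name in administered:
        counts[pool[name][0]] += 1

    def shortfall(area):
        observed = counts[area] / n if n else 0.0
        return targets[area] - observed

    # Visit areas from largest to smallest shortfall; skip exhausted areas.
    for area in sorted(targets, key=shortfall, reverse=True):
        candidates = [name for name, (item_area, _, _) in pool.items()
                      if item_area == area and name not in administered]
        if candidates:
            return max(candidates,
                       key=lambda name: info_2pl(theta, pool[name][1], pool[name][2]))
    raise ValueError("item pool exhausted")


# Hypothetical pool and test blueprint.
pool = {
    "alg1": ("algebra", 1.2, 0.0),
    "alg2": ("algebra", 0.8, 0.2),
    "geo1": ("geometry", 1.5, 0.1),
    "geo2": ("geometry", 0.9, -0.3),
}
targets = {"algebra": 0.6, "geometry": 0.4}

given = []
for _ in range(3):
    given.append(ccat_select(0.0, pool, targets, given))
print(given)
```

Holding θ fixed at 0 isolates the content-balancing step. In a real CAT the ability estimate would be updated after each response before the next call.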

An Application of Explanatory Item Response Modeling for Model-Based Proficiency Scaling

Peer reviewed

Hartig, Johannes; Frey, Andreas; Nold, Gunter; Klieme, Eckhard – Educational and Psychological Measurement, 2012

The article compares three different methods to estimate effects of task characteristics and to use these estimates for model-based proficiency scaling: prediction of item difficulties from the Rasch model, the linear logistic test model (LLTM), and an LLTM including random item effects (LLTM+e). The methods are applied to empirical data from a…

Descriptors: Item Response Theory, Models, Methods, Computation
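
The LLTM mentioned in this abstract replaces each free Rasch item difficulty with a weighted sum of task characteristics, b_i = c + Σ_k q_ik η_k. A minimal sketch, assuming a hypothetical 0/1 design matrix Q and made-up basic parameters η. The optional per-item residual mimics the random item effect of LLTM+e. Estimating η from response data is not shown.

```python
import math


def lltm_difficulties(q_matrix, eta, intercept=0.0, residuals=None):
    """Item difficulties b_i = intercept + sum_k q_ik * eta_k (+ e_i for LLTM+e)."""
    if residuals is None:
        residuals = [0.0] * len(q_matrix)
    return [
        intercept + sum(q * e for q, e in zip(row, eta)) + r
        for row, r in zip(q_matrix, residuals)
    ]


def rasch_prob(theta, b):
    """Rasch probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical design: two binary task characteristics for four items.
Q = [[0, 0], [1, 0], [0, 1], [1, 1]]
eta = [0.8, 0.5]  # made-up effect of each characteristic on difficulty
b = lltm_difficulties(Q, eta, intercept=-0.5)
print([round(v, 2) for v in b])  # [-0.5, 0.3, 0.0, 0.8]
print([round(rasch_prob(0.0, v), 3) for v in b])
```

Under the plain LLTM, items with identical rows in Q get identical difficulties. The residuals of LLTM+e let such items differ, which is one reason the two models can fit the same data differently.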

Reducing the Cognitive Complexity Associated with Standard Setting: A Comparison of the Single-Passage Bookmark and Yes/No Methods

Peer reviewed

Skaggs, Gary; Hein, Serge F. – Educational and Psychological Measurement, 2011

Judgmental standard setting methods have been criticized for the cognitive complexity of the judgment task that panelists are asked to complete. This study compared two methods designed to reduce this complexity: the yes/no method and the single-passage bookmark method. Two mock standard setting panel meetings were convened, one for each method,…

Descriptors: Standard Setting (Scoring), Methods, Cutting Scores, Experienced Teachers

Author

Albano, Anthony	1
Atanasov, Dimitar V.	1
Babcock, Ben	1
Bunch, Michael B.	1
Böhnke, Jan R.	1
Cai, Fen	1
Cheng, Ying	1
Deville, Craig	1
Diao, Qi	1
Dimitrov, Dimiter M.	1
Dodd, Barbara G.	1
Ferrando, Pere Joan	1
Finch, W. Holmes	1
Frey, Andreas	1
Galindo, Jennifer L.	1
Haberman, Shelby J.	1
Han, Yuting	1
Hartig, Johannes	1
Hauser, Carl	1
He, Wei	1
Hein, Serge F.	1
Jiang, Zhehan	1
Klieme, Eckhard	1
Liu, Ren	1