Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 0
Since 2016 (last 10 years): 4
Since 2006 (last 20 years): 13
Descriptor
Psychometrics: 22
Test Items: 22
Item Response Theory: 9
Test Construction: 6
Computer Assisted Testing: 5
Difficulty Level: 5
Models: 5
Reading Tests: 5
Scores: 4
Simulation: 4
Cognitive Processes: 3
Source
Applied Measurement in Education: 22
Author
Gierl, Mark J.: 2
Puhan, Gautam: 2
Alves, Cecilia B.: 1
Antal, Judit: 1
Bahry, Louise M.: 1
Boulais, André-Philippe: 1
Chang, Lei: 1
Chu, Man-Wai: 1
Cor, M. Ken: 1
Cui, Ying: 1
Custer, Michael: 1
Publication Type
Journal Articles: 22
Reports - Research: 11
Reports - Evaluative: 10
Reports - Descriptive: 1
Education Level
Elementary Education: 2
Elementary Secondary Education: 1
Grade 10: 1
Grade 3: 1
Grade 5: 1
Grade 7: 1
High Schools: 1
Higher Education: 1
Postsecondary Education: 1
Primary Education: 1
Assessments and Surveys
Law School Admission Test: 1
National Assessment of…: 1
Preliminary Scholastic…: 1
SAT (College Admission Test): 1
O'Neill, Thomas R.; Gregg, Justin L.; Peabody, Michael R. – Applied Measurement in Education, 2020
This study addresses equating issues with varying sample sizes using the Rasch model by examining how sample size affects the stability of item calibrations and person ability estimates. A resampling design was used to create 9 sample size conditions (200, 100, 50, 45, 40, 35, 30, 25, and 20), each replicated 10 times. Items were recalibrated…
Descriptors: Sample Size, Equated Scores, Item Response Theory, Raw Scores
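A minimal sketch of the kind of resampling design described above, assuming simulated dichotomous responses; a crude log-odds estimate stands in for a full Rasch calibration, and the data, sample sizes, and replication counts are illustrative rather than taken from the study.

```python
# Sketch of a resampling design for checking the stability of item
# difficulty estimates at several sample sizes (illustrative only; a
# crude log-odds estimate stands in for a full Rasch calibration).
import numpy as np

rng = np.random.default_rng(7)

# Simulated "full" response matrix: 1,000 examinees x 40 dichotomous items.
n_items = 40
true_b = rng.normal(0.0, 1.0, n_items)          # item difficulties
theta = rng.normal(0.0, 1.0, 1000)              # person abilities
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - true_b[None, :])))
responses = (rng.random(p.shape) < p).astype(int)

def crude_difficulty(resp):
    """Log-odds of an incorrect response -- a rough Rasch-style difficulty."""
    p_correct = resp.mean(axis=0).clip(0.01, 0.99)
    return np.log((1.0 - p_correct) / p_correct)

sample_sizes = [200, 100, 50, 45, 40, 35, 30, 25, 20]
n_replications = 10

for n in sample_sizes:
    drifts = []
    for _ in range(n_replications):
        sample = responses[rng.choice(len(responses), size=n, replace=False)]
        b_hat = crude_difficulty(sample)
        # Center both sets of difficulties before comparing (the Rasch scale origin is arbitrary).
        drifts.append(np.sqrt(np.mean((b_hat - b_hat.mean() - (true_b - true_b.mean())) ** 2)))
    print(f"n={n:4d}  mean RMSE of item difficulties = {np.mean(drifts):.3f}")
```

Smaller samples should show larger and more variable recovery error, which is the stability question the study addresses.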
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods was investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
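One common way to operationalize item parameter drift under the Rasch model is to compare centered difficulties from two calibrations and flag large shifts. The sketch below illustrates that idea with invented values and an arbitrary flagging threshold; it is not the procedure used in the article.

```python
# Sketch of flagging item parameter drift between two Rasch calibrations:
# center each set of difficulties, then flag items whose difficulty shifts
# by more than an (illustrative) threshold in logits.
import numpy as np

b_old = np.array([-1.2, -0.4, 0.0, 0.3, 0.9, 1.4])
b_new = np.array([-1.1, -0.5, 0.6, 0.2, 1.0, 1.3])     # invented recalibrated values

shift = (b_new - b_new.mean()) - (b_old - b_old.mean())
threshold = 0.5                                          # illustrative flagging rule
for i, d in enumerate(shift, start=1):
    status = "DRIFT" if abs(d) > threshold else "stable"
    print(f"item {i}: shift = {d:+.2f} logits -> {status}")
```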
Antal, Judit; Proctor, Thomas P.; Melican, Gerald J. – Applied Measurement in Education, 2014
In common-item equating the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…
Descriptors: Test Items, Equated Scores, Difficulty Level, Item Response Theory
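The "mini version" requirement can be checked descriptively by comparing the anchor block's difficulty mean and spread with those of the full form. A small sketch with invented difficulty values:

```python
# Quick check that a candidate anchor set mirrors the full form's
# difficulty mean and spread (values are invented for illustration).
import numpy as np

full_form_b = np.array([-1.8, -1.2, -0.9, -0.5, -0.2, 0.0, 0.1, 0.4, 0.7, 1.0, 1.3, 1.9])
anchor_idx = [0, 3, 6, 9, 11]       # hypothetical anchor positions
anchor_b = full_form_b[anchor_idx]

print(f"full form : mean={full_form_b.mean():+.2f}  sd={full_form_b.std(ddof=1):.2f}")
print(f"anchor    : mean={anchor_b.mean():+.2f}  sd={anchor_b.std(ddof=1):.2f}")
# Sinharay and Holland's suggestion is that matching the spread of
# difficulty may matter less than matching the mean.
```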
Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André – Applied Measurement in Education, 2016
Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Descriptors: Psychometrics, Multiple Choice Tests, Test Items, Item Analysis
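Automatic item generation typically starts from an item model: a stem with variable slots that are filled systematically to produce many items. The toy sketch below illustrates the template-filling step only, with invented content; it omits the cognitive modeling and quality controls the authors describe.

```python
# Toy illustration of template-based item generation: an "item model"
# with variable slots that are filled systematically (content invented).
from itertools import product

stem = "A patient weighing {weight} kg is prescribed {dose} mg/kg of drug X. What total dose should be given?"
weights = [50, 70, 90]
doses = [2, 5]

generated = []
for w, d in product(weights, doses):
    item = {
        "stem": stem.format(weight=w, dose=d),
        "key": w * d,                       # correct option
        "distractors": [w + d, w * d * 10, abs(w - d)],
    }
    generated.append(item)

print(f"{len(generated)} items generated from one item model")
print(generated[0]["stem"], "->", generated[0]["key"])
```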
Roduta Roberts, Mary; Alves, Cecilia B.; Chu, Man-Wai; Thompson, Margaret; Bahry, Louise M.; Gotzmann, Andrea – Applied Measurement in Education, 2014
The purpose of this study was to evaluate the adequacy of three cognitive models, one developed by content experts and two generated from student verbal reports for explaining examinee performance on a grade 3 diagnostic mathematics test. For this study, the items were developed to directly measure the attributes in the cognitive model. The…
Descriptors: Foreign Countries, Mathematics Tests, Cognitive Processes, Models
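Cognitive models of this kind are often formalized as a Q-matrix linking items to required attributes; under a simple conjunctive rule (ignoring slips and guesses), an examinee is predicted to answer an item correctly only if every required attribute is mastered. A toy sketch with an invented Q-matrix and mastery profile:

```python
# Toy illustration of predicting item performance from a cognitive model:
# a Q-matrix maps items to required attributes, and a conjunctive rule
# (ignoring slips and guesses) predicts success only when every required
# attribute is mastered. The matrix and mastery profile are invented.
import numpy as np

#              attribute: A1  A2  A3
q_matrix = np.array([[1, 0, 0],     # item 1 needs A1
                     [1, 1, 0],     # item 2 needs A1 and A2
                     [0, 1, 1],     # item 3 needs A2 and A3
                     [1, 1, 1]])    # item 4 needs all three

mastery = np.array([1, 1, 0])       # examinee has mastered A1 and A2 only

predicted = (q_matrix @ mastery == q_matrix.sum(axis=1)).astype(int)
print("predicted item scores:", predicted)    # -> [1 1 0 0]
```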
Edwards, Michael C.; Flora, David B.; Thissen, David – Applied Measurement in Education, 2012
This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Format, Test Items
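The sketch below illustrates only the general idea of routing after a short fixed block: the number-correct score on that block selects one of several pre-assembled second-stage forms. The thresholds and forms are invented, and it does not implement the uMFS exposure-control machinery described in the article.

```python
# Sketch of routing after a short fixed item block: the number-correct
# score on the first block selects one of several pre-assembled
# second-stage forms (thresholds and forms are invented).
first_block_responses = [1, 0, 1, 1, 0]          # 5 fixed items, scored 0/1
score = sum(first_block_responses)

second_stage_forms = {
    "easy":   ["E01", "E02", "E03", "E04", "E05"],
    "medium": ["M01", "M02", "M03", "M04", "M05"],
    "hard":   ["H01", "H02", "H03", "H04", "H05"],
}

if score <= 1:
    route = "easy"
elif score <= 3:
    route = "medium"
else:
    route = "hard"

print(f"score on fixed block = {score} -> administer {route} form: {second_stage_forms[route]}")
```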
Kahraman, Nilufer; De Champlain, Andre; Raymond, Mark – Applied Measurement in Education, 2012
Item-level information, such as difficulty and discrimination, is invaluable to test assembly, equating, and scoring practices. Estimating these parameters within the context of large-scale performance assessments is often hindered by the use of unbalanced designs for assigning examinees to tasks and raters because such designs result in very…
Descriptors: Performance Based Assessment, Medicine, Factor Analysis, Test Items
Randall, Jennifer; Engelhard, George, Jr. – Applied Measurement in Education, 2010
The psychometric properties and multigroup measurement invariance of scores across subgroups, items, and persons on the "Reading for Meaning" items from the Georgia Criterion Referenced Competency Test (CRCT) were assessed in a sample of 778 seventh-grade students. Specifically, we sought to determine the extent to which score-based…
Descriptors: Testing Accommodations, Test Items, Learning Disabilities, Factor Analysis
Meyers, Jason L.; Miller, G. Edward; Way, Walter D. – Applied Measurement in Education, 2009
In operational testing programs using item response theory (IRT), item parameter invariance is threatened when an item appears in a different location on the live test than it did when it was field tested. This study utilizes data from a large state's assessments to model change in Rasch item difficulty (RID) as a function of item position change,…
Descriptors: Test Items, Test Content, Testing Programs, Simulation
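A minimal sketch of modeling the change in Rasch item difficulty as a linear function of item position change; the data points are invented, and the single-predictor fit is only a stand-in for the fuller models the study describes.

```python
# Sketch of modeling the change in Rasch item difficulty (RID) as a
# linear function of item position change (all values are invented).
import numpy as np

position_change = np.array([-20, -10, -5, 0, 0, 5, 10, 15, 25, 40])   # live minus field-test position
rid_change = np.array([-0.12, -0.06, -0.02, 0.01, -0.01, 0.03, 0.05, 0.08, 0.12, 0.22])

slope, intercept = np.polyfit(position_change, rid_change, deg=1)
print(f"RID change ≈ {slope:+.4f} * position_change {intercept:+.4f}")
# A positive slope would suggest items tend to look harder when they
# appear later on the live form than they did at field test.
```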
Puhan, Gautam – Applied Measurement in Education, 2009
The purpose of this study is to determine the extent of scale drift on a test that employs cut scores. It was essential to examine scale drift for this testing program because new forms in this testing program are often put on scale through a series of intermediate equatings (known as equating chains). This process may cause equating error to…
Descriptors: Testing Programs, Testing, Measurement Techniques, Item Response Theory
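A toy simulation of the mechanism at issue: if each link in an equating chain introduces a small, roughly independent error, the cumulative error grows with chain length. The error standard deviation and chain lengths below are invented.

```python
# Toy simulation: a new form is placed on scale through a chain of
# intermediate equatings, each adding a small random error, so the
# cumulative error (scale drift) can grow with chain length.
import numpy as np

rng = np.random.default_rng(11)
n_chains = 2000
chain_lengths = [1, 3, 5, 8]
error_sd = 0.5          # SD of the error introduced by one equating link (invented)

for k in chain_lengths:
    cumulative = rng.normal(0.0, error_sd, size=(n_chains, k)).sum(axis=1)
    print(f"chain of {k} equatings: SD of cumulative error = {cumulative.std():.2f}")
# Under independent errors the SD grows roughly with the square root of
# chain length, one reason long equating chains are checked for scale drift.
```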
Hein, Serge F.; Skaggs, Gary E. – Applied Measurement in Education, 2009
Only a small number of qualitative studies have investigated panelists' experiences during standard-setting activities or the thought processes associated with panelists' actions. This qualitative study involved an examination of the experiences of 11 panelists who participated in a prior, one-day standard-setting meeting in which either the…
Descriptors: Focus Groups, Standard Setting, Cutting Scores, Cognitive Processes
Leighton, Jacqueline P.; Cui, Ying; Cor, M. Ken – Applied Measurement in Education, 2009
The objective of the present investigation was to compare the adequacy of two cognitive models for predicting examinee performance on a sample of algebra I and II items from the March 2005 administration of the SAT. The two models included one generated from verbal reports provided by 21 examinees as they solved the SAT items, and the…
Descriptors: Test Items, Inferences, Cognitive Ability, Prediction
DeMars, Christine E. – Applied Measurement in Education, 2004
Three methods of detecting item drift were compared: the procedure in BILOG-MG for estimating linear trends in item difficulty, the CUSUM procedure that Veerkamp and Glas (2000) used to detect trends in difficulty or discrimination, and a modification of Kim, Cohen, and Park's (1995) chi-square test for multiple-group differential item functioning (DIF),…
Descriptors: Comparative Analysis, Test Items, Testing, Item Analysis
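A minimal one-sided CUSUM over successive difficulty estimates of a single item, in the spirit of the procedures compared above; the reference value, allowance, and decision limit are illustrative, not those used in the article.

```python
# Minimal one-sided CUSUM over successive difficulty estimates of a single
# item; the allowance k and decision limit h are illustrative only.
difficulties = [0.02, -0.05, 0.04, 0.10, 0.18, 0.25, 0.31, 0.40]   # invented estimates over administrations
target = 0.0     # difficulty at the first calibration
k = 0.05         # allowance (roughly half the shift worth detecting)
h = 0.40         # decision limit

s = 0.0
for t, b in enumerate(difficulties, start=1):
    s = max(0.0, s + (b - target - k))
    flag = "  <-- drift signal" if s > h else ""
    print(f"administration {t}: b={b:+.2f}  CUSUM={s:.2f}{flag}")
```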

Chang, Lei – Applied Measurement in Education, 1995
A test item is defined as connotatively consistent (CC) or connotatively inconsistent (CI) when its connotation agrees with or contradicts that of the majority of items on a test. CC and CI items were examined in the Life Orientation Test and were shown to measure correlated but distinct traits. (SLD)
Descriptors: Attitude Measures, College Students, Higher Education, Personality Measures
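One way to see the "correlated but distinct" pattern is to correlate a positively worded (CC) subscale with a reverse-scored negatively worded (CI) subscale. The sketch below does this on simulated responses; the data-generating values are invented and not from the Life Orientation Test.

```python
# Sketch: correlate a connotatively consistent (CC) subscale with a
# reverse-scored connotatively inconsistent (CI) subscale (data invented).
import numpy as np

rng = np.random.default_rng(3)
n = 200
optimism = rng.normal(0, 1, n)

# 4 positively worded items and 4 negatively worded items on a 1-5 scale.
cc_items = np.clip(np.round(3 + optimism[:, None] + rng.normal(0, 0.8, (n, 4))), 1, 5)
ci_items = np.clip(np.round(3 - optimism[:, None] + rng.normal(0, 0.8, (n, 4))), 1, 5)

cc_score = cc_items.sum(axis=1)
ci_score = (6 - ci_items).sum(axis=1)           # reverse-score the CI items

r = np.corrcoef(cc_score, ci_score)[0, 1]
print(f"correlation between CC and reverse-scored CI subscales: r = {r:.2f}")
# A high but clearly sub-unity correlation is consistent with the
# "correlated but distinct" traits the article reports.
```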
Pages: 1 | 2