Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 0
Since 2016 (last 10 years): 4
Since 2006 (last 20 years): 13
Descriptor
Psychometrics: 22
Test Items: 22
Item Response Theory: 9
Test Construction: 6
Computer Assisted Testing: 5
Difficulty Level: 5
Models: 5
Reading Tests: 5
Scores: 4
Simulation: 4
Cognitive Processes: 3
Source
Applied Measurement in Education: 22
Author
Gierl, Mark J.: 2
Puhan, Gautam: 2
Alves, Cecilia B.: 1
Antal, Judit: 1
Bahry, Louise M.: 1
Boulais, André-Philippe: 1
Chang, Lei: 1
Chu, Man-Wai: 1
Cor, M. Ken: 1
Cui, Ying: 1
Custer, Michael: 1
Publication Type
Journal Articles: 22
Reports - Research: 11
Reports - Evaluative: 10
Reports - Descriptive: 1
Education Level
Elementary Education: 2
Elementary Secondary Education: 1
Grade 10: 1
Grade 3: 1
Grade 5: 1
Grade 7: 1
High Schools: 1
Higher Education: 1
Postsecondary Education: 1
Primary Education: 1
Assessments and Surveys
Law School Admission Test: 1
National Assessment of…: 1
Preliminary Scholastic…: 1
SAT (College Admission Test): 1
O'Neill, Thomas R.; Gregg, Justin L.; Peabody, Michael R. – Applied Measurement in Education, 2020
This study addresses equating issues with varying sample sizes using the Rasch model by examining how sample size affects the stability of item calibrations and person ability estimates. A resampling design was used to create 9 sample size conditions (200, 100, 50, 45, 40, 35, 30, 25, and 20), each replicated 10 times. Items were recalibrated…
Descriptors: Sample Size, Equated Scores, Item Response Theory, Raw Scores
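A minimal sketch of the kind of resampling design described above, assuming simulated dichotomous responses; a crude log-odds estimate stands in for a full Rasch calibration, and the data, sample sizes, and replication counts are illustrative rather than taken from the study.

```python
# Sketch of a resampling design for checking the stability of item
# difficulty estimates at several sample sizes (illustrative only; a
# crude log-odds estimate stands in for a full Rasch calibration).
import numpy as np

rng = np.random.default_rng(7)

# Simulated "full" response matrix: 1,000 examinees x 40 dichotomous items.
n_items = 40
true_b = rng.normal(0.0, 1.0, n_items)          # item difficulties
theta = rng.normal(0.0, 1.0, 1000)              # person abilities
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - true_b[None, :])))
responses = (rng.random(p.shape) < p).astype(int)

def crude_difficulty(resp):
    """Log-odds of an incorrect response -- a rough Rasch-style difficulty."""
    p_correct = resp.mean(axis=0).clip(0.01, 0.99)
    return np.log((1.0 - p_correct) / p_correct)

sample_sizes = [200, 100, 50, 45, 40, 35, 30, 25, 20]
n_replications = 10

for n in sample_sizes:
    drifts = []
    for _ in range(n_replications):
        sample = responses[rng.choice(len(responses), size=n, replace=False)]
        b_hat = crude_difficulty(sample)
        # Center both sets of difficulties before comparing (the Rasch scale origin is arbitrary).
        drifts.append(np.sqrt(np.mean((b_hat - b_hat.mean() - (true_b - true_b.mean())) ** 2)))
    print(f"n={n:4d}  mean RMSE of item difficulties = {np.mean(drifts):.3f}")
```

Smaller samples should show larger and more variable recovery error, which is the stability question the study addresses.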
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods was investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
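One common way to operationalize item parameter drift under the Rasch model is to compare centered difficulties from two calibrations and flag large shifts. The sketch below illustrates that idea with invented values and an arbitrary flagging threshold; it is not the procedure used in the article.

```python
# Sketch of flagging item parameter drift between two Rasch calibrations:
# center each set of difficulties, then flag items whose difficulty shifts
# by more than an (illustrative) threshold in logits.
import numpy as np

b_old = np.array([-1.2, -0.4, 0.0, 0.3, 0.9, 1.4])
b_new = np.array([-1.1, -0.5, 0.6, 0.2, 1.0, 1.3])     # invented recalibrated values

shift = (b_new - b_new.mean()) - (b_old - b_old.mean())
threshold = 0.5                                          # illustrative flagging rule
for i, d in enumerate(shift, start=1):
    status = "DRIFT" if abs(d) > threshold else "stable"
    print(f"item {i}: shift = {d:+.2f} logits -> {status}")
```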
Antal, Judit; Proctor, Thomas P.; Melican, Gerald J. – Applied Measurement in Education, 2014
In common-item equating the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…
Descriptors: Test Items, Equated Scores, Difficulty Level, Item Response Theory
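The "mini version" requirement can be checked descriptively by comparing the anchor block's difficulty mean and spread with those of the full form. A small sketch with invented difficulty values:

```python
# Quick check that a candidate anchor set mirrors the full form's
# difficulty mean and spread (values are invented for illustration).
import numpy as np

full_form_b = np.array([-1.8, -1.2, -0.9, -0.5, -0.2, 0.0, 0.1, 0.4, 0.7, 1.0, 1.3, 1.9])
anchor_idx = [0, 3, 6, 9, 11]       # hypothetical anchor positions
anchor_b = full_form_b[anchor_idx]

print(f"full form : mean={full_form_b.mean():+.2f}  sd={full_form_b.std(ddof=1):.2f}")
print(f"anchor    : mean={anchor_b.mean():+.2f}  sd={anchor_b.std(ddof=1):.2f}")
# Sinharay and Holland's suggestion is that matching the spread of
# difficulty may matter less than matching the mean.
```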
Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André – Applied Measurement in Education, 2016
Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Descriptors: Psychometrics, Multiple Choice Tests, Test Items, Item Analysis
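Automatic item generation typically starts from an item model: a stem with variable slots that are filled systematically to produce many items. The toy sketch below illustrates the template-filling step only, with invented content; it omits the cognitive modeling and quality controls the authors describe.

```python
# Toy illustration of template-based item generation: an "item model"
# with variable slots that are filled systematically (content invented).
from itertools import product

stem = "A patient weighing {weight} kg is prescribed {dose} mg/kg of drug X. What total dose should be given?"
weights = [50, 70, 90]
doses = [2, 5]

generated = []
for w, d in product(weights, doses):
    item = {
        "stem": stem.format(weight=w, dose=d),
        "key": w * d,                       # correct option
        "distractors": [w + d, w * d * 10, abs(w - d)],
    }
    generated.append(item)

print(f"{len(generated)} items generated from one item model")
print(generated[0]["stem"], "->", generated[0]["key"])
```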
Roduta Roberts, Mary; Alves, Cecilia B.; Chu, Man-Wai; Thompson, Margaret; Bahry, Louise M.; Gotzmann, Andrea – Applied Measurement in Education, 2014
The purpose of this study was to evaluate the adequacy of three cognitive models, one developed by content experts and two generated from student verbal reports for explaining examinee performance on a grade 3 diagnostic mathematics test. For this study, the items were developed to directly measure the attributes in the cognitive model. The…
Descriptors: Foreign Countries, Mathematics Tests, Cognitive Processes, Models
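Cognitive models of this kind are often formalized as a Q-matrix linking items to required attributes; under a simple conjunctive rule (ignoring slips and guesses), an examinee is predicted to answer an item correctly only if every required attribute is mastered. A toy sketch with an invented Q-matrix and mastery profile:

```python
# Toy illustration of predicting item performance from a cognitive model:
# a Q-matrix maps items to required attributes, and a conjunctive rule
# (ignoring slips and guesses) predicts success only when every required
# attribute is mastered. The matrix and mastery profile are invented.
import numpy as np

#              attribute: A1  A2  A3
q_matrix = np.array([[1, 0, 0],     # item 1 needs A1
                     [1, 1, 0],     # item 2 needs A1 and A2
                     [0, 1, 1],     # item 3 needs A2 and A3
                     [1, 1, 1]])    # item 4 needs all three

mastery = np.array([1, 1, 0])       # examinee has mastered A1 and A2 only

predicted = (q_matrix @ mastery == q_matrix.sum(axis=1)).astype(int)
print("predicted item scores:", predicted)    # -> [1 1 0 0]
```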
Edwards, Michael C.; Flora, David B.; Thissen, David – Applied Measurement in Education, 2012
This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Format, Test Items
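The sketch below illustrates only the general idea of routing after a short fixed block: the number-correct score on that block selects one of several pre-assembled second-stage forms. The thresholds and forms are invented, and it does not implement the uMFS exposure-control machinery described in the article.

```python
# Sketch of routing after a short fixed item block: the number-correct
# score on the first block selects one of several pre-assembled
# second-stage forms (thresholds and forms are invented).
first_block_responses = [1, 0, 1, 1, 0]          # 5 fixed items, scored 0/1
score = sum(first_block_responses)

second_stage_forms = {
    "easy":   ["E01", "E02", "E03", "E04", "E05"],
    "medium": ["M01", "M02", "M03", "M04", "M05"],
    "hard":   ["H01", "H02", "H03", "H04", "H05"],
}

if score <= 1:
    route = "easy"
elif score <= 3:
    route = "medium"
else:
    route = "hard"

print(f"score on fixed block = {score} -> administer {route} form: {second_stage_forms[route]}")
```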
Kahraman, Nilufer; De Champlain, Andre; Raymond, Mark – Applied Measurement in Education, 2012
Item-level information, such as difficulty and discrimination, is invaluable to test assembly, equating, and scoring practices. Estimating these parameters within the context of large-scale performance assessments is often hindered by the use of unbalanced designs for assigning examinees to tasks and raters because such designs result in very…
Descriptors: Performance Based Assessment, Medicine, Factor Analysis, Test Items
Randall, Jennifer; Engelhard, George, Jr. – Applied Measurement in Education, 2010
The psychometric properties and multigroup measurement invariance of scores across subgroups, items, and persons on the "Reading for Meaning" items from the Georgia Criterion Referenced Competency Test (CRCT) were assessed in a sample of 778 seventh-grade students. Specifically, we sought to determine the extent to which score-based…
Descriptors: Testing Accommodations, Test Items, Learning Disabilities, Factor Analysis
Meyers, Jason L.; Miller, G. Edward; Way, Walter D. – Applied Measurement in Education, 2009
In operational testing programs using item response theory (IRT), item parameter invariance is threatened when an item appears in a different location on the live test than it did when it was field tested. This study utilizes data from a large state's assessments to model change in Rasch item difficulty (RID) as a function of item position change,…
Descriptors: Test Items, Test Content, Testing Programs, Simulation
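A minimal sketch of modeling the change in Rasch item difficulty as a linear function of item position change; the data points are invented, and the single-predictor fit is only a stand-in for the fuller models the study describes.

```python
# Sketch of modeling the change in Rasch item difficulty (RID) as a
# linear function of item position change (all values are invented).
import numpy as np

position_change = np.array([-20, -10, -5, 0, 0, 5, 10, 15, 25, 40])   # live minus field-test position
rid_change = np.array([-0.12, -0.06, -0.02, 0.01, -0.01, 0.03, 0.05, 0.08, 0.12, 0.22])

slope, intercept = np.polyfit(position_change, rid_change, deg=1)
print(f"RID change ≈ {slope:+.4f} * position_change {intercept:+.4f}")
# A positive slope would suggest items tend to look harder when they
# appear later on the live form than they did at field test.
```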
Puhan, Gautam – Applied Measurement in Education, 2009
The purpose of this study is to determine the extent of scale drift on a test that employs cut scores. It was essential to examine scale drift for this testing program because new forms in this testing program are often put on scale through a series of intermediate equatings (known as equating chains). This process may cause equating error to…
Descriptors: Testing Programs, Testing, Measurement Techniques, Item Response Theory
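A toy simulation of the mechanism at issue: if each link in an equating chain introduces a small, roughly independent error, the cumulative error grows with chain length. The error standard deviation and chain lengths below are invented.

```python
# Toy simulation: a new form is placed on scale through a chain of
# intermediate equatings, each adding a small random error, so the
# cumulative error (scale drift) can grow with chain length.
import numpy as np

rng = np.random.default_rng(11)
n_chains = 2000
chain_lengths = [1, 3, 5, 8]
error_sd = 0.5          # SD of the error introduced by one equating link (invented)

for k in chain_lengths:
    cumulative = rng.normal(0.0, error_sd, size=(n_chains, k)).sum(axis=1)
    print(f"chain of {k} equatings: SD of cumulative error = {cumulative.std():.2f}")
# Under independent errors the SD grows roughly with the square root of
# chain length, one reason long equating chains are checked for scale drift.
```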
Hein, Serge F.; Skaggs, Gary E. – Applied Measurement in Education, 2009
Only a small number of qualitative studies have investigated panelists' experiences during standard-setting activities or the thought processes associated with panelists' actions. This qualitative study involved an examination of the experiences of 11 panelists who participated in a prior, one-day standard-setting meeting in which either the…
Descriptors: Focus Groups, Standard Setting, Cutting Scores, Cognitive Processes
Leighton, Jacqueline P.; Cui, Ying; Cor, M. Ken – Applied Measurement in Education, 2009
The objective of the present investigation was to compare the adequacy of two cognitive models for predicting examinee performance on a sample of algebra I and II items from the March 2005 administration of the SAT. The two models included one generated from verbal reports provided by 21 examinees as they solved the SAT items, and the…
Descriptors: Test Items, Inferences, Cognitive Ability, Prediction
DeMars, Christine E. – Applied Measurement in Education, 2004
Three methods of detecting item drift were compared: the procedure in BILOG-MG for estimating linear trends in item difficulty, the CUSUM procedure that Veerkamp and Glas (2000) used to detect trends in difficulty or discrimination, and a modification of Kim, Cohen, and Park's (1995) chi-square test for multiple-group differential item functioning (DIF),…
Descriptors: Comparative Analysis, Test Items, Testing, Item Analysis
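A minimal one-sided CUSUM over successive difficulty estimates of a single item, in the spirit of the procedures compared above; the reference value, allowance, and decision limit are illustrative, not those used in the article.

```python
# Minimal one-sided CUSUM over successive difficulty estimates of a single
# item; the allowance k and decision limit h are illustrative only.
difficulties = [0.02, -0.05, 0.04, 0.10, 0.18, 0.25, 0.31, 0.40]   # invented estimates over administrations
target = 0.0     # difficulty at the first calibration
k = 0.05         # allowance (roughly half the shift worth detecting)
h = 0.40         # decision limit

s = 0.0
for t, b in enumerate(difficulties, start=1):
    s = max(0.0, s + (b - target - k))
    flag = "  <-- drift signal" if s > h else ""
    print(f"administration {t}: b={b:+.2f}  CUSUM={s:.2f}{flag}")
```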

Chang, Lei – Applied Measurement in Education, 1995
A test item is defined as connotatively consistent (CC) or connotatively inconsistent (CI) when its connotation agrees with or contradicts that of the majority of items on a test. CC and CI items were examined in the Life Orientation Test and were shown to measure correlated but distinct traits. (SLD)
Descriptors: Attitude Measures, College Students, Higher Education, Personality Measures
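One way to see the "correlated but distinct" pattern is to correlate a positively worded (CC) subscale with a reverse-scored negatively worded (CI) subscale. The sketch below does this on simulated responses; the data-generating values are invented and not from the Life Orientation Test.

```python
# Sketch: correlate a connotatively consistent (CC) subscale with a
# reverse-scored connotatively inconsistent (CI) subscale (data invented).
import numpy as np

rng = np.random.default_rng(3)
n = 200
optimism = rng.normal(0, 1, n)

# 4 positively worded items and 4 negatively worded items on a 1-5 scale.
cc_items = np.clip(np.round(3 + optimism[:, None] + rng.normal(0, 0.8, (n, 4))), 1, 5)
ci_items = np.clip(np.round(3 - optimism[:, None] + rng.normal(0, 0.8, (n, 4))), 1, 5)

cc_score = cc_items.sum(axis=1)
ci_score = (6 - ci_items).sum(axis=1)           # reverse-score the CI items

r = np.corrcoef(cc_score, ci_score)[0, 1]
print(f"correlation between CC and reverse-scored CI subscales: r = {r:.2f}")
# A high but clearly sub-unity correlation is consistent with the
# "correlated but distinct" traits the article reports.
```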
Pages: 1 | 2