ERIC - Search Results

Showing 1 to 15 of 18 results

Estimating Item Difficulty with Comparative Judgments. Research Report. ETS RR-14-39

Peer reviewed

Attali, Yigal; Saldivia, Luis; Jackson, Carol; Schuppan, Fred; Wanamaker, Wilbur – ETS Research Report Series, 2014

Previous investigations of the ability of content experts and test developers to estimate item difficulty have, for the most part, produced disappointing results. These investigations were based on a noncomparative method of independently rating the difficulty of items. In this article, we argue that, by eliciting comparative judgments of…

Descriptors: Test Items, Difficulty Level, Comparative Analysis, College Entrance Examinations

Cost-Saving Innovations: Lessons Learned from Three Campus Models

Wesley, Alexa; Parnell, Amelia; Askin, Jacalyn – NASPA - Student Affairs Administrators in Higher Education, 2020

Higher education is deploying a range of innovative strategies in response to rising costs and financial challenges. This report presents findings from a NASPA [National Association of Student Personnel Administrators] study on cost-saving initiatives at three institutions. Each institution aimed to creatively address different areas affecting…

Descriptors: Higher Education, Cost Effectiveness, Professional Associations, Student Personnel Workers

Using the Graded Response Model to Control Spurious Interactions in Moderated Multiple Regression

Peer reviewed

Morse, Brendan J.; Johanson, George A.; Griffeth, Rodger W. – Applied Psychological Measurement, 2012

Recent simulation research has demonstrated that using simple raw scores to operationalize a latent construct can result in inflated Type I error rates for the interaction term of a moderated statistical model when the interaction (or lack thereof) is proposed at the latent variable level. Rescaling the scores using an appropriate item response…

Descriptors: Item Response Theory, Multiple Regression Analysis, Error of Measurement, Models

Comparisons of Methodologies and Results in Vertical Scaling for Educational Achievement Tests

Peer reviewed

Tong, Ye; Kolen, Michael J. – Applied Measurement in Education, 2007

A number of vertical scaling methodologies were examined in this article. Scaling variations included data collection design, scaling method, item response theory (IRT) scoring procedure, and proficiency estimation method. Vertical scales were developed for Grade 3 through Grade 8 for 4 content areas and 9 simulated datasets. A total of 11 scaling…

Descriptors: Achievement Tests, Scaling, Methods, Item Response Theory

The Effect of Small Calibration Sample Sizes on TOEFL IRT-Based Equating.

Tang, K. Linda; And Others – 1993

This study compared the performance of the LOGIST and BILOG computer programs on item response theory (IRT) based scaling and equating for the Test of English as a Foreign Language (TOEFL) using real and simulated data and two calibration structures. Applications of IRT for the TOEFL program are based on the three-parameter logistic (3PL) model.…

Descriptors: Comparative Analysis, Computer Simulation, Equated Scores, Estimation (Mathematics)

Comparison of Item Response Theory and Thurstone Methods of Vertical Scaling.

Peer reviewed

Burket, George R.; Yen, Wendy M. – Journal of Educational Measurement, 1997

Using simulated data modeled after real tests, a Thurstone method (L. Thurstone, 1925 and later) and three-parameter item response theory were compared for vertical scaling. Neither procedure produced artificial scale shrinkage, and both produced modest scale expansion for one simulated condition. (SLD)

Descriptors: Comparative Analysis, Item Response Theory, Scaling, Simulation

Effect of Scale Adjustment on the Comparison of Item and Ability Parameters.

Peer reviewed

Liou, Michelle – Applied Psychological Measurement, 1990

The effect of scale selection on error in calibrating item and ability parameters was investigated, with particular reference to the standardized mean-squared difference (SMSD) statistic. Through simulation, three scaling methods for selecting the common scale were used to demonstrate their effects on SMSD values. (SLD)

Descriptors: Comparative Analysis, Computer Simulation, Equations (Mathematics), Mathematical Models

The Effects of Heterogeneous Item Distributions on Reliability.

Peer reviewed

Enders, Craig K.; Bandalos, Deborah L. – Applied Measurement in Education, 1999

Examined the degree to which coefficient alpha is affected by including items with different distribution shapes within a unidimensional scale. Computer simulation results indicate that reliability does not increase dramatically as a result of using differentially shaped items within a scale. Discusses implications for test construction. (SLD)

Descriptors: Computer Simulation, Reliability, Scaling, Statistical Distributions

Vertical Scaling with the Rasch Model Utilizing Default and Tight Convergence Settings with WINSTEPS and BILOG-MG

Peer reviewed

Custer, Michael; Omar, Md Hafidz; Pomplun, Mark – Applied Measurement in Education, 2006

This study compared vertical scaling results for the Rasch model from BILOG-MG and WINSTEPS. The item and ability parameters for the simulated vocabulary tests were scaled across 11 grades, kindergarten through 10th. Data were based on real data and were simulated under normal and skewed distribution assumptions. WINSTEPS and BILOG-MG were each…

Descriptors: Models, Scaling, Computer Software, Vocabulary

Robust Canonical Discriminant Analysis.

Peer reviewed

Verboon, Peter; van der Lans, Ivo A. – Psychometrika, 1994

A method for robust canonical discriminant analysis via two robust objective loss functions is discussed. Majorization is used at several stages in the minimization procedure to obtain a monotonically convergent algorithm. A simulation study and empirical data illustrate the procedure. (SLD)

Descriptors: Algorithms, Classification, Discriminant Analysis, Least Squares Statistics

Some Equations for Logically Scoring and Logically Indexing Bipolar Items and Variables.

Peer reviewed

Carifio, James – Evaluation Review, 1992

To solve an applied research problem with bipolar data, a set of equations was developed that combined all plus and minus data combinations into unique values and scale points. The equations were tested through computer simulations and empirical tests. Resulting index scores were approximately interval and linear and easy to use and interpret.…

Descriptors: Computer Simulation, Equations (Mathematics), Indexing, Mathematical Logic

Detecting Rater Effects with a Multi-Faceted Rating Scale Model.

Wolfe, Edward W.; Chiu, Chris W. T. – 1997

How common patterns of rater errors may be detected in a large-scale performance assessment setting is discussed. Common rater effects are identified, and a scaling method that can be used to detect them in operational data sets is presented. Simulated data sets are generated to exhibit each of these rater effects. The three continua that depict…

Descriptors: Item Response Theory, Mathematical Models, Norms, Performance Based Assessment

Scaling Performance Assessments: A Comparison of One-Parameter and Two-Parameter Partial Credit Models.

Peer reviewed

Fitzpatrick, Anne R.; And Others – Journal of Educational Measurement, 1996

One-parameter (1PPC) and two-parameter partial credit (2PPC) models were compared using real and simulated data with constructed response items present. Results suggest that the more flexible three-parameter logistic-2PPC model combination produces better model fit than the combination of the one-parameter logistic and the 1PPC models. (SLD)

Descriptors: Comparative Analysis, Constructed Response, Goodness of Fit, Performance Based Assessment

Imputation of Missing Categorical Data by Maximizing Internal Consistency.

Peer reviewed

van Buuren, Stef; van Rijckevorsel, Jan L. A. – Psychometrika, 1992

A technique is presented to transform incomplete categorical data into complete data by imputing appropriate scores into missing cells. A solution of the optimization problem is suggested, and relevant psychometric theory is discussed. The average correlation should be at least 0.50 before the method becomes practical. (SLD)

Descriptors: Classification, Computer Simulation, Correlation, Equations (Mathematics)

An Investigation of the Use of Simplified IRT Models for Scaling and Equating the TOEFL Test. TOEFL Technical Report TR-2.

Way, Walter D.; Reese, Clyde M. – 1991

The use of two alternative item response theory (IRT) estimation models in the scaling and equating of the Test of English as a Foreign Language (TOEFL) was explored; and item scaling and test equating results based on these models were compared with results based on the three-parameter (3PL) model currently being used with the TOEFL. Models were…

Descriptors: Correlation, Equated Scores, Estimation (Mathematics), Goodness of Fit
