Showing all 15 results
Chengcheng Li – ProQuest LLC, 2022
Categorical data have become increasingly ubiquitous in the modern big data era. In this dissertation, we propose novel statistical learning and inference methods for large-scale categorical data, focusing on latent variable models and their applications to psychometrics. In psychometric assessments, the subjects' underlying aptitude often cannot be…
Descriptors: Statistical Inference, Data Analysis, Psychometrics, Raw Scores
Peer reviewed
Robitzsch, Alexander; Lüdtke, Oliver – Large-scale Assessments in Education, 2023
One major aim of international large-scale assessments (ILSA) like PISA is to monitor changes in student performance over time. To accomplish this task, a set of common items (i.e., link items) is repeatedly administered in each assessment. Linking methods based on item response theory (IRT) models are used to align the results from the different…
Descriptors: Educational Trends, Trend Analysis, International Assessment, Achievement Tests
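A minimal Python sketch of the common-item linking idea described above, assuming Rasch difficulties estimated separately in two assessment cycles; the mean-mean method and all values are illustrative, not the authors' procedure:

import numpy as np

# Rasch difficulties of the common (link) items, estimated separately
# in two assessment cycles (values are made up for illustration).
b_cycle1 = np.array([-1.2, -0.4, 0.3, 0.9])
b_cycle2 = np.array([-1.0, -0.1, 0.5, 1.3])

# Mean-mean linking: under the Rasch model the two scales can differ
# only by an additive constant, estimated from the link items.
shift = b_cycle1.mean() - b_cycle2.mean()

# Place cycle-2 ability estimates on the cycle-1 scale.
theta_cycle2 = np.array([0.0, 0.7, -0.5])
theta_linked = theta_cycle2 + shift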
Peer reviewed
Willse, John T. – Measurement and Evaluation in Counseling and Development, 2017
This article provides a brief introduction to the Rasch model. Motivation for using Rasch analyses is provided. Important Rasch model concepts and key aspects of result interpretation are introduced, with major points reinforced using a simulation demonstration. Concrete guidelines are provided regarding sample size and the evaluation of items.
Descriptors: Item Response Theory, Test Results, Test Interpretation, Simulation
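The Rasch item response function itself is compact enough to state directly; a minimal sketch with illustrative values:

import numpy as np

def rasch_prob(theta, b):
    # P(correct | theta, b) = exp(theta - b) / (1 + exp(theta - b))
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# A person whose ability sits 1 logit above an item's difficulty
# answers correctly with probability ~0.73.
print(rasch_prob(theta=1.0, b=0.0))  # 0.7310...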
Peer reviewed
France, Stephen L.; Batchelder, William H. – Educational and Psychological Measurement, 2015
Cultural consensus theory (CCT) is a data aggregation technique with many applications in the social and behavioral sciences. We describe the intuition and theory behind a set of CCT models for continuous type data using maximum likelihood inference methodology. We describe how bias parameters can be incorporated into these models. We introduce…
Descriptors: Maximum Likelihood Statistics, Test Items, Difficulty Level, Test Theory
Peer reviewed
Ihme, Jan Marten; Senkbeil, Martin; Goldhammer, Frank; Gerick, Julia – European Educational Research Journal, 2017
The combination of different item formats is found quite often in large-scale assessments, and dimensionality analyses often indicate that tests are multidimensional with respect to task format. In ICILS 2013, three different item types (information-based response tasks, simulation tasks, and authoring tasks) were used to measure computer and…
Descriptors: Foreign Countries, Computer Literacy, Information Literacy, International Assessment
Cai, Li – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2013
Lord and Wingersky's (1984) recursive algorithm for creating summed score based likelihoods and posteriors has a proven track record in unidimensional item response theory (IRT) applications. Extending the recursive algorithm to handle multidimensionality is relatively simple, especially with fixed quadrature because the recursions can be defined…
Descriptors: Mathematics, Scores, Item Response Theory, Computation
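The dichotomous, unidimensional version of the recursion is short; a sketch (the function name is ours, and the fixed-theta probabilities are illustrative):

import numpy as np

def summed_score_likelihoods(p_correct):
    # Lord-Wingersky (1984): process items one at a time, updating the
    # likelihood of every possible summed score at a fixed theta.
    lik = np.array([1.0])  # with zero items, score 0 has probability 1
    for p in p_correct:
        new = np.zeros(len(lik) + 1)
        new[:-1] += lik * (1.0 - p)  # current item answered incorrectly
        new[1:] += lik * p           # current item answered correctly
        lik = new
    return lik

# Three items at one quadrature point; the multidimensional extension
# discussed above repeats this at every (multivariate) quadrature point.
print(summed_score_likelihoods([0.8, 0.6, 0.4]))  # entries sum to 1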
Peer reviewed
Straat, J. Hendrik; van der Ark, L. Andries; Sijtsma, Klaas – Educational and Psychological Measurement, 2014
An automated item selection procedure in Mokken scale analysis partitions a set of items into one or more Mokken scales, if the data allow. Two algorithms are available that pursue the same goal of selecting Mokken scales of maximum length: Mokken's original automated item selection procedure (AISP) and a genetic algorithm (GA). Minimum…
Descriptors: Sampling, Test Items, Effect Size, Scaling
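For orientation, the total scalability coefficient H that both algorithms try to keep high can be computed directly for dichotomous items; a sketch (function name ours):

import numpy as np
from itertools import combinations

def coefficient_h(X):
    # X: persons-by-items 0/1 matrix. H is the ratio of summed
    # inter-item covariances to their maximum given the item means.
    p = X.mean(axis=0)
    num = den = 0.0
    for i, j in combinations(range(X.shape[1]), 2):
        num += np.cov(X[:, i], X[:, j], bias=True)[0, 1]
        den += min(p[i], p[j]) - p[i] * p[j]
    return num / den

AISP grows a scale by admitting items only while the item-level coefficients stay above a lower bound c (0.3 is the customary default); the genetic algorithm searches over partitions more globally.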
Peer reviewed
Hartig, Johannes; Frey, Andreas; Nold, Gunter; Klieme, Eckhard – Educational and Psychological Measurement, 2012
The article compares three different methods to estimate effects of task characteristics and to use these estimates for model-based proficiency scaling: prediction of item difficulties from the Rasch model, the linear logistic test model (LLTM), and an LLTM including random item effects (LLTM+e). The methods are applied to empirical data from a…
Descriptors: Item Response Theory, Models, Methods, Computation
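The LLTM restriction can be illustrated with a least-squares shortcut: regress Rasch difficulty estimates on a Q-matrix of task characteristics. This two-step approximation is not the full LLTM (which estimates the basic parameters inside the IRT likelihood), and all values are invented:

import numpy as np

# Q-matrix: rows are items, columns are task characteristics.
Q = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [1, 1]], dtype=float)

# Rasch difficulty estimates for the same four items.
b = np.array([-0.5, 0.6, 0.2, 0.8])

# LLTM restriction: b_i = sum_k q_ik * eta_k.
eta, *_ = np.linalg.lstsq(Q, b, rcond=None)
b_model = Q @ eta  # model-implied difficulties for proficiency scaling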
Kroopnick, Marc Howard – ProQuest LLC, 2010
When Item Response Theory (IRT) is operationally applied for large scale assessments, unidimensionality is typically assumed. This assumption requires that the test measures a single latent trait. Furthermore, when tests are vertically scaled using IRT, the assumption of unidimensionality would require that the battery of tests across grades…
Descriptors: Simulation, Scaling, Standard Setting, Item Response Theory
Peer reviewed
Kane, Michael T.; Mroch, Andrew A.; Suh, Youngsuk; Ripkey, Douglas R. – Measurement: Interdisciplinary Research and Perspectives, 2009
This paper analyzes five linear equating models for the "nonequivalent groups with anchor test" (NEAT) design with internal anchors (i.e., the anchor test is part of the full test). The analysis employs a two-dimensional framework. The first dimension contrasts two general approaches to developing the equating relationship. Under a "parameter…
Descriptors: Scaling, Equated Scores, Methods, Test Items
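As background, every linear equating function maps scores by matching means and standard deviations; the NEAT variants differ in which group's moments enter. A sketch of chained linear equating, one common NEAT method, with invented moments:

import numpy as np

def linear_equate(mu_from, sd_from, mu_to, sd_to):
    # Map scores by matching means and standard deviations.
    return lambda x: mu_to + (sd_to / sd_from) * (np.asarray(x) - mu_from)

# Chain through the internal anchor V: X -> V using the new-form
# group's moments, then V -> Y using the reference group's moments.
x_to_v = linear_equate(mu_from=30.0, sd_from=6.0, mu_to=12.0, sd_to=3.0)
v_to_y = linear_equate(mu_from=11.0, sd_from=2.8, mu_to=32.0, sd_to=5.5)
print(v_to_y(x_to_v(33.0)))  # a score of 33 on X, expressed on Y's scale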
Peer reviewed
Davey, Tim; And Others – Applied Psychological Measurement, 1996
Scales defined by most item response theory (IRT) models are truly invariant with respect to certain linear transformations of parameters. The problem is to find the proper transformation to place calibrations on a common scale. This paper explores issues of extending and adapting unidimensional linking procedures to multidimensional IRT models.…
Descriptors: Equated Scores, Item Response Theory, Models, Scaling
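The unidimensional version of that invariance is easy to verify numerically; a sketch for the 2PL model with illustrative values (in the multidimensional case the paper addresses, the constant A becomes a matrix):

import numpy as np

def p2pl(theta, a, b):
    # Two-parameter logistic item response function.
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Rescale the latent scale to A*theta + B and adjust the item
# parameters accordingly: response probabilities are unchanged.
A, B = 1.5, -0.4
theta, a, b = 0.8, 1.2, 0.3
assert np.isclose(p2pl(theta, a, b), p2pl(A * theta + B, a / A, A * b + B))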
Peer reviewed
Meijer, Rob R. – Applied Psychological Measurement, 1995
A statistic used by R. Meijer (1994) to determine person fit referred to the number of errors from the deterministic Guttman model (L. Guttman, 1950), but it was in fact based on the error count as defined by J. Loevinger (1947, 1948).
Descriptors: Difficulty Level, Models, Responses, Scaling
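Loevinger's pairwise error count, the definition the note says was actually used, is straightforward; a sketch (function name ours):

def guttman_errors(responses):
    # responses: 0/1 vector with items ordered from easiest to hardest.
    # Each pair (easier item wrong, harder item right) counts as one
    # error under Loevinger's definition.
    errors = 0
    for i in range(len(responses)):
        for j in range(i + 1, len(responses)):
            if responses[i] == 0 and responses[j] == 1:
                errors += 1
    return errors

print(guttman_errors([1, 1, 1, 0, 0]))  # 0: perfect Guttman pattern
print(guttman_errors([0, 1, 1, 0, 1]))  # 4: four reversed pairs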
Peer reviewed
Van Onna, Marieke J. H. – Applied Psychological Measurement, 2004
Coefficient "H" is used as an index of scalability in nonparametric item response theory (NIRT). It indicates the degree to which a set of items rank orders examinees. Theoretical sampling distributions, however, have only been derived asymptotically and only under restrictive conditions. Bootstrap methods offer an alternative possibility to…
Descriptors: Sampling, Item Response Theory, Scaling, Comparative Analysis
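A person-level bootstrap of H takes only a few lines given a function that computes H (see the sketch under Straat et al. above); the names and interval below are illustrative:

import numpy as np

rng = np.random.default_rng(1)

def bootstrap_h(X, coefficient_h, n_boot=1000):
    # Resample persons (rows) with replacement and recompute H to
    # approximate its sampling distribution without asymptotics.
    n = X.shape[0]
    return np.array([coefficient_h(X[rng.integers(0, n, n)])
                     for _ in range(n_boot)])

# 95% percentile interval:
# np.percentile(bootstrap_h(X, coefficient_h), [2.5, 97.5])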
Peer reviewed
Snyder, Scott; Sheehan, Robert – Journal of Early Intervention, 1992
This examination of the Rasch scaling model concludes that the model could potentially facilitate objective comparisons of status and change of young children with disabilities at individual and group levels. The paper discusses applications of the model to early childhood assessment in the areas of item banking, test analysis, and subject…
Descriptors: Disabilities, Evaluation Methods, Item Response Theory, Measurement Techniques
Abdel-fattah, Abdel-fattah A. – 1992
A scaling procedure based on item response theory (IRT) is proposed that can also fit non-hierarchical test structures. The binary scores of a test of English were used to calculate the probabilities of answering each item correctly. The probability matrix was factor analyzed, and the difficulty intervals or estimates corresponding to the factors…
Descriptors: Bayesian Statistics, Difficulty Level, English, Estimation (Mathematics)