ERIC - Search Results

Publication Date

In 2025	0
Since 2024	2
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	6
Since 2006 (last 20 years)	15

Descriptor

Error of Measurement	16
Measurement	16
Test Items	16
Item Response Theory	9
Computation	6
Simulation	6
Comparative Analysis	4
Difficulty Level	4
Foreign Countries	4
Mathematics Achievement	4
Models	4
Scores	4
Achievement Tests	3
Educational Assessment	3
Elementary Secondary Education	3
Evaluation Methods	3
Mathematics Tests	3
Psychometrics	3
Sampling	3
Academic Achievement	2
College Students	2
Computer Assisted Testing	2
Correlation	2
Data Analysis	2
Elementary School Students	2
More ▼

Source

ProQuest LLC	2
American Institutes for…	1
Applied Measurement in…	1
Assessment & Evaluation in…	1
Educational and Psychological…	1
Grantee Submission	1
International Journal of…	1
Journal of Educational…	1
Language Assessment Quarterly	1
Multivariate Behavioral…	1
Online Submission	1
Partnership for Assessment of…	1
Psychometrika	1
Research Matters	1
Society for Research on…	1
More ▼

Publication Type

Journal Articles	9
Reports - Research	9
Reports - Evaluative	4
Dissertations/Theses -…	2
Numerical/Quantitative Data	1
Reports - Descriptive	1
Tests/Questionnaires	1

Education Level

Elementary Secondary Education	4
Elementary Education	3
Higher Education	3
Postsecondary Education	3
Junior High Schools	2
Secondary Education	2
Grade 8	1
Middle Schools	1

Audience

Location

Colorado (Boulder)	1
Japan	1
Philippines	1
Portugal	1

Laws, Policies, & Programs

No Child Left Behind Act 2001	1
Race to the Top	1

Assessments and Surveys

National Assessment of…	1
Program for International…	1
Trends in International…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 16 results Save | Export

Multi-Group Regularized Gaussian Variational Estimation: Fast Detection of DIF

Peer reviewed

Direct link

Weicong Lyu; Chun Wang; Gongjun Xu – Grantee Submission, 2024

Data harmonization is an emerging approach to strategically combining data from multiple independent studies, enabling addressing new research questions that are not answerable by a single contributing study. A fundamental psychometric challenge for data harmonization is to create commensurate measures for the constructs of interest across…

Descriptors: Data Analysis, Test Items, Psychometrics, Item Response Theory

Modeling Computational Thinking Using Multidimensional Item Response Theory: Investigation into Model Fit and Measurement Invariance

Direct link

Emily A. Brown – ProQuest LLC, 2024

Previous research has been limited regarding the measurement of computational thinking, particularly as a learning progression in K-12. This study proposes to apply a multidimensional item response theory (IRT) model to a newly developed measure of computational thinking utilizing both selected response and open-ended polytomous items to establish…

Descriptors: Models, Computation, Thinking Skills, Item Response Theory

Comparing Small-Sample Equating with Angoff Judgement for Linking Cut-Scores on Two Tests

Download full text

Bramley, Tom – Research Matters, 2020

The aim of this study was to compare, by simulation, the accuracy of mapping a cut-score from one test to another by expert judgement (using the Angoff method) versus the accuracy with a small-sample equating method (chained linear equating). As expected, the standard-setting method resulted in more accurate equating when we assumed a higher level…

Descriptors: Cutting Scores, Standard Setting (Scoring), Equated Scores, Accuracy

High-Dimensional Explanatory Random Item Effects Models for Rater-Mediated Assessments

Peer reviewed
PDF on ERIC

Download full text

Kelcey, Ben; Wang, Shanshan; Cox, Kyle – Society for Research on Educational Effectiveness, 2016

Valid and reliable measurement of unobserved latent variables is essential to understanding and improving education. A common and persistent approach to assessing latent constructs in education is the use of rater inferential judgment. The purpose of this study is to develop high-dimensional explanatory random item effects models designed for…

Descriptors: Test Items, Models, Evaluators, Longitudinal Studies

Selection of Common Items as an Unrecognized Source of Variability in Test Equating: A Bootstrap Approximation Assuming Random Sampling of Common Items

Peer reviewed

Direct link

Michaelides, Michalis P.; Haertel, Edward H. – Applied Measurement in Education, 2014

The standard error of equating quantifies the variability in the estimation of an equating function. Because common items for deriving equated scores are treated as fixed, the only source of variability typically considered arises from the estimation of common-item parameters from responses of samples of examinees. Use of alternative, equally…

Descriptors: Equated Scores, Test Items, Sampling, Statistical Inference

Guessing and the Rasch Model

Peer reviewed

Direct link

Holster, Trevor A.; Lake, J. – Language Assessment Quarterly, 2016

Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…

Descriptors: Guessing (Tests), Item Response Theory, Vocabulary, Language Tests

A Comparison of Linking Methods for Estimating National Trends in International Comparative Large-Scale Assessments in the Presence of Cross-national DIF

Peer reviewed

Direct link

Sachse, Karoline A.; Roppelt, Alexander; Haag, Nicole – Journal of Educational Measurement, 2016

Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study using national item parameters, which required trends to be computed separately for each country, to compare…

Descriptors: Comparative Analysis, Measurement, Test Bias, Simulation

Test Length and Decision Quality in Personnel Selection: When Is Short Too Short?

Peer reviewed

Direct link

Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012

Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length deteriorates decision quality due to increased impact of…

Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement

Improving Explanatory Inferences from Assessments

Direct link

Diakow, Ronli Phyllis – ProQuest LLC, 2013

This dissertation comprises three papers that propose, discuss, and illustrate models to make improved inferences about research questions regarding student achievement in education. Addressing the types of questions common in educational research today requires three different "extensions" to traditional educational assessment: (1)…

Descriptors: Inferences, Educational Assessment, Academic Achievement, Educational Research

Online Calibration via Variable Length Computerized Adaptive Testing

Peer reviewed

Direct link

Chang, Yuan-chin Ivan; Lu, Hung-Yi – Psychometrika, 2010

Item calibration is an essential issue in modern item response theory based psychological or educational testing. Due to the popularity of computerized adaptive testing, methods to efficiently calibrate new items have become more important than that in the time when paper and pencil test administration is the norm. There are many calibration…

Descriptors: Test Items, Educational Testing, Adaptive Testing, Measurement

IRT-LR-DIF with Estimation of the Focal-Group Density as an Empirical Histogram

Peer reviewed

Direct link

Woods, Carol M. – Educational and Psychological Measurement, 2008

Item response theory-likelihood ratio-differential item functioning (IRT-LR-DIF) is used to evaluate the degree to which items on a test or questionnaire have different measurement properties for one group of people versus another, irrespective of group-mean differences on the construct. Usually, the latent distribution is presumed normal for both…

Descriptors: Simulation, Computation, Item Response Theory, Test Items

Demonstrating the Difference between Classical Test Theory and Item Response Theory Using Derived Test Data

Download full text

Magno, Carlo – Online Submission, 2009

The present report demonstrates the difference between classical test theory (CTT) and item response theory (IRT) approach using an actual test data for chemistry junior high school students. The CTT and IRT were compared across two samples and two forms of test on their item difficulty, internal consistency, and measurement errors. The specific…

Descriptors: Private Schools, Measurement, Error of Measurement, Foreign Countries

Making Inferences about Growth and Value-Added: Design Issues for the PARCC Consortium. A White Paper

Download full text

Briggs, Derek C. – Partnership for Assessment of Readiness for College and Careers, 2011

There is often confusion about distinctions between growth models and value-added models. The first half of this paper attempts to dispel some of these confusions by clarifying terminology and illustrating by example how the results from a large-scale assessment can and will be used to make inferences about student growth and the value-added…

Descriptors: Value Added Models, Language Usage, Measurement, Inferences

A Multilevel Nonlinear Profile Analysis Model for Dichotomous Data

Peer reviewed

Direct link

Culpepper, Steven Andrew – Multivariate Behavioral Research, 2009

This study linked nonlinear profile analysis (NPA) of dichotomous responses with an existing family of item response theory models and generalized latent variable models (GLVM). The NPA method offers several benefits over previous internal profile analysis methods: (a) NPA is estimated with maximum likelihood in a GLVM framework rather than…

Descriptors: Profiles, Item Response Theory, Models, Maximum Likelihood Statistics

E-Assessment within the Bologna Paradigm: Evidence from Portugal

Peer reviewed

Direct link

Ferrao, Maria – Assessment & Evaluation in Higher Education, 2010

The Bologna Declaration brought reforms into higher education that imply changes in teaching methods, didactic materials and textbooks, infrastructures and laboratories, etc. Statistics and mathematics are disciplines that traditionally have the worst success rates, particularly in non-mathematics core curricula courses. This research project,…

Descriptors: Foreign Countries, Computer Assisted Testing, Educational Technology, Educational Assessment

Previous Page | Next Page »

Pages: 1 | 2

Blankenship, Charles D.	1
Bramley, Tom	1
Briggs, Derek C.	1
Chang, Yuan-chin Ivan	1
Chun Wang	1
Cox, Kyle	1
Culpepper, Steven Andrew	1
Diakow, Ronli Phyllis	1
Emily A. Brown	1
Emons, Wilco H. M.	1
Ferrao, Maria	1
Gongjun Xu	1
Haag, Nicole	1
Haertel, Edward H.	1
Holster, Trevor A.	1
Kelcey, Ben	1
Kruyen, Peter M.	1
Lake, J.	1
Lu, Hung-Yi	1
Magno, Carlo	1
McLaughlin, Donald H.	1
Michaelides, Michalis P.	1
Roppelt, Alexander	1
Sachse, Karoline A.	1
Scarloss, Beth A.	1
More ▼