Showing all 15 results
Peer reviewed
Feuerstahler, Leah; Wilson, Mark – Journal of Educational Measurement, 2019
Scores estimated from multidimensional item response theory (IRT) models are not necessarily comparable across dimensions. In this article, the concept of aligned dimensions is formalized in the context of Rasch models, and two methods are described--delta dimensional alignment (DDA) and logistic regression alignment (LRA)--to transform estimated…
Descriptors: Item Response Theory, Models, Scores, Comparative Analysis
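The core idea of aligning dimensions, rescaling one dimension's Rasch scale so its item difficulties match a reference distribution and carrying the same transformation over to the person scores, can be illustrated with a minimal sketch. This shows only the general concept, not the DDA or LRA procedures the article defines; all values below are hypothetical.

```python
# Minimal sketch: align one Rasch dimension onto a reference scale by
# matching the mean and spread of its item difficulty estimates, then
# apply the same linear map to that dimension's person scores.
# Illustrative only; not the article's DDA or LRA procedures.
import numpy as np

def align_dimension(item_diffs, person_scores, ref_mean, ref_sd):
    """Linearly map one dimension's scale onto a reference scale."""
    a = ref_sd / np.std(item_diffs)          # slope: match spread
    b = ref_mean - a * np.mean(item_diffs)   # intercept: match location
    return a * np.asarray(item_diffs) + b, a * np.asarray(person_scores) + b

# Hypothetical example: align dimension 2 onto dimension 1's scale.
diffs_d1 = np.array([-1.2, -0.3, 0.4, 1.1])
diffs_d2 = np.array([-2.0, -0.8, 0.5, 2.3])
thetas_d2 = np.array([-0.5, 0.0, 0.9])
aligned_diffs, aligned_thetas = align_dimension(
    diffs_d2, thetas_d2, np.mean(diffs_d1), np.std(diffs_d1))
```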
Peer reviewed
Heritage, Margaret; Kingston, Neal M. – Journal of Educational Measurement, 2019
Classroom assessment and large-scale assessment have, for the most part, existed in mutual isolation. Some experts have felt this is for the best and others have been concerned that the schism limits the potential contribution of both forms of assessment. Margaret Heritage has long been a champion of best practices in classroom assessment. Neal…
Descriptors: Measurement, Psychometrics, Context Effect, Classroom Environment
Peer reviewed
Tendeiro, Jorge N.; Meijer, Rob R. – Journal of Educational Measurement, 2014
Recent guidelines for fair educational testing advise checking the validity of individual test scores through the use of person-fit statistics. On the basis of the existing literature, it is unclear to practitioners which statistic to use. An overview of relatively simple existing nonparametric approaches to identify atypical response…
Descriptors: Educational Assessment, Test Validity, Scores, Statistical Analysis
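One of the simplest nonparametric person-fit statistics covered by such overviews is the normed number of Guttman errors. A minimal sketch, assuming dichotomous items and sample proportions correct as the item ordering:

```python
# Normed number of Guttman errors: order items from easiest to hardest by
# sample proportion correct; a Guttman error is a pair in which the harder
# item is answered correctly while the easier item is not.
import numpy as np

def normed_guttman_errors(responses, p_correct):
    """responses: 0/1 vector for one examinee; p_correct: item easiness."""
    order = np.argsort(-np.asarray(p_correct))   # easiest item first
    x = np.asarray(responses)[order]
    # count (easy item wrong, hard item right) pairs
    g = sum(1 for i in range(len(x)) for j in range(i + 1, len(x))
            if x[i] == 0 and x[j] == 1)
    r = x.sum()
    max_g = r * (len(x) - r)                     # maximum possible errors
    return g / max_g if max_g > 0 else 0.0

# A mostly Guttman-consistent pattern yields a small value; a reversed
# pattern (failing easy items, passing hard ones) approaches 1.
print(normed_guttman_errors([1, 1, 0, 1, 0, 0], [.9, .8, .7, .5, .4, .2]))
```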
Peer reviewed
Hou, Likun; de la Torre, Jimmy; Nandakumar, Ratna – Journal of Educational Measurement, 2014
Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study…
Descriptors: Test Bias, Models, Simulation, Error Patterns
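The Wald test itself is generic. A sketch of the quadratic form, assuming an item's parameter estimates and their covariance matrices have already been obtained from a CDM fitted separately in the reference and focal groups (the values below are hypothetical):

```python
# Under the null of no DIF, the difference between an item's parameter
# estimates in the two groups is zero, and this quadratic form is
# asymptotically chi-square with df = number of parameters tested.
import numpy as np
from scipy.stats import chi2

def wald_dif(theta_ref, theta_foc, cov_ref, cov_foc):
    d = np.asarray(theta_ref) - np.asarray(theta_foc)
    cov = np.asarray(cov_ref) + np.asarray(cov_foc)  # independent groups
    w = d @ np.linalg.solve(cov, d)                  # d' (V_r + V_f)^-1 d
    return w, chi2.sf(w, df=len(d))                  # statistic, p-value

# Hypothetical guessing/slip estimates for one item in two groups:
w, p = wald_dif([0.15, 0.10], [0.25, 0.12],
                np.diag([0.001, 0.001]), np.diag([0.002, 0.001]))
```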
Peer reviewed
Sachse, Karoline A.; Roppelt, Alexander; Haag, Nicole – Journal of Educational Measurement, 2016
Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study using national item parameters, which required trends to be computed separately for each country, to compare…
Descriptors: Comparative Analysis, Measurement, Test Bias, Simulation
Peer reviewed
Kim, Seonghoon – Journal of Educational Measurement, 2006
This article provides technical descriptions of five fixed parameter calibration (FPC) methods, which are based on marginal maximum likelihood estimation via the EM algorithm, and evaluates them through simulation. The five FPC methods described are distinguished from each other by how many times they update the prior ability distribution and by…
Descriptors: Comparative Analysis, Item Response Theory, Evaluation Methods, Computation
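A toy sketch of the common core of these methods, MML estimation via EM for a Rasch model with some item difficulties held fixed, assuming the prior ability distribution stays fixed at N(0,1). The five methods differ precisely in whether and how often that prior is updated, which this sketch omits.

```python
# Toy MML/EM fixed parameter calibration for a Rasch model: items flagged
# in `fixed` keep their supplied difficulties; the rest are estimated.
# The N(0,1) prior is discretized on a quadrature grid and never updated.
import numpy as np

def fpc_em(X, b, fixed, nodes=np.linspace(-4, 4, 21), n_iter=50):
    """X: persons x items 0/1 matrix; b: initial difficulties;
    fixed: boolean mask of items whose difficulty stays fixed."""
    w = np.exp(-0.5 * nodes**2)
    w /= w.sum()                                 # discretized N(0,1) prior
    b = np.asarray(b, dtype=float).copy()
    fixed = np.asarray(fixed, dtype=bool)
    for _ in range(n_iter):
        P = 1.0 / (1.0 + np.exp(b[None, :] - nodes[:, None]))   # K x J
        # E-step: posterior weight of each quadrature node per person
        logL = X @ np.log(P.T) + (1 - X) @ np.log(1 - P.T)      # N x K
        post = np.exp(logL) * w
        post /= post.sum(axis=1, keepdims=True)
        nk = post.sum(axis=0)                   # expected persons at node k
        rk = post.T @ X                         # expected correct, K x J
        # M-step: one Newton step on each free item difficulty
        res = (rk - nk[:, None] * P).sum(axis=0)        # score residual
        info = (nk[:, None] * P * (1 - P)).sum(axis=0)  # information
        b = np.where(fixed, b, b - res / info)
    return b
```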
Peer reviewed
Marsh, Herbert W.; Hocevar, Dennis – Journal of Educational Measurement, 1983
This paper describes a variety of confirmatory factor analysis models that provide improved tests of multitrait-multimethod matrices, and compares three different approaches (the original Campbell-Fiske guidelines, an analysis of variance model, and confirmatory factor analysis models). (PN)
Descriptors: Analysis of Variance, Comparative Analysis, Evaluation Methods, Factor Analysis
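The first of the three compared approaches, the Campbell-Fiske guidelines, reduces to direct comparisons within the multitrait-multimethod correlation matrix. A minimal sketch of the convergent/discriminant comparison, using a hypothetical ordering of trait-method units and summarizing with means rather than the coefficient-by-coefficient checks the guidelines actually call for:

```python
# Campbell-Fiske style summary of an MTMM correlation matrix: same-trait /
# different-method ("validity diagonal") correlations should be sizable
# and exceed the different-trait correlations. Simplified to mean values.
import numpy as np

def campbell_fiske_summary(R, n_traits, n_methods):
    """R: correlation matrix over trait-method units, ordered so that
    unit index = method * n_traits + trait."""
    conv, disc = [], []
    for i in range(n_traits * n_methods):
        for j in range(i + 1, n_traits * n_methods):
            same_trait = (i % n_traits) == (j % n_traits)
            same_method = (i // n_traits) == (j // n_traits)
            if same_trait and not same_method:
                conv.append(R[i, j])      # validity diagonal
            elif not same_trait:
                disc.append(R[i, j])      # heterotrait correlations
    return np.mean(conv), np.mean(disc)   # want: conv clearly above disc
```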
Peer reviewed
Liu, Jinghua; Cahn, Miriam F.; Dorans, Neil J. – Journal of Educational Measurement, 2006
The College Board's SAT® data are used to illustrate how score equity assessment (SEA) can help inform the program about equatability. SEA is used to examine whether the content change(s) to the revised new SAT result in differential linking functions across gender groups. Results of population sensitivity analyses are reported on the…
Descriptors: Aptitude Tests, Comparative Analysis, Gender Differences, Scores
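A sketch of the SEA logic: link the new form to the old form separately within each subgroup and in the total group, then summarize how far the subgroup linking functions depart from the total-group one over the score range. Simple mean-sigma linear linking stands in here for the operational method, which the abstract does not specify.

```python
# Score equity assessment logic (illustrative): compare a subgroup's
# linking function to the total-group linking via a weighted RMSD.
import numpy as np

def linear_link(x_scores, y_scores):
    """Mean-sigma linear linking of X onto the Y scale."""
    a = np.std(y_scores) / np.std(x_scores)
    b = np.mean(y_scores) - a * np.mean(x_scores)
    return lambda x: a * np.asarray(x) + b

def rmsd_between_links(link_group, link_total, score_points, weights):
    """weights: score distribution over score_points (sums to 1)."""
    d = link_group(score_points) - link_total(score_points)
    return np.sqrt(np.average(d**2, weights=weights))
```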
Peer reviewed
Nandakumar, Ratna – Journal of Educational Measurement, 1994
Using simulated and real data, this study compares the performance of three methodologies for assessing unidimensionality: (1) DIMTEST; (2) the approach of Holland and Rosenbaum; and (3) nonlinear factor analysis. All three models correctly confirm unidimensionality, but they differ in their ability to detect the lack of unidimensionality.…
Descriptors: Ability, Comparative Analysis, Evaluation Methods, Factor Analysis
Peer reviewed
Lei, Pui-Wa; Chen, Shu-Ying; Yu, Lan – Journal of Educational Measurement, 2006
Mantel-Haenszel and SIBTEST, which have known difficulty in detecting non-unidirectional differential item functioning (DIF), have been adapted with some success for computerized adaptive testing (CAT). This study adapts logistic regression (LR) and the item-response-theory-likelihood-ratio test (IRT-LRT), capable of detecting both unidirectional…
Descriptors: Evaluation Methods, Test Bias, Computer Assisted Testing, Multiple Regression Analysis
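For reference, the standard (non-CAT) logistic regression DIF procedure on which the adaptation builds: regress the item response on the matching score, group membership, and their interaction; the group term flags uniform DIF and the interaction flags non-uniform DIF. The article's CAT adaptation, which would match on the adaptive ability estimate rather than a total score, is not shown.

```python
# Standard logistic regression DIF: 2-df likelihood ratio test of the
# group and score-by-group terms against a score-only baseline.
import numpy as np
import statsmodels.api as sm

def lr_dif(item, total, group):
    """item: 0/1 responses; total: matching score; group: 0/1 indicator."""
    item, total, group = map(np.asarray, (item, total, group))
    X = sm.add_constant(np.column_stack([total, group, total * group]))
    full = sm.Logit(item, X).fit(disp=0)
    base = sm.Logit(item, X[:, :2]).fit(disp=0)      # intercept + score
    return 2 * (full.llf - base.llf)                 # ~ chi-square, df=2
```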
Peer reviewed
Baxter, Gail P.; And Others – Journal of Educational Measurement, 1992
A procedure-based observational scoring system and a notebook completed by students were evaluated as science assessments for 41 fifth grade students experienced in hands-on science and 55 fifth grade students inexperienced in hands-on science. Results suggest that notebooks may be a reasonable, although less reliable, surrogate for observed…
Descriptors: Classroom Observation Techniques, Comparative Analysis, Educational Assessment, Elementary School Students
Peer reviewed
Finch, Holmes; Habing, Brian – Journal of Educational Measurement, 2005
This study examines the performance of a new method for assessing and characterizing dimensionality in test data using the NOHARM model and compares it with DETECT. Dimensionality assessment is carried out using two goodness-of-fit statistics that are compared to reference χ² distributions. A Monte Carlo study is used with item parameters…
Descriptors: Program Effectiveness, Monte Carlo Methods, Item Response Theory, Comparative Analysis
Peer reviewed
Moss, Pamela A.; And Others – Journal of Educational Measurement, 1982
Scores on a multiple-choice language test involving recognition of language errors were related to those on writing samples, scored atomistically for the same language errors and holistically for communicative effectiveness and correctness. Results suggest the need for clear limits in generalizing from one assessment to others. (Author/GK)
Descriptors: Comparative Analysis, Elementary Secondary Education, Evaluation Methods, Grade 10
Peer reviewed
Hills, John R.; And Others – Journal of Educational Measurement, 1988
Five methods of equating minimum-competency tests were compared using the Florida Statewide Student Assessment Test, Part II, for 1984 and 1986. Four of the five methods yielded essentially comparable results for the highest-scoring 84% of the students. Anchor item sets of different lengths were compared, using the concurrent item response theory equating…
Descriptors: Comparative Analysis, Equated Scores, Evaluation Methods, Graduation Requirements
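One simple member of the family of equating methods being compared is chained linear equating through a common anchor. A minimal sketch; the concurrent IRT calibration the abstract mentions is model-based and not shown.

```python
# Chained linear equating: link form X to the anchor scale in the X
# sample, link the anchor to form Y in the Y sample, then compose.
import numpy as np

def linear_link(from_scores, to_scores):
    a = np.std(to_scores) / np.std(from_scores)
    b = np.mean(to_scores) - a * np.mean(from_scores)
    return lambda s: a * np.asarray(s) + b

def chained_linear_equate(x, anchor_x, y, anchor_y):
    """x, anchor_x: scores in the X sample; y, anchor_y: in the Y sample."""
    x_to_a = linear_link(x, anchor_x)        # X sample: X score -> anchor
    a_to_y = linear_link(anchor_y, y)        # Y sample: anchor -> Y score
    return lambda s: a_to_y(x_to_a(s))
```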
Peer reviewed
McKinley, Robert L. – Journal of Educational Measurement, 1988
Six procedures for combining sets of item response theory (IRT) item parameter estimates from different samples were evaluated using real and simulated response data. Results support use of covariance matrix-weighted averaging and a procedure using sample-size-weighted averaging of estimated item characteristic curves at the center of the ability…
Descriptors: College Entrance Examinations, Comparative Analysis, Computer Simulation, Estimation (Mathematics)
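Covariance matrix-weighted averaging, one of the two supported procedures, has a compact closed form: weight each sample's parameter vector by its inverse covariance matrix, so more precisely estimated sets get more weight. A sketch, assuming the per-sample estimates and covariance matrices are already available:

```python
# Inverse-covariance (precision) weighted pooling of parameter estimates
# for one item across samples, with the covariance of the pooled estimate.
import numpy as np

def cov_weighted_average(estimates, covariances):
    """estimates: list of parameter vectors, one per sample;
    covariances: matching list of covariance matrices."""
    precisions = [np.linalg.inv(C) for C in covariances]
    pooled_cov = np.linalg.inv(sum(precisions))
    pooled = pooled_cov @ sum(P @ np.asarray(e)
                              for P, e in zip(precisions, estimates))
    return pooled, pooled_cov
```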