ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	4
Since 2006 (last 20 years)	10

Descriptor

Classification	12
Error of Measurement	12
Test Items	12
Item Analysis	5
Item Response Theory	4
Accuracy	3
Comparative Analysis	3
Difficulty Level	3
Goodness of Fit	3
Sample Size	3
Statistical Analysis	3
Computation	2
Correlation	2
Effect Size	2
Evaluation Methods	2
Factor Analysis	2
Foreign Countries	2
Models	2
Nonparametric Statistics	2
Psychometrics	2
Regression (Statistics)	2
Reliability	2
Simulation	2
Statistical Bias	2
Test Bias	2
More ▼

Source

Educational and Psychological…	3
Applied Measurement in…	1
International Journal of…	1
Journal of Educational…	1
Online Submission	1
ProQuest LLC	1
Psychological Methods	1
Research Papers in Education	1
Teachers College Record	1

Publication Type

Journal Articles	9
Reports - Research	7
Reports - Evaluative	4
Speeches/Meeting Papers	2
Dissertations/Theses -…	1

Education Level

Elementary Education	1
Secondary Education	1

Audience

Researchers

Location

United Kingdom (England)

Laws, Policies, & Programs

Assessments and Surveys

Program for International…

What Works Clearinghouse Rating

Showing all 12 results Save | Export

Evaluation of the Goodness-of-Fit Index M[subscript ord] in Polytomous DCMS with Hierarchical Attribute Structures

Direct link

Haimiao Yuan – ProQuest LLC, 2022

The application of diagnostic classification models (DCMs) in the field of educational measurement is getting more attention in recent years. To make a valid inference from the model, it is important to ensure that the model fits the data. The purpose of the present study was to investigate the performance of the limited information…

Descriptors: Goodness of Fit, Educational Assessment, Educational Diagnosis, Models

Comparing the Robustness of Three Nonparametric DIF Procedures to Differential Rapid Guessing

Peer reviewed

Direct link

Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022

When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…

Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis

Item Parameter Drift in Computer Adaptive Testing Due to Lack of Content Knowledge

Peer reviewed

Direct link

Aksu Dunya, Beyza – International Journal of Testing, 2018

This study was conducted to analyze potential item parameter drift (IPD) impact on person ability estimates and classification accuracy when drift affects an examinee subgroup. Using a series of simulations, three factors were manipulated: (a) percentage of IPD items in the CAT exam, (b) percentage of examinees affected by IPD, and (c) item pool…

Descriptors: Adaptive Testing, Classification, Accuracy, Computer Assisted Testing

Fitting Large Factor Analysis Models with Ordinal Data

Peer reviewed

Direct link

DiStefano, Christine; McDaniel, Heather L.; Zhang, Liyun; Shi, Dexin; Jiang, Zhehan – Educational and Psychological Measurement, 2019

A simulation study was conducted to investigate the model size effect when confirmatory factor analysis (CFA) models include many ordinal items. CFA models including between 15 and 120 ordinal items were analyzed with mean- and variance-adjusted weighted least squares to determine how varying sample size, number of ordered categories, and…

Descriptors: Factor Analysis, Effect Size, Data, Sample Size

Effectiveness of Combining Statistical Tests and Effect Sizes When Using Logistic Discriminant Function Regression to Detect Differential Item Functioning for Polytomous Items

Peer reviewed

Direct link

Gómez-Benito, Juana; Hidalgo, Maria Dolores; Zumbo, Bruno D. – Educational and Psychological Measurement, 2013

The objective of this article was to find an optimal decision rule for identifying polytomous items with large or moderate amounts of differential functioning. The effectiveness of combining statistical tests with effect size measures was assessed using logistic discriminant function analysis and two effect size measures: R[superscript 2] and…

Descriptors: Item Analysis, Test Items, Effect Size, Statistical Analysis

An Investigation of Measurement Invariance of the Key Stage 2 National Curriculum Science Sampling Test in England

Peer reviewed

Direct link

He, Qingping; Anwyll, Steve; Glanville, Matthew; Opposs, Dennis – Research Papers in Education, 2014

Since 2010, the whole national cohort Key Stage 2 (KS2) National Curriculum test in science in England has been replaced with a sampling test taken by pupils at the age of 11 from a nationally representative sample of schools annually. The study reported in this paper compares the performance of different subgroups of the samples (classified by…

Descriptors: National Curriculum, Sampling, Foreign Countries, Factor Analysis

A Multilevel Testlet Model for Dual Local Dependence

Peer reviewed

Direct link

Jiao, Hong; Kamata, Akihito; Wang, Shudong; Jin, Ying – Journal of Educational Measurement, 2012

The applications of item response theory (IRT) models assume local item independence and that examinees are independent of each other. When a representative sample for psychometric analysis is selected using a cluster sampling method in a testlet-based assessment, both local item dependence and local person dependence are likely to be induced.…

Descriptors: Item Response Theory, Test Items, Markov Processes, Monte Carlo Methods

DIF Trees: Using Classification Trees to Detect Differential Item Functioning

Peer reviewed

Direct link

Vaughn, Brandon K.; Wang, Qiu – Educational and Psychological Measurement, 2010

A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…

Descriptors: Test Bias, Classification, Nonparametric Statistics, Regression (Statistics)

On the Consistency of Individual Classification Using Short Scales

Peer reviewed

Direct link

Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R. – Psychological Methods, 2007

Short tests containing at most 15 items are used in clinical and health psychology, medicine, and psychiatry for making decisions about patients. Because short tests have large measurement error, the authors ask whether they are reliable enough for classifying patients into a treatment and a nontreatment group. For a given certainty level,…

Descriptors: Psychiatry, Patients, Error of Measurement, Test Length

Evaluation of Linking Methods for Placing Three-Parameter Logistic Item Parameter Estimates onto a One-Parameter Scale

Download full text

Karkee, Thakur B.; Wright, Karen R. – Online Submission, 2004

Different item response theory (IRT) models may be employed for item calibration. Change of testing vendors, for example, may result in the adoption of a different model than that previously used with a testing program. To provide scale continuity and preserve cut score integrity, item parameter estimates from the new model must be linked to the…

Descriptors: Measures (Individuals), Evaluation Criteria, Testing, Integrity

Rasch Analysis Using SPSS.

Phillips, Gary W. – 1983

Ways in which the Statistical Package for the Social Sciences (SPSS) can be used to perform some Rasch analyses are described in detail. It is shown how SPSS and a set of item calibrations can be used to estimate person abilities, standard errors of measurement, test characteristic curve, test information curve, classification consistency on a…

Descriptors: Classification, Computer Software, Error of Measurement, Estimation (Mathematics)

Psychometric Issues in the ELL Assessment and Special Education Eligibility

Peer reviewed

Direct link

Abedi, Jamal – Teachers College Record, 2006

Assessments in English that are constructed for native English speakers may not provide valid inferences about the achievement of English language learners (ELLs). The linguistic complexity of the test items that are not related to the content of the assessment may increase the measurement error, thus reducing the reliability of the assessment.…

Descriptors: Second Language Learning, Test Items, Psychometrics, Inferences

Abedi, Jamal	1
Abulela, Mohammed A. A.	1
Aksu Dunya, Beyza	1
Anwyll, Steve	1
DiStefano, Christine	1
Emons, Wilco H. M.	1
Glanville, Matthew	1
Gómez-Benito, Juana	1
Haimiao Yuan	1
He, Qingping	1
Hidalgo, Maria Dolores	1
Jiang, Zhehan	1
Jiao, Hong	1
Jin, Ying	1
Kamata, Akihito	1
Karkee, Thakur B.	1
McDaniel, Heather L.	1
Meijer, Rob R.	1
Opposs, Dennis	1
Phillips, Gary W.	1
Rios, Joseph A.	1
Shi, Dexin	1
Sijtsma, Klaas	1
Vaughn, Brandon K.	1
Wang, Qiu	1
More ▼