Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 5 |
Since 2006 (last 20 years) | 10 |
Descriptor
Classification | 12 |
Evaluation Methods | 12 |
Computation | 3 |
Models | 3 |
Statistical Analysis | 3 |
Test Items | 3 |
Comparative Analysis | 2 |
Computer Assisted Testing | 2 |
Data Analysis | 2 |
Discriminant Analysis | 2 |
Factor Analysis | 2 |
Source
Educational and Psychological Measurement | 12 |
Author
Andrich, David | 1 |
Batinic, Bernad | 1 |
Birenbaum, Menucha | 1 |
Choi, Namok | 1 |
Cousineau, Denis | 1 |
Fuqua, Dale R. | 1 |
Gnambs, Timo | 1 |
Gorham, Jerry | 1 |
Haynie, Kathleen | 1 |
Holden, Jocelyn E. | 1 |
Hong, Sehee | 1 |
Publication Type
Journal Articles | 12 |
Reports - Research | 10 |
Reports - Evaluative | 2 |
Location
Israel | 1 |
Assessments and Surveys
Bem Sex Role Inventory | 1 |
Jang, Yoona; Hong, Sehee – Educational and Psychological Measurement, 2023
The purpose of this study was to evaluate the degree of classification quality in the basic latent class model when covariates are either included or are not included in the model. To accomplish this task, Monte Carlo simulations were conducted in which the results of models with and without a covariate were compared. Based on these simulations,…
Descriptors: Classification, Models, Prediction, Sample Size
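The abstract does not say which classification-quality metric the simulations used; a common choice for latent class models is the entropy-based relative entropy index. The toy sketch below (Python, NumPy only, all values invented) computes that index from a matrix of posterior class-membership probabilities, illustrating how sharper posteriors translate into higher classification quality.

import numpy as np

def relative_entropy(posteriors):
    # posteriors: N x K matrix of posterior class-membership probabilities.
    # Returns the entropy-based quality index (1 = perfectly separated classes).
    p = np.clip(posteriors, 1e-12, 1.0)        # guard against log(0)
    n, k = p.shape
    return 1.0 - (-(p * np.log(p)).sum()) / (n * np.log(k))

rng = np.random.default_rng(0)
sharp = rng.dirichlet([0.2, 0.2], size=500)    # posteriors mostly near 0 or 1
flat = rng.dirichlet([2.0, 2.0], size=500)     # posteriors closer to 0.5
print(relative_entropy(sharp))                 # high classification quality
print(relative_entropy(flat))                  # noticeably lower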
Raykov, Tenko; Marcoulides, George A.; Li, Tenglong – Educational and Psychological Measurement, 2016
A method for evaluating the validity of multicomponent measurement instruments in heterogeneous populations is discussed. The procedure can be used for point and interval estimation of criterion validity of linear composites in populations representing mixtures of an unknown number of latent classes. The approach also permits the evaluation of…
Descriptors: Validity, Measures (Individuals), Classification, Evaluation Methods
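The latent-class machinery of the article is not reproduced here; as a simple point of reference, the criterion validity of a unit-weighted linear composite is its correlation with the criterion. The hypothetical sketch below estimates that correlation separately within two subpopulations whose membership is treated as known, which is roughly the quantity the mixture approach targets when class membership is unobserved (the data and correlations are invented).

import numpy as np

def composite_criterion_validity(items, criterion):
    # Correlation between a unit-weighted composite (row sum) and the criterion.
    return np.corrcoef(items.sum(axis=1), criterion)[0, 1]

rng = np.random.default_rng(1)
for label, rho in (("class A", 0.6), ("class B", 0.2)):
    items = rng.normal(size=(400, 5))
    noise = rng.normal(size=400)
    # Construct a criterion whose true correlation with the composite is about rho.
    criterion = rho * items.mean(axis=1) * np.sqrt(5) + np.sqrt(1 - rho**2) * noise
    print(label, round(composite_criterion_validity(items, criterion), 2))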
Yang, Yanyun; Xia, Yan – Educational and Psychological Measurement, 2019
When item scores are ordered categorical, categorical omega can be computed based on the parameter estimates from a factor analysis model using frequentist estimators such as diagonally weighted least squares. When the sample size is relatively small and thresholds are different across items, using diagonally weighted least squares can yield a…
Descriptors: Scores, Sample Size, Bayesian Statistics, Item Analysis
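The categorical omega discussed here (Green and Yang's formulation) additionally uses item thresholds and the normal CDF; the sketch below shows only the simpler continuous-case coefficient omega computed from factor loadings and residual variances, as a hedged illustration of the kind of model-based reliability that the DWLS and Bayesian estimates feed into (the loading values are made up).

import numpy as np

def coefficient_omega(loadings, residual_variances):
    # Continuous-case omega: (sum of loadings)^2 over total composite variance,
    # assuming a one-factor model with uncorrelated residuals.
    lam = np.asarray(loadings)
    theta = np.asarray(residual_variances)
    return lam.sum() ** 2 / (lam.sum() ** 2 + theta.sum())

# Hypothetical one-factor solution for six standardized items.
loadings = [0.7, 0.65, 0.8, 0.6, 0.75, 0.7]
residuals = [1 - l**2 for l in loadings]
print(round(coefficient_omega(loadings, residuals), 3))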
Cousineau, Denis; Laurencelle, Louis – Educational and Psychological Measurement, 2017
Assessing global interrater agreement is difficult as most published indices are affected by the presence of mixtures of agreements and disagreements. A previously proposed method was shown to be specifically sensitive to global agreement, excluding mixtures, but also negatively biased. Here, we propose two alternatives in an attempt to find what…
Descriptors: Interrater Reliability, Evaluation Methods, Statistical Bias, Accuracy
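The two proposed indices are not spelled out in the abstract; as a baseline, the sketch below computes the plain exact-agreement proportion averaged over all rater pairs for a ratings matrix, which is exactly the kind of global index that mixtures of agreements and disagreements can distort (the ratings are invented).

import numpy as np
from itertools import combinations

def pairwise_exact_agreement(ratings):
    # ratings: items x raters matrix of categorical codes.
    # Returns the mean proportion of items on which each rater pair agrees exactly.
    r = np.asarray(ratings)
    pairs = list(combinations(range(r.shape[1]), 2))
    return np.mean([(r[:, i] == r[:, j]).mean() for i, j in pairs])

ratings = np.array([
    [1, 1, 1],   # full agreement
    [2, 2, 3],   # partial agreement
    [1, 2, 3],   # full disagreement
    [3, 3, 3],
])
print(round(pairwise_exact_agreement(ratings), 3))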
Lamprianou, Iasonas – Educational and Psychological Measurement, 2018
It is common practice for assessment programs to organize qualifying sessions during which the raters (often known as "markers" or "judges") demonstrate their consistency before operational rating commences. Because of the high-stakes nature of many rating activities, the research community tends to continuously explore new…
Descriptors: Social Networks, Network Analysis, Comparative Analysis, Innovation
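The abstract points to social network analysis of rater behavior without giving details; assuming pairwise agreement rates as edge weights, a minimal networkx sketch of such a rater network might look like the following (rater names and agreement values are invented, and this is not the article's actual procedure).

import networkx as nx

# Hypothetical pairwise agreement rates from a qualifying session.
agreements = {
    ("rater_A", "rater_B"): 0.92,
    ("rater_A", "rater_C"): 0.55,
    ("rater_B", "rater_C"): 0.60,
    ("rater_C", "rater_D"): 0.88,
}

G = nx.Graph()
for (u, v), w in agreements.items():
    G.add_edge(u, v, weight=w)

# Weighted degree flags raters who are least connected by agreement to the panel.
strength = dict(G.degree(weight="weight"))
print(min(strength, key=strength.get))   # the most "isolated" rater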
Andrich, David – Educational and Psychological Measurement, 2013
Assessments in response formats with ordered categories are ubiquitous in the social and health sciences. Although the assumption that the ordering of the categories is working as intended is central to any interpretation that arises from such assessments, testing that this assumption is valid is not standard in psychometrics. This is surprising…
Descriptors: Item Response Theory, Classification, Statistical Analysis, Models
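In the polytomous Rasch framework Andrich works in, the ordering assumption is typically examined through the estimated thresholds between adjacent categories. As a hedged illustration only, the sketch below checks whether a vector of estimated thresholds is monotonically increasing (the threshold values are invented).

import numpy as np

def thresholds_ordered(thresholds):
    # True if every threshold exceeds the one before it,
    # i.e. the ordered categories are working as intended.
    t = np.asarray(thresholds, dtype=float)
    return bool(np.all(np.diff(t) > 0))

print(thresholds_ordered([-1.4, -0.3, 0.8, 1.9]))   # ordered: True
print(thresholds_ordered([-1.4, 0.8, -0.3, 1.9]))   # disordered: False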
Jiao, Hong; Liu, Junhui; Haynie, Kathleen; Woo, Ada; Gorham, Jerry – Educational and Psychological Measurement, 2012
This study explored the impact of partial credit scoring of one type of innovative items (multiple-response items) in a computerized adaptive version of a large-scale licensure pretest and operational test settings. The impacts of partial credit scoring on the estimation of the ability parameters and classification decisions in operational test…
Descriptors: Test Items, Computer Assisted Testing, Measures (Individuals), Scoring
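The abstract does not specify the partial credit rule; one common convention awards credit for each correct option selected and deducts for each incorrect selection, floored at zero. The sketch below implements that rule as an assumption, not as the study's actual scoring scheme.

def partial_credit(selected, keyed):
    # selected, keyed: sets of option labels chosen by the examinee / keyed as correct.
    # One point per correct selection, minus one per incorrect selection, floored at 0,
    # then rescaled to the 0-1 range used by polytomous IRT models.
    raw = len(selected & keyed) - len(selected - keyed)
    return max(raw, 0) / len(keyed)

print(partial_credit({"A", "C"}, {"A", "C", "D"}))       # 2/3 credit
print(partial_credit({"A", "B", "C"}, {"A", "C", "D"}))  # one wrong pick: 1/3
print(partial_credit({"B", "E"}, {"A", "C", "D"}))       # 0.0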
Gnambs, Timo; Batinic, Bernad – Educational and Psychological Measurement, 2011
Computer-adaptive classification tests focus on classifying respondents in different proficiency groups (e.g., for pass/fail decisions). To date, adaptive classification testing has been dominated by research on dichotomous response formats and classifications in two groups. This article extends this line of research to polytomous classification…
Descriptors: Test Length, Computer Assisted Testing, Classification, Test Items
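Pass/fail adaptive classification testing is often driven by a sequential probability ratio test; the dichotomous Rasch-based sketch below shows that decision rule in its simplest form, and the polytomous extension discussed in the article would swap in the corresponding item likelihood (all numeric settings here are assumptions).

import math

def sprt_decision(responses, difficulties, theta_fail=-0.3, theta_pass=0.3,
                  alpha=0.05, beta=0.05):
    # Sequential probability ratio test for a pass/fail cut under a Rasch model.
    def p_correct(theta, b):
        return 1.0 / (1.0 + math.exp(-(theta - b)))
    log_lr = 0.0
    for x, b in zip(responses, difficulties):
        p_hi, p_lo = p_correct(theta_pass, b), p_correct(theta_fail, b)
        log_lr += math.log(p_hi / p_lo) if x == 1 else math.log((1 - p_hi) / (1 - p_lo))
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    if log_lr >= upper:
        return "pass"
    if log_lr <= lower:
        return "fail"
    return "continue testing"

print(sprt_decision([1] * 12, [0.0] * 12))           # enough evidence: "pass"
print(sprt_decision([1, 0, 1, 0, 1, 0], [0.0] * 6))  # still ambiguous: "continue testing"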
Holden, Jocelyn E.; Kelley, Ken – Educational and Psychological Measurement, 2010
Classification procedures are common and useful in behavioral, educational, social, and managerial research. Supervised classification techniques such as discriminant function analysis assume training data are perfectly classified when estimating parameters or classifying. In contrast, unsupervised classification techniques such as finite mixture…
Descriptors: Discriminant Analysis, Classification, Computation, Behavioral Science Research
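The contrast the article builds on, supervised discriminant analysis trained on labeled data versus an unsupervised finite mixture fitted without labels, can be made concrete with scikit-learn; the sketch below fits both to the same simulated two-group data purely as an illustration of that distinction.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
n = 200
X = np.vstack([rng.normal(0.0, 1.0, size=(n, 2)),      # group 0
               rng.normal(2.5, 1.0, size=(n, 2))])     # group 1
y = np.repeat([0, 1], n)

# Supervised: discriminant analysis uses the known labels to train.
lda = LinearDiscriminantAnalysis().fit(X, y)
print("LDA accuracy:", (lda.predict(X) == y).mean())

# Unsupervised: the finite mixture recovers groups without ever seeing labels.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
clusters = gmm.predict(X)
agreement = max((clusters == y).mean(), (clusters != y).mean())  # handle label switching
print("Mixture agreement with truth:", agreement)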
Choi, Namok; Fuqua, Dale R.; Newman, Jody L. – Educational and Psychological Measurement, 2008
Pedhazur and Tetenbaum speculated that factor structures from self-ratings of the Bem Sex-Role Inventory (BSRI) personality traits would be different from factor structures from desirability ratings of the same traits. To explore this hypothesis, both desirability ratings of BSRI traits (both "for a man" and "for a woman") and…
Descriptors: Personality Traits, Sex Role, Gender Discrimination, Self Evaluation (Individuals)
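The comparison rests on standard exploratory factor analysis of trait ratings; as a generic, hedged illustration (simulated data, not the BSRI), the sketch below extracts two factors from ten rating items with scikit-learn's FactorAnalysis.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n = 300
# Two simulated latent dimensions, each driving five of the ten rating items.
f = rng.normal(size=(n, 2))
loadings = np.zeros((2, 10))
loadings[0, :5] = 0.8
loadings[1, 5:] = 0.8
ratings = f @ loadings + rng.normal(scale=0.6, size=(n, 10))

fa = FactorAnalysis(n_components=2).fit(ratings)
# Each row of components_ holds one factor's loadings across the ten items.
print(np.round(fa.components_, 2))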

Schriesheim, Chester A.; And Others – Educational and Psychological Measurement, 1989
Three studies explored the effects of grouped versus randomized questionnaire items on internal consistency and test-retest reliability, with samples of 80, 80, and 100 university students and undergraduates, respectively. The two correlational studies and the one experimental study were reasonably consistent in demonstrating that neither format was…
Descriptors: Classification, College Students, Evaluation Methods, Higher Education
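The two reliability criteria compared in these studies are standard quantities; the sketch below computes Cronbach's alpha for a single administration and a test-retest correlation of total scores on simulated item data, purely to make the comparison concrete (the data are not the authors').

import numpy as np

def cronbach_alpha(items):
    # items: persons x items matrix of scores.
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(4)
true_score = rng.normal(size=(100, 1))
time1 = true_score + rng.normal(scale=0.7, size=(100, 8))   # 8 items, occasion 1
time2 = true_score + rng.normal(scale=0.7, size=(100, 8))   # same persons, occasion 2

print("alpha:", round(cronbach_alpha(time1), 2))
print("test-retest r:", round(np.corrcoef(time1.sum(axis=1), time2.sum(axis=1))[0, 1], 2))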

Birenbaum, Menucha; And Others – Educational and Psychological Measurement, 1997
The agreement of diagnostic classifications from two parallel subtests assessing a mathematics skill with three levels of scoring was studied with 431 Arab Israeli 10th graders. Results indicate that, even when parallel form reliability is high, less agreement is apparent when performance is evaluated at the micro level. (SLD)
Descriptors: Arabs, Classification, Diagnostic Tests, Evaluation Methods
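Agreement between the diagnostic classifications produced by two parallel subtests is commonly summarized with raw percent agreement and Cohen's kappa; the sketch below computes both for invented classifications as a minimal illustration of the micro-level agreement the study examined.

from sklearn.metrics import cohen_kappa_score

# Hypothetical diagnostic categories assigned by two parallel subtests.
form_a = ["mastery", "partial", "non-mastery", "mastery", "partial", "partial",
          "non-mastery", "mastery", "partial", "mastery"]
form_b = ["mastery", "partial", "partial", "mastery", "non-mastery", "partial",
          "non-mastery", "partial", "partial", "mastery"]

exact = sum(a == b for a, b in zip(form_a, form_b)) / len(form_a)
print("percent agreement:", exact)
print("Cohen's kappa:", round(cohen_kappa_score(form_a, form_b), 2))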