Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 3
Since 2016 (last 10 years): 25
Since 2006 (last 20 years): 46
Descriptor
Statistical Analysis: 71
Test Length: 71
Item Response Theory: 30
Test Items: 29
Sample Size: 28
Comparative Analysis: 16
Test Reliability: 16
Correlation: 15
Error of Measurement: 13
Scores: 13
Computation: 12
Author
Bulut, Okan: 2
Cohen, Allan S.: 2
Huggins-Manley, Anne Corinne: 2
Paek, Insu: 2
Svetina, Dubravka: 2
Tay, Louis: 2
Wang, Wen-Chung: 2
Weiss, David J.: 2
Yormaz, Seha: 2
de Jong, John H. A. L.: 2
Abad, Francisco J.: 1
Publication Type
Reports - Research: 55
Journal Articles: 47
Reports - Evaluative: 9
Speeches/Meeting Papers: 5
Dissertations/Theses -…: 3
Tests/Questionnaires: 2
Information Analyses: 1
Numerical/Quantitative Data: 1
Education Level
Higher Education: 5
Postsecondary Education: 4
Secondary Education: 3
Elementary Education: 1
Elementary Secondary Education: 1
Grade 3: 1
High Schools: 1
Audience
Researchers: 1
Assessments and Surveys
SAT (College Admission Test): 2
California Psychological…: 1
Program for International…: 1
Stanford Binet Intelligence…: 1
Test of English as a Foreign…: 1
Wechsler Adult Intelligence…: 1
Wechsler Individual…: 1
Kalkan, Ömür Kaya – Measurement: Interdisciplinary Research and Perspectives, 2022
The four-parameter logistic (4PL) Item Response Theory (IRT) model has recently been reconsidered in the literature due to advances in statistical modeling software and recent developments in the estimation of the 4PL IRT model parameters. The current simulation study evaluated the performance of expectation-maximization (EM),…
Descriptors: Comparative Analysis, Sample Size, Test Length, Algorithms
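For reference, the standard four-parameter logistic item response function (a textbook form, not quoted from the abstract above) extends the 3PL model with an upper asymptote:

P_i(\theta) = c_i + (d_i - c_i)\,\frac{1}{1 + \exp[-a_i(\theta - b_i)]}

where a_i is the discrimination, b_i the difficulty, c_i the lower asymptote (pseudo-guessing), and d_i < 1 the upper asymptote (slipping) parameter of item i.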
Haimiao Yuan – ProQuest LLC, 2022
The application of diagnostic classification models (DCMs) in the field of educational measurement has received increasing attention in recent years. To make valid inferences from such a model, it is important to ensure that the model fits the data. The purpose of the present study was to investigate the performance of the limited information…
Descriptors: Goodness of Fit, Educational Assessment, Educational Diagnosis, Models
Su, Shiyang; Wang, Chun; Weiss, David J. – Educational and Psychological Measurement, 2021
S-X² is a popular item fit index that is available in commercial software packages such as flexMIRT. However, no research has systematically examined the performance of S-X² for detecting item misfit within the context of the multidimensional graded response model (MGRM). The primary goal of this study was…
Descriptors: Statistics, Goodness of Fit, Test Items, Models
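For context, S-X² (Orlando and Thissen's summed-score item fit index; the dichotomous form is shown here, not the graded-response generalization examined in the study) compares observed and model-expected correct-response proportions within summed-score groups:

S\text{-}X^2_i = \sum_{k} N_k \frac{(O_{ik} - E_{ik})^2}{E_{ik}(1 - E_{ik})}

where k indexes summed-score groups, N_k is the number of examinees in group k, and O_{ik} and E_{ik} are the observed and expected proportions answering item i correctly in that group.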
Sauder, Derek; DeMars, Christine – Applied Measurement in Education, 2020
We used simulation techniques to assess the item-level and familywise Type I error control and power of an IRT item-fit statistic, the S-X². Previous research indicated that the S-X² has good Type I error control and decent power, but no previous research examined familywise Type I error control…
Descriptors: Item Response Theory, Test Items, Sample Size, Test Length
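As a point of reference (general definitions, not taken from the article): when m independent item-fit tests are each run at level \alpha, the familywise Type I error rate is 1 - (1 - \alpha)^m, and a simple Bonferroni adjustment controls it by testing each item at \alpha / m.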
Lim, Euijin; Lee, Won-Chan – Applied Measurement in Education, 2020
The purpose of this study is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subtests. Assuming the random groups design and number-correct scoring, this paper analyzed real data and simulated data with four study factors including test dimensionality, subtest length, form difference in…
Descriptors: Equated Scores, Test Length, Test Format, Difficulty Level
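For orientation, a common linear equating function under the random groups design (a standard formula, not specific to this study's methods) places a score x on form X onto the scale of form Y by matching means and standard deviations:

l_Y(x) = \frac{\sigma_Y}{\sigma_X}(x - \mu_X) + \mu_Y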
Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2019
The Mantel-Haenszel delta difference (MH D-DIF) and the standardized proportion difference (STD P-DIF) are two observed-score methods that have been used to assess differential item functioning (DIF) at Educational Testing Service since the early 1990s. Latent-variable approaches to assessing measurement invariance at the item level have been…
Descriptors: Test Bias, Educational Testing, Statistical Analysis, Item Response Theory
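For reference, MH D-DIF re-expresses the Mantel-Haenszel common odds ratio \hat{\alpha}_{MH} on the ETS delta scale (a standard definition, not quoted from the report):

\text{MH D-DIF} = -2.35 \ln(\hat{\alpha}_{MH})

with negative values indicating items that disadvantage the focal group.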
Zhou, Sherry; Huggins-Manley, Anne Corinne – Educational and Psychological Measurement, 2020
The semi-generalized partial credit model (Semi-GPCM) has been proposed as a unidimensional modeling method for handling not applicable scale responses and neutral scale responses, and it has been suggested that the model may be of use in handling missing data in scale items. The purpose of this study is to evaluate the ability of the…
Descriptors: Models, Statistical Analysis, Response Style (Tests), Test Items
Sunbul, Onder; Yormaz, Seha – International Journal of Evaluation and Research in Education, 2018
In this study, the Type I error and power rates of the omega (ω) and GBT (generalized binomial test) indices were investigated for several nominal alpha levels and for 40- and 80-item test lengths with a 10,000-examinee sample size under several test-level restrictions. As a result, Type I error rates of both indices were found to be below the acceptable…
Descriptors: Difficulty Level, Cheating, Duplication, Test Length
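For context, the ω answer-copying index (Wollack's statistic; a standard definition, not quoted from the abstract) standardizes the number of matching responses h_{cs} between a suspected copier c and source s against its expectation under an IRT (nominal response) model:

\omega = \frac{h_{cs} - E[h_{cs}]}{SE(h_{cs})}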
Karadavut, Tugba; Cohen, Allan S.; Kim, Seock-Ho – Measurement: Interdisciplinary Research and Perspectives, 2020
Mixture Rasch (MixRasch) models conventionally assume normal distributions for latent ability. Previous research has shown that the assumption of normality is often unmet in educational and psychological measurement. When normality is assumed, asymmetry in the actual latent ability distribution has been shown to result in extraction of spurious…
Descriptors: Item Response Theory, Ability, Statistical Distributions, Sample Size
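For reference, the mixture Rasch model (standard form, not quoted from the article) lets item difficulties differ across latent classes g with mixing proportions \pi_g:

P(X_{ij} = 1) = \sum_{g} \pi_g \, \frac{\exp(\theta_{jg} - b_{ig})}{1 + \exp(\theta_{jg} - b_{ig})}

where \theta_{jg} is person j's ability and b_{ig} is item i's difficulty within class g; the conventional assumption questioned above is \theta_{jg} \sim N(\mu_g, \sigma_g^2).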
Yormaz, Seha; Sünbül, Önder – Educational Sciences: Theory and Practice, 2017
This study aims to determine the Type I error rates and power of the S₁ and S₂ indices and the kappa statistic for detecting copying on multiple-choice tests under various conditions. It also examines how the way copying groups are formed when computing the kappa statistic affects its Type I error rate and power. In this study,…
Descriptors: Statistical Analysis, Cheating, Multiple Choice Tests, Sample Size
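For reference, Cohen's kappa (a standard agreement coefficient, not quoted from the abstract) corrects the observed proportion of matching responses p_o for the agreement p_e expected by chance:

\kappa = \frac{p_o - p_e}{1 - p_e}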
Qiu, Yuxi; Huggins-Manley, Anne Corinne – Educational and Psychological Measurement, 2019
This study aimed to assess the accuracy of the empirical item characteristic curve (EICC) preequating method given the presence of test speededness. The simulation design of this study considered the proportion of speededness, speededness point, speededness rate, proportion of missing on speeded items, sample size, and test length. After crossing…
Descriptors: Accuracy, Equated Scores, Test Items, Nonparametric Statistics
Kelly, William E.; Daughtry, Don – College Student Journal, 2018
This study developed an abbreviated form of Barron's (1953) Ego Strength Scale for use in research among college student samples. A version of Barron's scale was administered to 100 undergraduate college students. Using item-total score correlations and internal consistency, the scale was reduced to 18 items (Es18). The Es18 possessed adequate…
Descriptors: Undergraduate Students, Self Concept Measures, Test Length, Scores
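For context, the internal consistency referred to above is typically Cronbach's alpha (a standard formula, not quoted from the article), which for a k-item scale with item variances \sigma_i^2 and total-score variance \sigma_X^2 is:

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)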
Lee, Soo; Bulut, Okan; Suh, Youngsuk – Educational and Psychological Measurement, 2017
A number of studies have found multiple indicators multiple causes (MIMIC) models to be an effective tool in detecting uniform differential item functioning (DIF) for individual items and item bundles. A recently developed MIMIC-interaction model is capable of detecting both uniform and nonuniform DIF in the unidimensional item response theory…
Descriptors: Test Bias, Test Items, Models, Item Response Theory
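As a sketch of the approach (a common MIMIC-interaction formulation, stated as an assumption rather than the authors' exact specification), the latent response to item i is modeled as

y_i^* = \lambda_i \eta + \beta_i z + \omega_i (\eta \times z) + \varepsilon_i

where z is the group indicator; a nonzero \beta_i signals uniform DIF and a nonzero interaction coefficient \omega_i signals nonuniform DIF.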
Vaheoja, Monika; Verhelst, N. D.; Eggen, T.J.H.M. – European Journal of Science and Mathematics Education, 2019
In this article, the authors applied profile analysis to Maths exam data to demonstrate how different exam forms, differing in difficulty and length, can be reported and easily interpreted. The results were presented for different groups of participants and for different institutions in different Maths domains by evaluating the balance. Some…
Descriptors: Feedback (Response), Foreign Countries, Statistical Analysis, Scores
Mouritsen, Matthew L.; Davis, Jefferson T.; Jones, Steven C. – Journal of Learning in Higher Education, 2016
Instructors are often concerned when giving multiple-day tests because students taking the test later in the exam period may have an advantage, due to information leakage, over students taking it earlier. However, exam scores seemed to decline as students took the same test later in a multi-day exam period (Mouritsen and…
Descriptors: Statistical Analysis, Scores, Tests, Testing