ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	5
Since 2017 (last 10 years)	7
Since 2007 (last 20 years)	24

Descriptor

Evaluation Methods	33
Test Length	33
Item Response Theory	17
Simulation	15
Test Items	13
Sample Size	8
Test Construction	7
Computation	6
Computer Assisted Testing	6
Correlation	6
Item Analysis	6
Foreign Countries	5
Monte Carlo Methods	5
Psychometrics	5
Scores	5
Adaptive Testing	4
Classification	4
Comparative Analysis	4
Cutting Scores	4
Error Patterns	4
Error of Measurement	4
Evaluation Research	4
Goodness of Fit	4
Research Methodology	4
Student Evaluation	4
More ▼

Publication Type

Journal Articles	23
Reports - Research	17
Reports - Evaluative	6
Reports - Descriptive	5
Dissertations/Theses -…	2
Guides - Non-Classroom	2
Numerical/Quantitative Data	2
Opinion Papers	2
Speeches/Meeting Papers	2
Historical Materials	1
Reports - General	1
More ▼

Education Level

Elementary Secondary Education	4
Higher Education	3
Elementary Education	2
High Schools	1
Middle Schools	1
Postsecondary Education	1
Secondary Education	1

Audience

Administrators	1
Researchers	1

Location

Taiwan	2
Asia	1
Australia	1
Florida	1
Netherlands	1
Pennsylvania	1

Laws, Policies, & Programs

Assessments and Surveys

Advanced Placement…	1
Florida Comprehensive…	1
Preliminary Scholastic…	1
Program for International…	1
SAT (College Admission Test)	1
Test of English as a Foreign…	1
Trends in International…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 33 results Save | Export

An Exponentially Weighted Moving Average Procedure for Detecting Back Random Responding Behavior

Peer reviewed

Direct link

He, Yinhong – Journal of Educational Measurement, 2023

Back random responding (BRR) behavior is one of the commonly observed careless response behaviors. Accurately detecting BRR behavior can improve test validities. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on weighted residual (CPA-WR) performed well in detecting BRR. Compared with the CPA procedure, the…

Descriptors: Test Validity, Item Response Theory, Measurement, Monte Carlo Methods

Assessing Dimensionality of IRT Models Using Traditional and Revised Parallel Analyses

Peer reviewed

Direct link

Guo, Wenjing; Choi, Youn-Jeng – Educational and Psychological Measurement, 2023

Determining the number of dimensions is extremely important in applying item response theory (IRT) models to data. Traditional and revised parallel analyses have been proposed within the factor analysis framework, and both have shown some promise in assessing dimensionality. However, their performance in the IRT framework has not been…

Descriptors: Item Response Theory, Evaluation Methods, Factor Analysis, Guidelines

There Are Many Greater Lower Bounds than Cronbach's [alpha]: A Monte Carlo Simulation Study

Peer reviewed

Direct link

Novak, Josip; Rebernjak, Blaž – Measurement: Interdisciplinary Research and Perspectives, 2023

A Monte Carlo simulation study was conducted to examine the performance of [alpha], [lambda]2, [lambda][subscript 4], [lambda][subscript 2], [omega][subscript T], GLB[subscript MRFA], and GLB[subscript Algebraic] coefficients. Population reliability, distribution shape, sample size, test length, and number of response categories were varied…

Descriptors: Monte Carlo Methods, Evaluation Methods, Reliability, Simulation

A Note on Improving Variational Estimation for Multidimensional Item Response Theory

Peer reviewed

Direct link

Chenchen Ma; Jing Ouyang; Chun Wang; Gongjun Xu – Grantee Submission, 2024

Survey instruments and assessments are frequently used in many domains of social science. When the constructs that these assessments try to measure become multifaceted, multidimensional item response theory (MIRT) provides a unified framework and convenient statistical tool for item analysis, calibration, and scoring. However, the computational…

Descriptors: Algorithms, Item Response Theory, Scoring, Accuracy

Modified Item-Fit Indices for Dichotomous IRT Models with Missing Data

Peer reviewed
PDF on ERIC

Download full text

Direct link

Xue Zhang; Chun Wang – Grantee Submission, 2022

Item-level fit analysis not only serves as a complementary check to global fit analysis, it is also essential in scale development because the fit results will guide item revision and/or deletion (Liu & Maydeu-Olivares, 2014). During data collection, missing response data may likely happen due to various reasons. Chi-square-based item fit…

Descriptors: Goodness of Fit, Item Response Theory, Scores, Test Length

A Modified "a"-Stratified Method for Computerized Adaptive Testing. Research Report. ETS RR-19-10

Peer reviewed
PDF on ERIC

Download full text

Gu, Lixiong; Ling, Guangming; Qu, Yanxuan – ETS Research Report Series, 2019

Research has found that the "a"-stratified item selection strategy (STR) for computerized adaptive tests (CATs) may lead to insufficient use of high a items at later stages of the tests and thus to reduced measurement precision. A refined approach, unequal item selection across strata (USTR), effectively improves test precision over the…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Use, Test Items

Variability in Percentage above Cut Scores Due to Discreteness in Score Scale. Research Report. ETS RR-17-32

Peer reviewed
PDF on ERIC

Download full text

Lu, Ying – ETS Research Report Series, 2017

For standard- or criterion-based assessments, the use of cut scores to indicate mastery, nonmastery, or different levels of skill mastery is very common. As part of performance summary, it is of interest to examine the percentage of examinees at or above the cut scores (PAC) and how PAC evolves across administrations. This paper shows that…

Descriptors: Cutting Scores, Evaluation Methods, Mastery Learning, Performance Based Assessment

A Nonparametric Approach to Estimate Classification Accuracy and Consistency

Peer reviewed

Direct link

Lathrop, Quinn N.; Cheng, Ying – Journal of Educational Measurement, 2014

When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…

Descriptors: Cutting Scores, Classification, Computation, Nonparametric Statistics

An Investigation of Sample Size Splitting on ATFIND and DIMTEST

Peer reviewed

Direct link

Socha, Alan; DeMars, Christine E. – Educational and Psychological Measurement, 2013

Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…

Descriptors: Sample Size, Test Length, Correlation, Test Format

Adjusting the Adjusted X[superscript 2]/df Ratio Statistic for Dichotomous Item Response Theory Analyses: Does the Model Fit?

Peer reviewed

Direct link

Tay, Louis; Drasgow, Fritz – Educational and Psychological Measurement, 2012

Two Monte Carlo simulation studies investigated the effectiveness of the mean adjusted X[superscript 2]/df statistic proposed by Drasgow and colleagues and, because of problems with the method, a new approach for assessing the goodness of fit of an item response theory model was developed. It has been previously recommended that mean adjusted…

Descriptors: Test Length, Monte Carlo Methods, Goodness of Fit, Item Response Theory

Future of Psychometrics: Ask What Psychometrics Can Do for Psychology

Peer reviewed

Direct link

Sijtsma, Klaas – Psychometrika, 2012

I address two issues that were inspired by my work on the Dutch Committee on Tests and Testing (COTAN). The first issue is the understanding of problems test constructors and researchers using tests have of psychometric knowledge. I argue that this understanding is important for a field, like psychometrics, for which the dissemination of…

Descriptors: Foreign Countries, Psychometrics, Knowledge Level, Test Construction

Polytomous Adaptive Classification Testing: Effects of Item Pool Size, Test Termination Criterion, and Number of Cutscores

Peer reviewed

Direct link

Gnambs, Timo; Batinic, Bernad – Educational and Psychological Measurement, 2011

Computer-adaptive classification tests focus on classifying respondents in different proficiency groups (e.g., for pass/fail decisions). To date, adaptive classification testing has been dominated by research on dichotomous response formats and classifications in two groups. This article extends this line of research to polytomous classification…

Descriptors: Test Length, Computer Assisted Testing, Classification, Test Items

Computerized Adaptive Testing with the Zinnes and Griggs Pairwise Preference Ideal Point Model

Peer reviewed

Direct link

Stark, Stephen; Chernyshenko, Oleksandr S. – International Journal of Testing, 2011

This article delves into a relatively unexplored area of measurement by focusing on adaptive testing with unidimensional pairwise preference items. The use of such tests is becoming more common in applied non-cognitive assessment because research suggests that this format may help to reduce certain types of rater error and response sets commonly…

Descriptors: Test Length, Simulation, Adaptive Testing, Item Analysis

Improving IRT Parameter Estimates with Small Sample Sizes: Evaluating the Efficacy of a New Data Augmentation Technique

Direct link

Foley, Brett Patrick – ProQuest LLC, 2010

The 3PL model is a flexible and widely used tool in assessment. However, it suffers from limitations due to its need for large sample sizes. This study introduces and evaluates the efficacy of a new sample size augmentation technique called Duplicate, Erase, and Replace (DupER) Augmentation through a simulation study. Data are augmented using…

Descriptors: Test Length, Sample Size, Simulation, Item Response Theory

Ongoing Issues in Test Fairness

Peer reviewed

Direct link

Camilli, Gregory – Educational Research and Evaluation, 2013

In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…

Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format

Previous Page | Next Page »

Pages: 1 | 2 | 3

Educational and Psychological…	6
Applied Psychological…	3
Journal of Educational…	3
ETS Research Report Series	2
Grantee Submission	2
ProQuest LLC	2
Applied Measurement in…	1
Australian Association for…	1
College Board	1
Contemporary Education	1
ERS Spectrum	1
Educational Research and…	1
International Journal of…	1
Measurement:…	1
New Directions for Program…	1
OECD Publishing (NJ1)	1
Office of Education, United…	1
Pennsylvania Department of…	1
Psychometrika	1
More ▼

Wang, Wen-Chung	3
Chun Wang	2
Woods, Carol M.	2
Batinic, Bernad	1
Bejar, Isaac I.	1
Bentley-Williams, Robyn	1
Bolt, Daniel M.	1
Bradburn, Norman	1
Camilli, Gregory	1
Chen, Hsueh-Chu	1
Chenchen Ma	1
Cheng, Ying	1
Chernyshenko, Oleksandr S.	1
Choi, Youn-Jeng	1
Cui, Ying	1
DeMars, Christine E.	1
Drasgow, Fritz	1
Egley, Robert J.	1
Foley, Brett Patrick	1
Forbes, Anne	1
Gnambs, Timo	1
Gongjun Xu	1
Gu, Lixiong	1
Guo, Wenjing	1
More ▼