Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 4
Since 2006 (last 20 years): 11
Descriptor
Evaluation Methods: 13
Test Items: 13
Test Length: 13
Simulation: 8
Item Response Theory: 7
Item Analysis: 5
Correlation: 4
Computer Assisted Testing: 3
Error Patterns: 3
Evaluation Problems: 3
Evaluation Research: 3
Author
Batinic, Bernad: 1
Bolt, Daniel M.: 1
Camilli, Gregory: 1
Chernyshenko, Oleksandr S.: 1
Choi, Youn-Jeng: 1
Cui, Ying: 1
DeMars, Christine E.: 1
Gnambs, Timo: 1
Gu, Lixiong: 1
Guo, Wenjing: 1
Lee, J. Murray: 1
Publication Type
Journal Articles: 11
Reports - Research: 9
Reports - Evaluative: 3
Historical Materials: 1
Numerical/Quantitative Data: 1
Reports - Descriptive: 1
Education Level
Elementary Secondary Education: 3
High Schools: 1
Secondary Education: 1
Audience
Administrators: 1
Assessments and Surveys
Program for International Student Assessment (PISA): 1
Trends in International Mathematics and Science Study (TIMSS): 1
Guo, Wenjing; Choi, Youn-Jeng – Educational and Psychological Measurement, 2023
Determining the number of dimensions is extremely important in applying item response theory (IRT) models to data. Traditional and revised parallel analyses have been proposed within the factor analysis framework, and both have shown some promise in assessing dimensionality. However, their performance in the IRT framework has not been…
Descriptors: Item Response Theory, Evaluation Methods, Factor Analysis, Guidelines
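The entry stops short of the method's mechanics, but traditional parallel analysis itself is straightforward: retain as many dimensions as there are observed eigenvalues exceeding those from comparable random data. A minimal sketch in Python, assuming dichotomous 0/1 responses and using column permutations to build the reference data (the function name and simulation settings are illustrative):

    import numpy as np

    def parallel_analysis(data, n_iter=100, percentile=95, seed=0):
        """Keep components whose observed eigenvalues exceed the chosen
        percentile of eigenvalues from column-permuted random data."""
        rng = np.random.default_rng(seed)
        n_items = data.shape[1]
        obs_eigs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
        rand_eigs = np.empty((n_iter, n_items))
        for i in range(n_iter):
            rand = rng.permuted(data, axis=0)  # shuffle each column independently
            rand_eigs[i] = np.linalg.eigvalsh(np.corrcoef(rand, rowvar=False))[::-1]
        threshold = np.percentile(rand_eigs, percentile, axis=0)
        return int(np.sum(obs_eigs > threshold))

    # illustrative two-dimensional item response data
    rng = np.random.default_rng(1)
    theta = rng.normal(size=(500, 2))
    loadings = np.vstack([np.repeat([[1, 0]], 10, axis=0),
                          np.repeat([[0, 1]], 10, axis=0)])
    probs = 1 / (1 + np.exp(-theta @ loadings.T))
    responses = (rng.random((500, 20)) < probs).astype(int)
    print(parallel_analysis(responses))  # typically suggests ~2 dimensions
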
Novak, Josip; Rebernjak, Blaž – Measurement: Interdisciplinary Research and Perspectives, 2023
A Monte Carlo simulation study was conducted to examine the performance of the α, λ₂, λ₄, λ₂, ω_T, GLB_MRFA, and GLB_Algebraic coefficients. Population reliability, distribution shape, sample size, test length, and number of response categories were varied…
Descriptors: Monte Carlo Methods, Evaluation Methods, Reliability, Simulation
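Two of the coefficients above have simple closed forms that make the comparison concrete. A sketch, assuming an examinee-by-item score matrix with roughly parallel items (the simulated data and function names are illustrative):

    import numpy as np

    def cronbach_alpha(X):
        """Coefficient alpha from an examinee-by-item score matrix."""
        k = X.shape[1]
        item_vars = X.var(axis=0, ddof=1)
        total_var = X.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_vars.sum() / total_var)

    def guttman_lambda2(X):
        """Guttman's lambda-2: alpha's covariance-sum term is augmented
        by a root-mean-square term, so lambda2 >= alpha."""
        k = X.shape[1]
        C = np.cov(X, rowvar=False)
        off = C - np.diag(np.diag(C))        # off-diagonal covariances
        total_var = C.sum()                  # variance of the total score
        return (off.sum() + np.sqrt(k / (k - 1) * (off ** 2).sum())) / total_var

    rng = np.random.default_rng(0)
    true_score = rng.normal(size=(300, 1))
    X = true_score + rng.normal(size=(300, 8))  # 8 parallel items
    print(cronbach_alpha(X), guttman_lambda2(X))
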
Gu, Lixiong; Ling, Guangming; Qu, Yanxuan – ETS Research Report Series, 2019
Research has found that the "a"-stratified item selection strategy (STR) for computerized adaptive tests (CATs) may lead to insufficient use of high-"a" items at later stages of the tests and thus to reduced measurement precision. A refined approach, unequal item selection across strata (USTR), effectively improves test precision over the…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Use, Test Items
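The abstract names the strategy but not its mechanics. In broad strokes, STR partitions the item bank into strata by the discrimination parameter a and draws early items from low-a strata, saving high-a items for later stages. A toy sketch under that reading, with a 2PL-style bank and a nearest-difficulty selection rule (all names and settings are illustrative; USTR's unequal allocation across strata is not shown):

    import numpy as np

    def a_stratified_select(bank, administered, theta_hat, stage, n_strata=4):
        """Pick the unused item whose difficulty b is closest to the current
        theta estimate, restricted to the a-stratum for this test stage
        (low-a strata early, high-a strata late)."""
        order = np.argsort(bank["a"])             # items sorted by discrimination
        strata = np.array_split(order, n_strata)  # equal-size a-strata
        candidates = [i for i in strata[stage] if i not in administered]
        return min(candidates, key=lambda i: abs(bank["b"][i] - theta_hat))

    rng = np.random.default_rng(0)
    bank = {"a": rng.uniform(0.5, 2.5, 200), "b": rng.normal(size=200)}
    item = a_stratified_select(bank, administered=set(), theta_hat=0.0, stage=0)
    print(item, bank["a"][item], bank["b"][item])
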
Lu, Ying – ETS Research Report Series, 2017
For standard- or criterion-based assessments, the use of cut scores to indicate mastery, nonmastery, or different levels of skill mastery is very common. As part of performance summaries, it is of interest to examine the percentage of examinees at or above the cut scores (PAC) and how PAC evolves across administrations. This paper shows that…
Descriptors: Cutting Scores, Evaluation Methods, Mastery Learning, Performance Based Assessment
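PAC itself is simple arithmetic; a sketch, assuming a vector of scale scores and a single cut score (the numbers are hypothetical):

    import numpy as np

    def pac(scores, cut):
        """Percentage of examinees at or above a cut score (PAC)."""
        return 100.0 * np.mean(np.asarray(scores) >= cut)

    print(pac([3, 7, 9, 12, 15], cut=9))  # 60.0

Tracking PAC across administrations is then just this computation repeated per administration on the comparable score scale.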
Socha, Alan; DeMars, Christine E. – Educational and Psychological Measurement, 2013
Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…
Descriptors: Sample Size, Test Length, Correlation, Test Format
Gnambs, Timo; Batinic, Bernad – Educational and Psychological Measurement, 2011
Computer-adaptive classification tests focus on classifying respondents in different proficiency groups (e.g., for pass/fail decisions). To date, adaptive classification testing has been dominated by research on dichotomous response formats and classifications in two groups. This article extends this line of research to polytomous classification…
Descriptors: Test Length, Computer Assisted Testing, Classification, Test Items
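The entry does not spell out its stopping rule, but a common one in adaptive classification testing stops as soon as a confidence interval around the interim ability estimate falls wholly on one side of the cutoff. A sketch under that assumption (the interval width and cutoff are arbitrary):

    def classify(theta_hat, se, cutoff, z=1.96):
        """Confidence-interval classification: decide pass/fail once the
        interval around the ability estimate no longer covers the cutoff;
        otherwise keep testing."""
        lo, hi = theta_hat - z * se, theta_hat + z * se
        if lo > cutoff:
            return "pass"
        if hi < cutoff:
            return "fail"
        return "continue testing"

    print(classify(theta_hat=0.8, se=0.25, cutoff=0.0))  # pass

The same logic generalizes to more than two groups by testing the interval against each successive cutoff.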
Stark, Stephen; Chernyshenko, Oleksandr S. – International Journal of Testing, 2011
This article delves into a relatively unexplored area of measurement by focusing on adaptive testing with unidimensional pairwise preference items. The use of such tests is becoming more common in applied non-cognitive assessment because research suggests that this format may help to reduce certain types of rater error and response sets commonly…
Descriptors: Test Length, Simulation, Adaptive Testing, Item Analysis
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format
Wu, Margaret – OECD Publishing (NJ1), 2010
This paper makes an in-depth comparison of the PISA (OECD) and TIMSS (IEA) mathematics assessments conducted in 2003. First, a comparison of survey methodologies is presented, followed by an examination of the mathematics frameworks in the two studies. The methodologies and the frameworks in the two studies form the basis for providing…
Descriptors: Mathematics Achievement, Foreign Countries, Gender Differences, Comparative Analysis
Wells, Craig S.; Bolt, Daniel M. – Applied Measurement in Education, 2008
Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…
Descriptors: Test Length, Test Items, Monte Carlo Methods, Nonparametric Statistics
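Douglas and Cohen's approach compares a nonparametric estimate of an item's characteristic curve with the fitted parametric curve. A rough sketch in that spirit, using a Gaussian-kernel ICC estimate and a plain RMSD as the discrepancy measure (the bandwidth, grid, and summary statistic are simplifications, not the article's exact procedure):

    import numpy as np

    def two_pl(theta, a, b):
        return 1 / (1 + np.exp(-a * (theta - b)))

    def kernel_icc(theta, y, grid, h=0.3):
        """Nadaraya-Watson estimate of P(correct | theta) on a grid."""
        w = np.exp(-0.5 * ((grid[:, None] - theta[None, :]) / h) ** 2)
        return (w * y).sum(axis=1) / w.sum(axis=1)

    def misfit_rmsd(theta, y, a, b, grid=np.linspace(-3, 3, 61)):
        """RMS difference between the nonparametric and parametric ICCs,
        in the spirit of Douglas and Cohen (2001)."""
        return np.sqrt(np.mean((kernel_icc(theta, y, grid) - two_pl(grid, a, b)) ** 2))

    rng = np.random.default_rng(0)
    theta = rng.normal(size=2000)
    y = (rng.random(2000) < two_pl(theta, a=1.2, b=0.3)).astype(int)
    print(misfit_rmsd(theta, y, a=1.2, b=0.3))   # small: the model fits
    print(misfit_rmsd(theta, y, a=0.4, b=-1.0))  # larger: deliberate misfit
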
Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…
Descriptors: Test Length, Simulation, Correlation, Research Methodology
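A simplified illustration of the idea behind such an index: for every item an examinee answers correctly, check whether its prerequisite items were also answered correctly, then scale the misfit count to the [-1, 1] range. The prerequisite map and scaling below are illustrative, not the article's exact definition:

    def hci(response, prereq):
        """Simplified hierarchy-consistency index: 1 means every correctly
        answered item has all its prerequisites correct; -1 means none do."""
        misfits = comparisons = 0
        for item, passed in enumerate(response):
            if not passed:
                continue
            for pre in prereq.get(item, []):
                comparisons += 1
                if not response[pre]:
                    misfits += 1
        return 1.0 if comparisons == 0 else 1 - 2 * misfits / comparisons

    # hypothetical map: item 2 requires items 0 and 1; item 3 requires item 2
    prereq = {2: [0, 1], 3: [2]}
    print(hci([1, 1, 1, 1], prereq))  # 1.0: fully consistent
    print(hci([0, 0, 1, 1], prereq))  # -0.33: correct answers lack prerequisites
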
Wang, Wen-Chung; Su, Ya-Hui – Applied Psychological Measurement, 2004
Eight independent variables (differential item functioning [DIF] detection method, purification procedure, item response model, mean latent trait difference between groups, test length, DIF pattern, magnitude of DIF, and percentage of DIF items) were manipulated, and two dependent variables (Type I error and power) were assessed through…
Descriptors: Test Length, Test Bias, Simulation, Item Response Theory
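The two dependent variables are easy to estimate by Monte Carlo once a detection method is fixed. A sketch using the Mantel-Haenszel chi-square (one common DIF method, not necessarily among those studied here) on simulated Rasch-style data: with no DIF the rejection rate estimates Type I error, and with a difficulty shift on the studied item it estimates power. All settings are arbitrary:

    import numpy as np
    from scipy.stats import chi2

    def mh_chi2(y_ref, y_foc, s_ref, s_foc):
        """Mantel-Haenszel chi-square for one studied item, matching on
        rest score (continuity correction omitted for brevity)."""
        num = den = 0.0
        for s in np.union1d(s_ref, s_foc):
            a, b = y_ref[s_ref == s], y_foc[s_foc == s]
            n1, n2 = len(a), len(b)
            n = n1 + n2
            if n1 == 0 or n2 == 0 or n < 2:
                continue
            m1 = a.sum() + b.sum()                 # total correct in stratum
            num += a.sum() - n1 * m1 / n           # observed minus expected
            den += n1 * n2 * m1 * (n - m1) / (n ** 2 * (n - 1))
        return num ** 2 / den

    def rejection_rate(dif=0.0, n_rep=200, n=500, k=20, alpha=0.05, seed=0):
        """Monte Carlo rejection rate for the studied (last) item:
        Type I error when dif == 0, power when dif > 0."""
        rng = np.random.default_rng(seed)
        b = rng.normal(size=k)
        b_foc = b.copy()
        b_foc[-1] += dif                           # DIF as a difficulty shift
        hits = 0
        for _ in range(n_rep):
            t_r, t_f = rng.normal(size=n), rng.normal(size=n)
            y_r = (rng.random((n, k)) < 1 / (1 + np.exp(-(t_r[:, None] - b)))).astype(int)
            y_f = (rng.random((n, k)) < 1 / (1 + np.exp(-(t_f[:, None] - b_foc)))).astype(int)
            stat = mh_chi2(y_r[:, -1], y_f[:, -1],
                           y_r[:, :-1].sum(axis=1), y_f[:, :-1].sum(axis=1))
            hits += stat > chi2.ppf(1 - alpha, df=1)
        return hits / n_rep

    print(rejection_rate(dif=0.0))   # should hover near 0.05 (Type I error)
    print(rejection_rate(dif=0.8))   # should be well above 0.05 (power)
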
Lee, J. Murray; Segel, David – Office of Education, United States Department of the Interior, 1936
In order to make an intelligent advance in any school practice, a knowledge of what schools are doing in that practice is almost indispensable, since a transition in procedures must be a growth from the one to the other. This bulletin gives this background of facts concerning the use of tests and examinations by the different subject departments in…
Descriptors: Testing, Teachers, Standardized Tests, Principals