Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 4
Since 2006 (last 20 years): 11
Descriptor
Evaluation Methods: 13
Test Items: 13
Test Length: 13
Simulation: 8
Item Response Theory: 7
Item Analysis: 5
Correlation: 4
Computer Assisted Testing: 3
Error Patterns: 3
Evaluation Problems: 3
Evaluation Research: 3
Author
Batinic, Bernad: 1
Bolt, Daniel M.: 1
Camilli, Gregory: 1
Chernyshenko, Oleksandr S.: 1
Choi, Youn-Jeng: 1
Cui, Ying: 1
DeMars, Christine E.: 1
Gnambs, Timo: 1
Gu, Lixiong: 1
Guo, Wenjing: 1
Lee, J. Murray: 1
Publication Type
Journal Articles: 11
Reports - Research: 9
Reports - Evaluative: 3
Historical Materials: 1
Numerical/Quantitative Data: 1
Reports - Descriptive: 1
Education Level
Elementary Secondary Education: 3
High Schools: 1
Secondary Education: 1
Audience
Administrators: 1
Assessments and Surveys
Program for International Student Assessment (PISA): 1
Trends in International Mathematics and Science Study (TIMSS): 1
Guo, Wenjing; Choi, Youn-Jeng – Educational and Psychological Measurement, 2023
Determining the number of dimensions is extremely important in applying item response theory (IRT) models to data. Traditional and revised parallel analyses have been proposed within the factor analysis framework, and both have shown some promise in assessing dimensionality. However, their performance in the IRT framework has not been…
Descriptors: Item Response Theory, Evaluation Methods, Factor Analysis, Guidelines
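The entry stops short of the method's mechanics, but traditional parallel analysis itself is straightforward: retain as many dimensions as there are observed eigenvalues exceeding those from comparable random data. A minimal sketch in Python, assuming dichotomous 0/1 responses and using column permutations to build the reference data (the function name and simulation settings are illustrative):

    import numpy as np

    def parallel_analysis(data, n_iter=100, percentile=95, seed=0):
        """Keep components whose observed eigenvalues exceed the chosen
        percentile of eigenvalues from column-permuted random data."""
        rng = np.random.default_rng(seed)
        n_items = data.shape[1]
        obs_eigs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
        rand_eigs = np.empty((n_iter, n_items))
        for i in range(n_iter):
            rand = rng.permuted(data, axis=0)  # shuffle each column independently
            rand_eigs[i] = np.linalg.eigvalsh(np.corrcoef(rand, rowvar=False))[::-1]
        threshold = np.percentile(rand_eigs, percentile, axis=0)
        return int(np.sum(obs_eigs > threshold))

    # illustrative two-dimensional item response data
    rng = np.random.default_rng(1)
    theta = rng.normal(size=(500, 2))
    loadings = np.vstack([np.repeat([[1, 0]], 10, axis=0),
                          np.repeat([[0, 1]], 10, axis=0)])
    probs = 1 / (1 + np.exp(-theta @ loadings.T))
    responses = (rng.random((500, 20)) < probs).astype(int)
    print(parallel_analysis(responses))  # typically suggests ~2 dimensions
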
Novak, Josip; Rebernjak, Blaž – Measurement: Interdisciplinary Research and Perspectives, 2023
A Monte Carlo simulation study was conducted to examine the performance of the α, λ₂, λ₄, λ₂, ω_T, GLB_MRFA, and GLB_Algebraic coefficients. Population reliability, distribution shape, sample size, test length, and number of response categories were varied…
Descriptors: Monte Carlo Methods, Evaluation Methods, Reliability, Simulation
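Two of the coefficients above have simple closed forms that make the comparison concrete. A sketch, assuming an examinee-by-item score matrix with roughly parallel items (the simulated data and function names are illustrative):

    import numpy as np

    def cronbach_alpha(X):
        """Coefficient alpha from an examinee-by-item score matrix."""
        k = X.shape[1]
        item_vars = X.var(axis=0, ddof=1)
        total_var = X.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_vars.sum() / total_var)

    def guttman_lambda2(X):
        """Guttman's lambda-2: alpha's covariance-sum term is augmented
        by a root-mean-square term, so lambda2 >= alpha."""
        k = X.shape[1]
        C = np.cov(X, rowvar=False)
        off = C - np.diag(np.diag(C))        # off-diagonal covariances
        total_var = C.sum()                  # variance of the total score
        return (off.sum() + np.sqrt(k / (k - 1) * (off ** 2).sum())) / total_var

    rng = np.random.default_rng(0)
    true_score = rng.normal(size=(300, 1))
    X = true_score + rng.normal(size=(300, 8))  # 8 parallel items
    print(cronbach_alpha(X), guttman_lambda2(X))
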
Gu, Lixiong; Ling, Guangming; Qu, Yanxuan – ETS Research Report Series, 2019
Research has found that the "a"-stratified item selection strategy (STR) for computerized adaptive tests (CATs) may lead to insufficient use of high-"a" items at later stages of the tests and thus to reduced measurement precision. A refined approach, unequal item selection across strata (USTR), effectively improves test precision over the…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Use, Test Items
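The abstract names the strategy but not its mechanics. In broad strokes, STR partitions the item bank into strata by the discrimination parameter a and draws early items from low-a strata, saving high-a items for later stages. A toy sketch under that reading, with a 2PL-style bank and a nearest-difficulty selection rule (all names and settings are illustrative; USTR's unequal allocation across strata is not shown):

    import numpy as np

    def a_stratified_select(bank, administered, theta_hat, stage, n_strata=4):
        """Pick the unused item whose difficulty b is closest to the current
        theta estimate, restricted to the a-stratum for this test stage
        (low-a strata early, high-a strata late)."""
        order = np.argsort(bank["a"])             # items sorted by discrimination
        strata = np.array_split(order, n_strata)  # equal-size a-strata
        candidates = [i for i in strata[stage] if i not in administered]
        return min(candidates, key=lambda i: abs(bank["b"][i] - theta_hat))

    rng = np.random.default_rng(0)
    bank = {"a": rng.uniform(0.5, 2.5, 200), "b": rng.normal(size=200)}
    item = a_stratified_select(bank, administered=set(), theta_hat=0.0, stage=0)
    print(item, bank["a"][item], bank["b"][item])
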
Lu, Ying – ETS Research Report Series, 2017
For standard- or criterion-based assessments, the use of cut scores to indicate mastery, nonmastery, or different levels of skill mastery is very common. As part of performance summaries, it is of interest to examine the percentage of examinees at or above the cut scores (PAC) and how PAC evolves across administrations. This paper shows that…
Descriptors: Cutting Scores, Evaluation Methods, Mastery Learning, Performance Based Assessment
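PAC itself is simple arithmetic; a sketch, assuming a vector of scale scores and a single cut score (the numbers are hypothetical):

    import numpy as np

    def pac(scores, cut):
        """Percentage of examinees at or above a cut score (PAC)."""
        return 100.0 * np.mean(np.asarray(scores) >= cut)

    print(pac([3, 7, 9, 12, 15], cut=9))  # 60.0

Tracking PAC across administrations is then just this computation repeated per administration on the comparable score scale.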
Socha, Alan; DeMars, Christine E. – Educational and Psychological Measurement, 2013
Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…
Descriptors: Sample Size, Test Length, Correlation, Test Format
Gnambs, Timo; Batinic, Bernad – Educational and Psychological Measurement, 2011
Computer-adaptive classification tests focus on classifying respondents in different proficiency groups (e.g., for pass/fail decisions). To date, adaptive classification testing has been dominated by research on dichotomous response formats and classifications in two groups. This article extends this line of research to polytomous classification…
Descriptors: Test Length, Computer Assisted Testing, Classification, Test Items
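The entry does not spell out its stopping rule, but a common one in adaptive classification testing stops as soon as a confidence interval around the interim ability estimate falls wholly on one side of the cutoff. A sketch under that assumption (the interval width and cutoff are arbitrary):

    def classify(theta_hat, se, cutoff, z=1.96):
        """Confidence-interval classification: decide pass/fail once the
        interval around the ability estimate no longer covers the cutoff;
        otherwise keep testing."""
        lo, hi = theta_hat - z * se, theta_hat + z * se
        if lo > cutoff:
            return "pass"
        if hi < cutoff:
            return "fail"
        return "continue testing"

    print(classify(theta_hat=0.8, se=0.25, cutoff=0.0))  # pass

The same logic generalizes to more than two groups by testing the interval against each successive cutoff.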
Stark, Stephen; Chernyshenko, Oleksandr S. – International Journal of Testing, 2011
This article delves into a relatively unexplored area of measurement by focusing on adaptive testing with unidimensional pairwise preference items. The use of such tests is becoming more common in applied non-cognitive assessment because research suggests that this format may help to reduce certain types of rater error and response sets commonly…
Descriptors: Test Length, Simulation, Adaptive Testing, Item Analysis
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format
Wu, Margaret – OECD Publishing (NJ1), 2010
This paper makes an in-depth comparison of the PISA (OECD) and TIMSS (IEA) mathematics assessments conducted in 2003. First, a comparison of survey methodologies is presented, followed by an examination of the mathematics frameworks in the two studies. The methodologies and the frameworks in the two studies form the basis for providing…
Descriptors: Mathematics Achievement, Foreign Countries, Gender Differences, Comparative Analysis
Wells, Craig S.; Bolt, Daniel M. – Applied Measurement in Education, 2008
Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…
Descriptors: Test Length, Test Items, Monte Carlo Methods, Nonparametric Statistics
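Douglas and Cohen's approach compares a nonparametric estimate of an item's characteristic curve with the fitted parametric curve. A rough sketch in that spirit, using a Gaussian-kernel ICC estimate and a plain RMSD as the discrepancy measure (the bandwidth, grid, and summary statistic are simplifications, not the article's exact procedure):

    import numpy as np

    def two_pl(theta, a, b):
        return 1 / (1 + np.exp(-a * (theta - b)))

    def kernel_icc(theta, y, grid, h=0.3):
        """Nadaraya-Watson estimate of P(correct | theta) on a grid."""
        w = np.exp(-0.5 * ((grid[:, None] - theta[None, :]) / h) ** 2)
        return (w * y).sum(axis=1) / w.sum(axis=1)

    def misfit_rmsd(theta, y, a, b, grid=np.linspace(-3, 3, 61)):
        """RMS difference between the nonparametric and parametric ICCs,
        in the spirit of Douglas and Cohen (2001)."""
        return np.sqrt(np.mean((kernel_icc(theta, y, grid) - two_pl(grid, a, b)) ** 2))

    rng = np.random.default_rng(0)
    theta = rng.normal(size=2000)
    y = (rng.random(2000) < two_pl(theta, a=1.2, b=0.3)).astype(int)
    print(misfit_rmsd(theta, y, a=1.2, b=0.3))   # small: the model fits
    print(misfit_rmsd(theta, y, a=0.4, b=-1.0))  # larger: deliberate misfit
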
Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…
Descriptors: Test Length, Simulation, Correlation, Research Methodology
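A simplified illustration of the idea behind such an index: for every item an examinee answers correctly, check whether its prerequisite items were also answered correctly, then scale the misfit count to the [-1, 1] range. The prerequisite map and scaling below are illustrative, not the article's exact definition:

    def hci(response, prereq):
        """Simplified hierarchy-consistency index: 1 means every correctly
        answered item has all its prerequisites correct; -1 means none do."""
        misfits = comparisons = 0
        for item, passed in enumerate(response):
            if not passed:
                continue
            for pre in prereq.get(item, []):
                comparisons += 1
                if not response[pre]:
                    misfits += 1
        return 1.0 if comparisons == 0 else 1 - 2 * misfits / comparisons

    # hypothetical map: item 2 requires items 0 and 1; item 3 requires item 2
    prereq = {2: [0, 1], 3: [2]}
    print(hci([1, 1, 1, 1], prereq))  # 1.0: fully consistent
    print(hci([0, 0, 1, 1], prereq))  # -0.33: correct answers lack prerequisites
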
Wang, Wen-Chung; Su, Ya-Hui – Applied Psychological Measurement, 2004
Eight independent variables (differential item functioning [DIF] detection method, purification procedure, item response model, mean latent trait difference between groups, test length, DIF pattern, magnitude of DIF, and percentage of DIF items) were manipulated, and two dependent variables (Type I error and power) were assessed through…
Descriptors: Test Length, Test Bias, Simulation, Item Response Theory
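The two dependent variables are easy to estimate by Monte Carlo once a detection method is fixed. A sketch using the Mantel-Haenszel chi-square (one common DIF method, not necessarily among those studied here) on simulated Rasch-style data: with no DIF the rejection rate estimates Type I error, and with a difficulty shift on the studied item it estimates power. All settings are arbitrary:

    import numpy as np
    from scipy.stats import chi2

    def mh_chi2(y_ref, y_foc, s_ref, s_foc):
        """Mantel-Haenszel chi-square for one studied item, matching on
        rest score (continuity correction omitted for brevity)."""
        num = den = 0.0
        for s in np.union1d(s_ref, s_foc):
            a, b = y_ref[s_ref == s], y_foc[s_foc == s]
            n1, n2 = len(a), len(b)
            n = n1 + n2
            if n1 == 0 or n2 == 0 or n < 2:
                continue
            m1 = a.sum() + b.sum()                 # total correct in stratum
            num += a.sum() - n1 * m1 / n           # observed minus expected
            den += n1 * n2 * m1 * (n - m1) / (n ** 2 * (n - 1))
        return num ** 2 / den

    def rejection_rate(dif=0.0, n_rep=200, n=500, k=20, alpha=0.05, seed=0):
        """Monte Carlo rejection rate for the studied (last) item:
        Type I error when dif == 0, power when dif > 0."""
        rng = np.random.default_rng(seed)
        b = rng.normal(size=k)
        b_foc = b.copy()
        b_foc[-1] += dif                           # DIF as a difficulty shift
        hits = 0
        for _ in range(n_rep):
            t_r, t_f = rng.normal(size=n), rng.normal(size=n)
            y_r = (rng.random((n, k)) < 1 / (1 + np.exp(-(t_r[:, None] - b)))).astype(int)
            y_f = (rng.random((n, k)) < 1 / (1 + np.exp(-(t_f[:, None] - b_foc)))).astype(int)
            stat = mh_chi2(y_r[:, -1], y_f[:, -1],
                           y_r[:, :-1].sum(axis=1), y_f[:, :-1].sum(axis=1))
            hits += stat > chi2.ppf(1 - alpha, df=1)
        return hits / n_rep

    print(rejection_rate(dif=0.0))   # should hover near 0.05 (Type I error)
    print(rejection_rate(dif=0.8))   # should be well above 0.05 (power)
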
Lee, J. Murray; Segel, David – Office of Education, United States Department of the Interior, 1936
In order to make an intelligent advance in any school practice, a knowledge of what schools are doing in that practice is almost indispensable, since a transition in procedures must be a growth from the one to the other. This bulletin gives this background of facts concerning the use of tests and examinations by the different subject departments in…
Descriptors: Testing, Teachers, Standardized Tests, Principals