Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 1 |
Since 2006 (last 20 years) | 4 |
Descriptor
Models | 6 |
Scores | 6 |
Test Length | 6 |
Item Response Theory | 4 |
Error of Measurement | 3 |
Goodness of Fit | 3 |
Simulation | 3 |
Comparative Analysis | 2 |
Sample Size | 2 |
Statistical Analysis | 2 |
Test Format | 2 |
Author
Burton, Richard F. | 1 |
Campbell, Todd | 1 |
Chen, Troy T. | 1 |
Chon, Kyong Hee | 1 |
Dunbar, Stephen B. | 1 |
Feng, Yuling | 1 |
Fu, Jianbin | 1 |
Kang, Taehoon | 1 |
Lee, Won-Chan | 1 |
Patsula, Liane | 1 |
Rizavi, Saba | 1 |
Publication Type
Reports - Research | 5 |
Journal Articles | 4 |
Reports - Evaluative | 1 |
Speeches/Meeting Papers | 1 |
Assessments and Surveys
Bem Sex Role Inventory | 1 |
Fu, Jianbin; Feng, Yuling – ETS Research Report Series, 2018
In this study, we propose aggregating test scores with unidimensional within-test structure and multidimensional across-test structure based on a 2-level, 1-factor model. In particular, we compare 6 score aggregation methods: average of standardized test raw scores (M1), regression factor score estimate of the 1-factor model based on the…
Descriptors: Comparative Analysis, Scores, Correlation, Standardized Tests
Chon, Kyong Hee; Lee, Won-Chan; Dunbar, Stephen B. – Journal of Educational Measurement, 2010
In this study we examined procedures for assessing model-data fit of item response theory (IRT) models for mixed format data. The model fit indices used in this study include PARSCALE's G², Orlando and Thissen's S-X² and S-G², and Stone's χ²* and G²*. To investigate the…
Descriptors: Test Length, Goodness of Fit, Item Response Theory, Simulation
Kang, Taehoon; Chen, Troy T. – ACT, Inc., 2007
Orlando and Thissen (2000, 2003) proposed an item-fit index, S-X², for dichotomous item response theory (IRT) models, which has performed better than traditional item-fit statistics such as Yen's (1981) Q₁ and McKinley and Mills' (1985) G². This study extends the utility of S-X² to polytomous…
Descriptors: Item Response Theory, Models, Computer Software, Statistical Analysis
Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007
Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…
Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models
Campbell, Todd; And Others – 1995
In the early 1970s A. Constantinople wrote a seminal article that led to the development of the construct of psychological androgyny. The Bem Sex-Role Inventory is a popular measure of the construct, but the measure remains controversial. The construct validity of scores from the measure was explored using confirmatory factor analysis on data from…
Descriptors: Androgyny, College Students, Construct Validity, Factor Structure
Multiple Choice and True/False Tests: Reliability Measures and Some Implications of Negative Marking
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2004
The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different format? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this…
Descriptors: Multiple Choice Tests, Error of Measurement, Test Reliability, Test Items