Publication Date
In 2025 | 0
Since 2024 | 0
Since 2021 (last 5 years) | 2
Since 2016 (last 10 years) | 5
Since 2006 (last 20 years) | 8
Descriptor
Error of Measurement | 19
Test Length | 19
Test Reliability | 19
Test Items | 8
Item Response Theory | 6
Scores | 6
Computer Assisted Testing | 5
Sample Size | 5
Test Format | 5
Models | 4
Simulation | 4
Author
Yao, Lihua | 2
Allison, Paul A. | 1
Axelrod, Bradley N. | 1
Battista, Carmela | 1
Burton, Richard F. | 1
Campana, Kayla V. | 1
Chen, Hsueh-Chu | 1
Cureton, Edward E. | 1
Dawes, Jillian M. | 1
Ellis, Jules L. | 1
Feldt, Leonard S. | 1
Publication Type
Reports - Research | 12
Journal Articles | 11
Reports - Evaluative | 3
Speeches/Meeting Papers | 2
Reports - Descriptive | 1
Education Level
Early Childhood Education | 1
Elementary Education | 1
Grade 3 | 1
Primary Education | 1
Location
Taiwan | 1
Assessments and Surveys
Armed Forces Qualification… | 1
Comprehensive Tests of Basic… | 1
Test of English as a Foreign… | 1
Wechsler Adult Intelligence… | 1
Ellis, Jules L. – Educational and Psychological Measurement, 2021
This study develops a theoretical model for the costs of an exam as a function of its duration. Two kinds of costs are distinguished: (1) the costs of measurement errors and (2) the costs of the measurement itself. Both costs are expressed in the student's time. Based on a classical test theory model, enriched with assumptions about the context, the costs…
Descriptors: Test Length, Models, Error of Measurement, Measurement
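For orientation, a minimal sketch of how such a duration-cost trade-off can be written down, assuming Spearman-Brown lengthening and mean (per-unit-time) scoring; the cost function C(t) and the weight lambda are illustrative assumptions, not Ellis's actual model:
\[
\rho(t) = \frac{t\,\rho_1}{1 + (t-1)\,\rho_1},
\qquad
\mathrm{SEM}(t) = \frac{\mathrm{SEM}(1)}{\sqrt{t}},
\qquad
C(t) = t + \lambda\,\mathrm{SEM}(t)^2 ,
\]
where t is the exam duration as a multiple of a reference exam with reliability rho_1, and lambda converts squared measurement error into units of student time. The direct cost grows linearly in t while the error cost shrinks as 1/t, so the total cost has an interior minimum at t* = SEM(1)·sqrt(lambda).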
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
Computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. Multidimensional CAT (MCAT) designs differ in the item selection, ability estimation, and termination methods used. This study aims to investigate the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
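For readers unfamiliar with the mechanics, the sketch below shows a generic unidimensional CAT loop with maximum-information item selection and a standard-error stopping rule, assuming a 2PL model and crude grid-search ability estimation; it is an illustration only, not one of the MCAT designs compared in the study (which add multidimensional selection, estimation, and termination methods).
```python
# Illustrative sketch of a generic unidimensional CAT loop (2PL model).
# The item bank, stopping threshold, and estimation method are assumptions
# made for this example; they are not taken from the study above.
import numpy as np

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def run_cat(bank, true_theta, se_stop=0.30, max_items=30, seed=0):
    """Administer items until SE(theta) <= se_stop or max_items is reached."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-4, 4, 161)          # grid for ML ability estimation
    available = list(range(len(bank)))
    used, responses, theta_hat = [], [], 0.0
    while len(used) < max_items:
        # Select the remaining item with maximum information at theta_hat.
        j = max(available, key=lambda i: item_info(theta_hat, *bank[i]))
        available.remove(j)
        used.append(j)
        a, b = bank[j]
        responses.append(int(rng.random() < p_2pl(true_theta, a, b)))
        # Grid-search maximum-likelihood update of theta.
        loglik = np.zeros_like(grid)
        for (aa, bb), u in zip((bank[k] for k in used), responses):
            p = p_2pl(grid, aa, bb)
            loglik += u * np.log(p) + (1 - u) * np.log(1.0 - p)
        theta_hat = grid[np.argmax(loglik)]
        # Standard-error stopping rule: SE(theta) ~ 1 / sqrt(test information).
        info = sum(item_info(theta_hat, *bank[k]) for k in used)
        if info > 0 and 1.0 / np.sqrt(info) <= se_stop:
            break
    return theta_hat, len(used)

rng = np.random.default_rng(1)
bank = list(zip(rng.uniform(0.8, 2.0, 200), np.linspace(-3.0, 3.0, 200)))
theta_hat, n_items = run_cat(bank, true_theta=1.0)
print(f"estimate {theta_hat:.2f} after {n_items} items")
```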
Precision of Single-Skill Math CBM Time-Series Data: The Effect of Probe Stratification and Set Size
Solomon, Benjamin G.; Payne, Lexy L.; Campana, Kayla V.; Marr, Erin A.; Battista, Carmela; Silva, Alex; Dawes, Jillian M. – Journal of Psychoeducational Assessment, 2020
Comparatively little research exists on single-skill math (SSM) curriculum-based measurements (CBMs) for the purpose of monitoring growth, as may be done in practice or when monitoring intervention effectiveness within group or single-case research. Therefore, we examined a common variant of SSM-CBM: 1 digit × 1 digit multiplication. Reflecting…
Descriptors: Curriculum Based Assessment, Mathematics Tests, Mathematics Skills, Multiplication
Lee, Yi-Hsuan; Zhang, Jinming – International Journal of Testing, 2017
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Descriptors: Test Bias, Test Reliability, Performance, Scores
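As a reminder of the two quantities the abstract refers to (standard classical and IRT definitions, not results specific to this study):
\[
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2},
\qquad
\mathrm{SE}(\hat{\theta}) \approx \frac{1}{\sqrt{I(\hat{\theta})}},
\]
where sigma_T^2 and sigma_E^2 are the true-score and error variances and I(theta) is the test information function.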
Sengul Avsar, Asiye; Tavsancil, Ezel – Educational Sciences: Theory and Practice, 2017
This study analysed polytomous items' psychometric properties according to nonparametric item response theory (NIRT) models. To this end, simulated datasets--three test lengths (10, 20 and 30 items), three sample distributions (normal, right-skewed and left-skewed) and three sample sizes (100, 250 and 500)--were generated by conducting 20…
Descriptors: Test Items, Psychometrics, Nonparametric Statistics, Item Response Theory
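The design reads as a fully crossed simulation grid (test length × trait distribution × sample size). The sketch below simply enumerates such a grid; the replication count is an assumption, since the abstract is truncated.
```python
# Hypothetical enumeration of a fully crossed simulation design.
# The replication count is assumed; the abstract is cut off at "20...".
from itertools import product

test_lengths = (10, 20, 30)
distributions = ("normal", "right-skewed", "left-skewed")
sample_sizes = (100, 250, 500)
n_replications = 20  # assumption

cells = list(product(test_lengths, distributions, sample_sizes))
print(len(cells), "cells,", len(cells) * n_replications, "simulated datasets")
```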
Yao, Lihua – Applied Psychological Measurement, 2013
Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
Yao, Lihua – Psychometrika, 2012
Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability or reduce test length when compared with unidimensional CAT or with the paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation by varying the structure…
Descriptors: Item Banks, Test Length, Simulation, Adaptive Testing
Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007
Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…
Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models

Cureton, Edward E.; And Others – Educational and Psychological Measurement, 1973
A study based on F. M. Lord's arguments (1957, 1959) that tests of the same length do have the same standard error of measurement. (CB)
Descriptors: Error of Measurement, Statistical Analysis, Test Interpretation, Test Length
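One way to see the force of Lord's claim, under the binomial error model often invoked in this debate (offered as background, not as a summary of Cureton's reply):
\[
\sigma_{E\mid i}^2 = n\,\pi_i(1-\pi_i) \le \frac{n}{4},
\qquad
\mathrm{SEM} = \sqrt{\,n\,\mathbb{E}\!\left[\pi(1-\pi)\right]\,} \le \frac{\sqrt{n}}{2},
\]
for number-right scores on an n-item test, where pi_i is examinee i's true proportion correct. The error variance then depends only on the number of items and the distribution of pi, which is why tests of the same length tend to show similar standard errors of measurement.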

Allison, Paul A. – Psychometrika, 1976
A direct proof is given for the generalized Spearman-Brown formula for any real multiple of test length. (Author)
Descriptors: Correlation, Error of Measurement, Raw Scores, Test Length
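For reference, the formula in question; the classical derivation assumes an integer number of parallel parts, and the generalization treated by Allison extends k to any positive real multiple of the original length:
\[
\rho_{kk'} = \frac{k\,\rho_{11'}}{1 + (k-1)\,\rho_{11'}}, \qquad k > 0,
\]
where rho_11' is the reliability of the original test and rho_kk' the predicted reliability of a test k times as long.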

Axelrod, Bradley N.; And Others – Psychological Assessment, 1996
The calculations of D. Schretlen, R. H. B. Benedict, and J. H. Bobholz (1994) for the reliabilities of a short form of the Wechsler Adult Intelligence Scale--Revised (WAIS-R) consistently overestimated the values. More accurate values are provided for the WAIS-R and a seven-subtest short form. (SLD)
Descriptors: Error Correction, Error of Measurement, Estimation (Mathematics), Intelligence Tests
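For context, one standard formula for the reliability of a sum of subtests with uncorrelated errors, the kind of composite calculation at issue here (a general result, not necessarily the specific procedure either set of authors used):
\[
\rho_{YY'} = 1 - \frac{\sum_{j} \sigma_j^2\,(1-\rho_{jj'})}{\sigma_Y^2},
\]
where the sum runs over the subtests in the short form, rho_jj' and sigma_j^2 are the subtest reliabilities and variances, and sigma_Y^2 is the variance of the composite; errors in any of these inputs propagate directly to the composite estimate.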

Kristof, Walter – Psychometrika, 1971
Descriptors: Cognitive Measurement, Error of Measurement, Mathematical Models, Psychological Testing

Gilmer, Jerry S.; Feldt, Leonard S. – 1982
The Feldt-Gilmer congeneric reliability coefficients make it possible to estimate the reliability of a test composed of parts of unequal, unknown length. The approximate standard errors of the Feldt-Gilmer coefficients are derived using the multivariate Taylor expansion. Monte Carlo simulation is employed to corroborate the…
Descriptors: Educational Testing, Error of Measurement, Mathematical Formulas, Mathematical Models
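For context, the congeneric measurement model these coefficients target, and the delta-method idea behind a multivariate Taylor expansion for standard errors; both are shown generically, not as the exact Feldt-Gilmer derivation:
\[
X_j = \mu_j + \lambda_j T + E_j, \qquad
\rho_{XX'} = \frac{\big(\sum_j \lambda_j\big)^{2}\sigma_T^2}{\sigma_X^2}, \qquad
\operatorname{Var}\big(g(\mathbf{s})\big) \approx \nabla g^{\top}\,\Sigma_{\mathbf{s}}\,\nabla g,
\]
where the parts X_j have unequal, unknown effective lengths lambda_j, X = sum_j X_j is the total score, g is the reliability coefficient regarded as a function of the sample variances and covariances s, and Sigma_s is their sampling covariance matrix.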
Multiple Choice and True/False Tests: Reliability Measures and Some Implications of Negative Marking
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2004
The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different format? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this…
Descriptors: Multiple Choice Tests, Error of Measurement, Test Reliability, Test Items
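The contrast Burton draws can be made concrete with the standard classical relations (standard results, not Burton's proposed alternative measure):
\[
\mathrm{SEM} = \sigma_X\sqrt{1-\rho_{XX'}},
\qquad
x \pm 1.96\,\mathrm{SEM} \ \ (\text{approximate 95\% limits}),
\qquad
\rho_{XX'} = 1 - \frac{\mathrm{SEM}^2}{\sigma_X^2},
\]
so for a fixed SEM the reliability coefficient rises or falls with the spread of examinee attainment sigma_X, which is why a coefficient alone does not permit fair comparison of tests given to groups of different heterogeneity.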
Haladyna, Tom; Roid, Gale – 1981
Two approaches to criterion-referenced test construction are compared. Classical test theory is based on the practice of random sampling from a well-defined domain of test items; latent trait theory suggests that the difficulty of the items should be matched to the achievement level of the student. In addition to these two methods of test…
Descriptors: Criterion Referenced Tests, Error of Measurement, Latent Trait Theory, Test Construction
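The latent-trait rationale for matching item difficulty to achievement level can be seen in the Rasch item information function (a standard result, included only as background to the comparison):
\[
P_j(\theta) = \frac{1}{1 + e^{-(\theta - b_j)}},
\qquad
I_j(\theta) = P_j(\theta)\,\bigl(1 - P_j(\theta)\bigr),
\]
which peaks (at 0.25) when b_j = theta, so items pitched at the student's level are the most informative; the classical domain-sampling approach instead draws items at random from the defined domain regardless of the examinee's level.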