Publication Date
| Date range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 18 |
| Since 2022 (last 5 years) | 66 |
| Since 2017 (last 10 years) | 165 |
| Since 2007 (last 20 years) | 324 |
Author
| Author | Records |
| --- | --- |
| Hambleton, Ronald K. | 15 |
| Wang, Wen-Chung | 9 |
| Livingston, Samuel A. | 6 |
| Sijtsma, Klaas | 6 |
| Wainer, Howard | 6 |
| Weiss, David J. | 6 |
| Wilcox, Rand R. | 6 |
| Cheng, Ying | 5 |
| Gessaroli, Marc E. | 5 |
| Lee, Won-Chan | 5 |
| Lewis, Charles | 5 |
Location
| Location | Records |
| --- | --- |
| Turkey | 8 |
| Australia | 7 |
| Canada | 7 |
| China | 5 |
| Netherlands | 5 |
| Japan | 4 |
| Taiwan | 4 |
| United Kingdom | 4 |
| Germany | 3 |
| Michigan | 3 |
| Singapore | 3 |
Laws, Policies, & Programs
| Law, policy, or program | Records |
| --- | --- |
| Americans with Disabilities… | 1 |
| Equal Access | 1 |
| Job Training Partnership Act… | 1 |
| Race to the Top | 1 |
| Rehabilitation Act 1973… | 1 |
Peer reviewed: Kafry, Ditsa; And Others – Applied Psychological Measurement, 1979
A series of behavioral expectation scale applications was analyzed to identify an appropriate number of dimensions to include in such studies. Results reflected the problems of dimension interdependence when the number of dimensions exceeds nine. (Author/JKS)
Descriptors: Behavior Rating Scales, Expectation, Factor Analysis, Higher Education
Peer reviewed: Sher, Kenneth J.; And Others – Psychological Assessment, 1995
Interrelated analyses were conducted with more than 4,000 college students to examine the reliability and validity of the Tridimensional Personality Questionnaire (TPQ) and to develop and validate a short version of the scale. Results provide moderate support for the reliability and validity of both the TPQ and the short form. (SLD)
Descriptors: College Students, Factor Analysis, Higher Education, Personality Assessment
Peer reviewed: Deville, Craig; O'Neill, Thomas; Wright, Benjamin D.; Woodcock, Richard W.; Munoz-Sandoval, Ana; Gershon, Richard C.; Bergstrom, Betty – Popular Measurement, 1998
Articles in this special section consider (1) flow in test taking (Craig Deville); (2) testwiseness (Thomas O'Neill); (3) test length (Benjamin Wright); (4) cross-language test equating (Richard W. Woodcock and Ana Munoz-Sandoval); (5) computer-assisted testing and testwiseness (Richard Gershon and Betty Bergstrom); and (6) Web-enhanced testing…
Descriptors: Computer Assisted Testing, Educational Testing, Equated Scores, Measurement Techniques
Multiple Choice and True/False Tests: Reliability Measures and Some Implications of Negative Marking
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2004
The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different format? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this…
Descriptors: Multiple Choice Tests, Error of Measurement, Test Reliability, Test Items
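As a hedged illustration of the distinction the Burton (2004) abstract draws, the sketch below computes the standard error of measurement from a reliability coefficient and a score standard deviation, then derives approximate confidence limits for a single score. The reliabilities, standard deviations, and the score of 70 are invented for illustration and are not taken from the article.

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_limits(score: float, sd: float, reliability: float, z: float = 1.96):
    """Approximate 95% confidence limits for an observed score."""
    margin = z * sem(sd, reliability)
    return score - margin, score + margin

# Two hypothetical tests with the same reliability coefficient but different score spread.
for label, sd in [("narrow-spread test", 5.0), ("wide-spread test", 15.0)]:
    lo, hi = confidence_limits(score=70.0, sd=sd, reliability=0.85)
    print(f"{label}: SEM = {sem(sd, 0.85):.2f}, 95% limits for a score of 70: {lo:.1f} to {hi:.1f}")
```

The two hypothetical tests share the same reliability coefficient, yet their SEMs and score confidence limits differ substantially, which is the sense in which a reliability coefficient depends on the spread of examinee attainment.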
Wang, Xiang Bo – College Board, 2007
This research examines the effect of increased testing time by comparing the four performance indices of randomly equivalent examinee subpopulations on sections of similar content and difficulty administered at different times on three SAT administrations. A variety of analyses were used in this study and found no evidence that the current SAT…
Descriptors: College Entrance Examinations, Thinking Skills, High School Students, Test Length
De Champlain, Andre; Gessaroli, Marc E. – 1996
The use of indices and statistics based on nonlinear factor analysis (NLFA) has become increasingly popular as a means of assessing the dimensionality of an item response matrix. Although the indices and statistics currently available to the practitioner have been shown to be useful and accurate in many testing situations, few studies have…
Descriptors: Adaptive Testing, Chi Square, Computer Assisted Testing, Factor Analysis
Ankenmann, Robert D.; Stone, Clement A. – 1992
Effects of test length, sample size, and assumed ability distribution were investigated in a multiple replication Monte Carlo study under the 1-parameter (1P) and 2-parameter (2P) logistic graded model with five score levels. Accuracy and variability of item parameter and ability estimates were examined. Monte Carlo methods were used to evaluate…
Descriptors: Computer Simulation, Estimation (Mathematics), Item Bias, Mathematical Models
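The Ankenmann and Stone study fits graded models by methods not shown here; as a hedged sketch of the general shape of such a Monte Carlo recovery design, the code below crosses two hypothetical sample sizes with 100 replications of a dichotomous Rasch model and uses a deliberately crude p-value-based difficulty estimator as a stand-in. Only the study structure, not the estimation method or the graded model, matches the record.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_rasch(n_persons: int, difficulties: np.ndarray) -> np.ndarray:
    """Simulate 0/1 responses under a Rasch model with N(0, 1) abilities."""
    theta = rng.standard_normal(n_persons)[:, None]
    p = 1.0 / (1.0 + np.exp(-(theta - difficulties[None, :])))
    return (rng.random(p.shape) < p).astype(int)

def crude_difficulty_estimates(responses: np.ndarray) -> np.ndarray:
    """Rough difficulty estimates from classical p-values (illustrative only)."""
    p = responses.mean(axis=0).clip(0.01, 0.99)
    b = -np.log(p / (1.0 - p))        # logit transform of proportion correct
    return b - b.mean()               # centre to match the centred true values

true_b = np.linspace(-2.0, 2.0, 20)   # 20 hypothetical items
true_b -= true_b.mean()

for n in (150, 1000):                 # two hypothetical sample sizes
    rmse = []
    for _ in range(100):              # 100 Monte Carlo replications
        est = crude_difficulty_estimates(simulate_rasch(n, true_b))
        rmse.append(np.sqrt(np.mean((est - true_b) ** 2)))
    print(f"n={n}: mean RMSE of difficulty recovery = {np.mean(rmse):.3f}")
```

Summarizing accuracy and variability of the estimates across replications, condition by condition, is the same kind of comparison the record describes.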
Schumacker, Randall E.; And Others – 1994
Rasch between and total weighted and unweighted fit statistics were compared using varying test lengths and sample sizes. Two test lengths (20 and 50 items) and three sample sizes (150, 500, and 1,000) were crossed. Each of the six combinations was replicated 100 times. In addition, power comparisons were made. Results indicated that there were no…
Descriptors: Comparative Analysis, Goodness of Fit, Item Response Theory, Power (Statistics)
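As a hedged sketch of what such fit statistics measure, the code below computes only the standard unweighted ("outfit") and information-weighted ("infit") mean squares per item, not the between-group variants the record compares, and it uses known rather than estimated parameters. The hypothetical data are generated to fit the Rasch model, so both statistics should land near 1.0.

```python
import numpy as np

rng = np.random.default_rng(1)

def rasch_prob(theta: np.ndarray, b: np.ndarray) -> np.ndarray:
    """P(correct) under the Rasch model, persons in rows, items in columns."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def item_fit(responses: np.ndarray, theta: np.ndarray, b: np.ndarray):
    """Unweighted (outfit) and information-weighted (infit) mean squares per item."""
    p = rasch_prob(theta, b)
    w = p * (1.0 - p)                                # binomial variance of each response
    z2 = (responses - p) ** 2 / w                    # squared standardized residuals
    outfit = z2.mean(axis=0)                         # unweighted mean square
    infit = (z2 * w).sum(axis=0) / w.sum(axis=0)     # variance-weighted mean square
    return outfit, infit

# Hypothetical data: 500 persons, 20 items, generated to fit the model.
theta = rng.standard_normal(500)
b = np.linspace(-2.0, 2.0, 20)
responses = (rng.random((500, 20)) < rasch_prob(theta, b)).astype(int)

outfit, infit = item_fit(responses, theta, b)
print(f"outfit range: {outfit.min():.2f} to {outfit.max():.2f}")
print(f"infit range:  {infit.min():.2f} to {infit.max():.2f}")
```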
Kunce, Charles S.; Arbet, Scott E. – 1994
The National Conference of Bar Examiners commissioned American College Testing, Inc., to assist in the development and evaluation of a performance test for use in bar admissions decisions. Because it was recognized that candidate perceptions would provide valuable information, a candidate-perception questionnaire was developed to be…
Descriptors: Attitudes, Demography, Languages, Lawyers
Haladyna, Tom; Roid, Gale – 1981
Two approaches to criterion-referenced test construction are compared. Classical test theory is based on the practice of random sampling from a well-defined domain of test items; latent trait theory suggests that the difficulty of the items should be matched to the achievement level of the student. In addition to these two methods of test…
Descriptors: Criterion Referenced Tests, Error of Measurement, Latent Trait Theory, Test Construction
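As a hedged sketch of the contrast the Haladyna and Roid record describes, the snippet below builds a hypothetical item bank and compares the two selection logics: random sampling from the defined item domain versus picking the items whose difficulty is closest to a target achievement level. The bank, cut score, and test length are invented for illustration.

```python
import random

random.seed(0)

# Hypothetical item bank: (item_id, difficulty on a logit scale).
bank = [(i, random.uniform(-3.0, 3.0)) for i in range(200)]

def random_domain_sample(bank, n):
    """Classical approach: draw items at random from the defined domain."""
    return random.sample(bank, n)

def difficulty_matched_sample(bank, n, target_ability):
    """Latent-trait approach: pick the items whose difficulty is closest to the
    achievement level at which decisions are made."""
    return sorted(bank, key=lambda item: abs(item[1] - target_ability))[:n]

cut = 0.5  # hypothetical mastery level on the ability scale

print("random sample difficulties: ",
      sorted(round(d, 2) for _, d in random_domain_sample(bank, 10)))
print("matched sample difficulties:",
      sorted(round(d, 2) for _, d in difficulty_matched_sample(bank, 10, cut)))
```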
Myers, Charles T. – 1978
The viewpoint is expressed that increasing test reliability, whether by selecting a more homogeneous set of items, restricting the range of item difficulty as closely as possible to the most efficient level, or increasing the number of items, will not add to test validity, and that there is considerable danger that efforts to increase reliability may…
Descriptors: Achievement Tests, Item Analysis, Multiple Choice Tests, Test Construction
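A hedged numerical illustration of the general point, with invented numbers and assuming the added items are parallel to the originals: lengthening a test raises reliability according to the Spearman-Brown formula, but the corresponding criterion validity rises only by the factor sqrt(new reliability / old reliability), so it stays bounded by what the items actually measure.

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Projected reliability when a test is lengthened by a factor of k
    with items parallel to the originals."""
    return k * reliability / (1.0 + (k - 1.0) * reliability)

# Hypothetical starting point: a 20-item test with reliability .70
# and a criterion validity coefficient of .45.
rel, validity = 0.70, 0.45

for k in (1, 2, 4, 8):
    new_rel = spearman_brown(rel, k)
    # Lengthening with parallel items raises validity only by sqrt(new_rel / rel).
    new_val = validity * (new_rel / rel) ** 0.5
    print(f"{20 * k:>4} items: reliability = {new_rel:.2f}, validity = {new_val:.2f}")
```

In this invented example the reliability climbs from .70 toward 1.0, while the validity coefficient can never exceed .45 / sqrt(.70), about .54, no matter how many parallel items are added.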
Saunders, Joseph C.; Huynh, Huynh – 1980
In most reliability studies, the precision of a reliability estimate varies inversely with the number of examinees (sample size). Thus, to achieve a given level of accuracy, some minimum sample size is required. An approximation for this minimum size may be made if some reasonable assumptions regarding the mean and standard deviation of the test…
Descriptors: Cutting Scores, Difficulty Level, Error of Measurement, Mastery Tests
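The Saunders and Huynh record concerns mastery tests and decision-based reliability; as a hedged, simplified stand-in that only illustrates the stated inverse relation between sample size and the precision of a reliability estimate, the simulation below repeatedly estimates coefficient alpha from hypothetical one-factor data at several sample sizes and reports how the spread of the estimate shrinks as n grows. All data-generating values are invented.

```python
import numpy as np

rng = np.random.default_rng(2)

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for a persons x items score matrix."""
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)

def simulate_scores(n: int, k: int = 20, loading: float = 0.6) -> np.ndarray:
    """Hypothetical one-factor data: common factor plus unique noise."""
    theta = rng.standard_normal((n, 1))
    return loading * theta + np.sqrt(1.0 - loading ** 2) * rng.standard_normal((n, k))

for n in (30, 120, 480):
    alphas = [cronbach_alpha(simulate_scores(n)) for _ in range(300)]
    print(f"n = {n:>3}: mean alpha = {np.mean(alphas):.3f}, "
          f"SD across samples = {np.std(alphas):.3f}")
```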
Harris, Dickie A.; Penell, Roger J. – 1977
This study used a series of simulations to answer questions about the efficacy of adaptive testing raised by empirical studies. The first study showed that for reasonably high entry points, parameters estimated from paper-and-pencil test protocols cross-validated remarkably well to groups actually tested at a computer terminal. This suggested that…
Descriptors: Adaptive Testing, Computer Assisted Testing, Cost Effectiveness, Difficulty Level
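As a hedged sketch of the kind of simulation the Harris and Penell record describes, not the authors' actual design, the code below runs one simulated examinee through a maximum-information adaptive test drawn from a hypothetical Rasch item bank, starting from an assumed entry point at the prior mean and updating the ability estimate by EAP after each response. All bank sizes, test lengths, and parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

def p_correct(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def eap_estimate(administered_b, responses, grid=np.linspace(-4, 4, 161)):
    """Expected a posteriori ability estimate under a Rasch model, N(0, 1) prior."""
    posterior = np.exp(-grid ** 2 / 2.0)              # unnormalised normal prior
    for b, u in zip(administered_b, responses):
        p = 1.0 / (1.0 + np.exp(-(grid - b)))
        posterior *= p if u else (1.0 - p)            # likelihood of each response
    return float((grid * posterior).sum() / posterior.sum())

bank = list(np.linspace(-3.0, 3.0, 60))   # hypothetical bank of 60 Rasch items
true_theta = 1.2                          # simulated examinee ability
admin, resp = [], []
theta_hat = 0.0                           # entry point: start at the prior mean

for _ in range(15):                       # administer 15 items adaptively
    remaining = [b for b in bank if b not in admin]
    # Rasch item information peaks where difficulty equals ability, so the
    # maximum-information choice is the unused item closest to the estimate.
    nxt = min(remaining, key=lambda b: abs(b - theta_hat))
    u = int(rng.random() < p_correct(true_theta, nxt))
    admin.append(nxt)
    resp.append(u)
    theta_hat = eap_estimate(admin, resp)

print(f"true ability = {true_theta}, adaptive estimate after 15 items = {theta_hat:.2f}")
```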
Peer reviewed: Feild, Hubert S.; And Others – Educational and Psychological Measurement, 1978
Computerized answer sheets in mail surveys are examined for their effects on rate of return and response bias. Results of an empirical study of job satisfaction suggested that computerized answer sheets may be used in mail surveys without significantly affecting rate of return or producing response bias. (Author/JKS)
Descriptors: Answer Sheets, City Government, Computers, Cost Effectiveness
Peer reviewed: Hambleton, Ronald K.; De Gruijter, Dato N. M. – Journal of Educational Measurement, 1983
Addressing the shortcomings of classical item statistics for selecting criterion-referenced test items, this paper describes an optimal item selection procedure utilizing item response theory (IRT) and offers examples in which random selection and optimal item selection methods are compared. Theoretical advantages of optimal selection based upon…
Descriptors: Criterion Referenced Tests, Cutting Scores, Item Banks, Latent Trait Theory
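As a hedged sketch of what "optimal" selection means in this context, an illustration of the general IRT idea rather than the authors' exact procedure, the code below draws a hypothetical 2PL item bank, ranks items by Fisher information evaluated at the cut score, and compares the resulting test information against a randomly selected test of the same length. All parameter values and the cut score are invented.

```python
import math
import random

random.seed(0)

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta (logistic metric)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Hypothetical 2PL item bank: (discrimination a, difficulty b).
bank = [(random.uniform(0.5, 2.0), random.uniform(-2.5, 2.5)) for _ in range(100)]
cut_theta = 0.0     # cut score expressed on the ability scale
n_items = 20

optimal = sorted(bank, key=lambda ab: item_information(cut_theta, *ab), reverse=True)[:n_items]
randomly = random.sample(bank, n_items)

def test_information(items):
    """Test information at the cut score is the sum of the item informations."""
    return sum(item_information(cut_theta, a, b) for a, b in items)

print(f"information at cut score, optimal selection: {test_information(optimal):.1f}")
print(f"information at cut score, random selection:  {test_information(randomly):.1f}")
```

Concentrating information at the cut score is what lets the optimally selected test support more precise mastery decisions than a randomly assembled test of the same length.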
