Isbell, Dan; Winke, Paula – Language Testing, 2019
The American Council on the Teaching of Foreign Languages (ACTFL) oral proficiency interview -- computer (OPIc) testing system represents an ambitious effort in language assessment: Assessing oral proficiency in over a dozen languages, on the same scale, from virtually anywhere at any time. Especially for users in contexts where multiple foreign…
Descriptors: Oral Language, Language Tests, Language Proficiency, Second Language Learning

Green, Samuel B. – Educational and Psychological Measurement, 1981
The proportion of agreement, G, and kappa indexes are shown to differ in how they correct for chance agreements between two observers. On the basis of the findings, it is suggested that no single agreement index is appropriate for all sets of data. (Author/BW)
Descriptors: Comparative Analysis, Measurement Techniques, Test Reliability, Testing Problems
Thrash, Susan K.; Porter, Andrew C. – 1974
The purpose of this paper is to prove that one currently recommended method of obtaining the reliability of an instrument defined on a population of aggregate units is invalid. This method randomly splits the aggregate into two halves, correlates the two half unit scores by a Pearson product moment correlation coefficient, and corrects the…
Descriptors: Comparative Analysis, Correlation, Measurement Techniques, Sampling

Wagner, Edwin E.; And Others – Educational and Psychological Measurement, 1990
Maximized correlation as an internal reliability estimate for tests with few items was investigated. An actual sampling distribution of maximum correlation--"r" max--was empirically derived from 100 samples of 50 cases each from Rorschach test data and compared with those of alpha and an odd/even split, using 2,020 Rorschach protocols.…
Descriptors: Comparative Analysis, Correlation, Estimation (Mathematics), Sample Size

Rubin, Donald B.; Thayer, Dorothy – Psychometrika, 1978
A procedure is developed for estimating correlations among new tests when non-overlapping sub-samples each are administered a different new test and all sub-samples are administered a set of standard tests. (JKS)
Descriptors: Comparative Analysis, Correlation, Measurement, Standardized Tests
Nevo, Barukh – Measurement and Evaluation in Guidance, 1976
Freshmen (N=202) took two batteries of aptitude tests 10 months apart. Six pairs of tests were studied. Two pairs were identical, two were parallel, and two were completely different. This design made it possible to separate three components of practice: (a) general test sophistication, (b) specific practice effect, and (c) item familiarization.…
Descriptors: Aptitude Tests, College Freshmen, Comparative Analysis, Group Testing
Comparison of Yes-No, Matched-Pairs, and All-No Scoring of a First-Grade Economics Achievement Test.
Larkins, A. Guy; Shaver, James P. – 1968
Developing practical achievement tests for use at the primary-grade level is a difficult task. Some problems encountered appear to be resolved by using verbally administered yes-no tests. But such tests are criticized as having a low reliability because they offer only two choices. Two modifications of the yes-no test have been proposed to…
Descriptors: Achievement Tests, Comparative Analysis, Primary Education, Test Construction

Coleman, Marilyn; And Others – Psychology in the Schools, 1980
The mean IQ on the Slosson Intelligence Test (SIT) was substantially higher than expected based on the earlier Peabody Picture Vocabulary Test (PPVT) scores. Sampling error and examiner error were excluded as explanations. Results suggest that the PPVT and SIT yield different scores and lack comparability. (Author)
Descriptors: Children, Comparative Analysis, Intelligence Tests, Intervention

Hunter, John E.; Cohen, Stanley H. – Psychometrika, 1974
Descriptors: Attitude Change, Attitudes, Comparative Analysis, Models
Jacko, Edward J.; Huck, Schuyler W. – 1974
The Alpert-Haber Achievement Anxiety Test was developed to measure the extent to which individuals experience test anxiety. In at least two published studies, the authors claim to have used the test when in fact the response format was changed from that used in the original instrument and the "buffer" items were omitted. To investigate…
Descriptors: Achievement Tests, Anxiety, College Students, Comparative Analysis
Halpin, Gerald; And Others – Measurement and Evaluation in Guidance, 1978
Super's Work Values Inventory is utilized in making interindividual and intraindividual comparative interpretations of work values. Internal consistency reliability coefficients for 15 scales and reliabilities of differences between scores on scales were of such a weak magnitude that caution in making interindividual and intraindividual comparisons…
Descriptors: Comparative Analysis, High School Students, Research Projects, Test Reliability
Kapes, Jerome T. – 1975
Two independent studies were conducted to investigate possible differences in General Aptitude Test Battery (GATB) aptitude M resulting from the use of different test equipment (wooden vs. plastic apparatus). As part of a ten-year longitudinal study of Vocational Development being conducted in the Department of Vocational Education at The…
Descriptors: Aptitude Tests, Comparative Analysis, Elementary Secondary Education, Scores

Lyon, Mark A. – Journal of Learning Disabilities, 1995
This study examined differences between Wechsler Intelligence Scale for Children-Third Edition (WISC-III) and Wechsler Intelligence Scale for Children-Revised (WISC-R) scores for 40 elementary students with learning disabilities. WISC-III Full Scale, Verbal, and Performance scores were lower than comparable WISC-R scores by one-third to one-half a…
Descriptors: Comparative Analysis, Correlation, Disability Identification, Elementary Education

Shadish, William R., Jr. – Journal of Consulting and Clinical Psychology, 1980
A comparison of nonverbal with verbal clinical group interventions suggested that some traditional self-report devices show less differentiation between these two interventions than do measures of group cohesion. A strong, replicable manipulation tested these findings, which were consistent with previous research. (Author/BEF)
Descriptors: Behavior Rating Scales, Comparative Analysis, Group Therapy, Group Unity

Fuchs, Douglas; Fuchs, Lynn S. – Exceptional Children, 1989
Presented is a quantitative synthesis of examiner familiarity effects on Caucasian and minority students' test performance. Fourteen controlled studies were coded in terms of methodological quality and race-ethnicity. Caucasian students performed similarly in both familiar and unfamiliar examiner conditions, while Black and Hispanic children…
Descriptors: Blacks, Comparative Analysis, Elementary Secondary Education, Examiners