Tao, Wei; Cao, Yi – Applied Measurement in Education, 2016
Current procedures for equating number-correct scores using traditional item response theory (IRT) methods assume local independence. However, when tests are constructed using testlets, one concern is the violation of the local item independence assumption. The testlet response theory (TRT) model is one way to accommodate local item dependence.…
Descriptors: Item Response Theory, Equated Scores, Test Format, Models

Steinmetz, Jean-Paul; Brunner, Martin; Loarer, Even; Houssemand, Claude – Psychological Assessment, 2010
The Wisconsin Card Sorting Test (WCST) assesses executive and frontal lobe function and can be administered manually or by computer. Despite the widespread application of the 2 versions, the psychometric equivalence of their scores has rarely been evaluated and only a limited set of criteria has been considered. The present experimental study (N =…
Descriptors: Computer Assisted Testing, Psychometrics, Test Theory, Scores

van der Linden, Wim J. – Measurement: Interdisciplinary Research and Perspectives, 2010
The traditional way of equating the scores on a new test form X to those on an old form Y is equipercentile equating for a population of examinees. Because the population is likely to change between the two administrations, a popular approach is to equate for a "synthetic population." The authors of the articles in this issue of the…
Descriptors: Test Format, Equated Scores, Population Distribution, Population Trends

Brooks, Lindsay – Language Testing, 2009
This study, framed within sociocultural theory, examines the interaction of adult ESL test-takers in two tests of oral proficiency: one in which they interacted with an examiner (the individual format) and one in which they interacted with another student (the paired format). The data for the eight pairs in this study were drawn from a larger…
Descriptors: Testing, Rating Scales, Program Effectiveness, Interaction

Holland, Paul W.; Hoskens, Machteld – Psychometrika, 2003
Gives an account of classical test theory that shows how it can be viewed as a mean and variance approximation to a general version of item response theory and then shows how this approach can give insight into predicting the true score of a test and the true scores of tests not necessarily parallel to the given test. (SLD)
Descriptors: Prediction, Test Format, Test Theory, True Scores

Kolstad, Rosemarie K.; And Others – Journal of Research and Development in Education, 1985
Multiple-choice questions that could logically provide two or more choices block the expression of judgment, thereby suppressing measurement of learning and failing to provide feedback to students and teachers. This study compares the effects of content-identical multiple-choice and multiple true-false items on students' decisions. (MT)
Descriptors: Evaluation Methods, Higher Education, Knowledge Level, Test Format

Chambers, William V. – Social Behavior and Personality, 1985
Personal construct psychologists have suggested that various psychological functions explain differences in the stability of constructs. Among these functions are constellatory and loose construction. This paper argues that measurement error is a more parsimonious explanation of the differences in construct stability reported in these studies. (Author)
Descriptors: Error of Measurement, Test Construction, Test Format, Test Reliability

Pumfrey, Peter D. – Journal of Research in Reading, 1987
Discusses, for the benefit of research workers and other test users, the ongoing controversy concerning the relative merits of conventional test theory and Rasch scaling in the construction of reading tests. Concludes that a great deal of further research is required to see whether these approaches are educationally valid. (JD)
Descriptors: Reading Research, Reading Tests, Test Construction, Test Format

Adler, Nurit; Guttman, Ruth – Educational and Psychological Measurement, 1982
Thirteen ability tests were administered as defined within a mapping sentence containing four content facets: rule type, expression mode, language of communication and dimensionality of portrayed object. Smallest Space Analysis of intercorrelations among test scores showed the radex structure of the two-dimensional space conformed to the…
Descriptors: Content Analysis, Factor Structure, Intelligence Tests, Scores

Bieliauskas, Vytautas J.; Farragher, John – Journal of Clinical Psychology, 1983
Administered the House-Tree-Person test to male college students (N=24) to examine the effects of varying the size of the drawing form on the scores. Results suggested that use of the drawing sheet did not have a significant influence upon the quantitative aspects of the drawing. (LLL)
Descriptors: College Students, Higher Education, Intelligence Tests, Males

Little, Roderick J. A.; Rubin, Donald B. – Journal of Educational and Behavioral Statistics, 1994
Considers equating a new standard test to an old reference test when the samples used for equating are not randomly selected from the target population of test takers, and identifies two problems that arise when equating from biased samples. An empirical example with data from the Armed Services Vocational Aptitude Battery illustrates the approach. (SLD)
Descriptors: Equated Scores, Military Personnel, Sampling, Statistical Analysis

Hawkins, Katherine W. – Communication Education, 1987
Provides a brief, nontechnical overview of latent trait models and argues for the preferability of these models (particularly the Rasch logistic model) over classical test models. Offers an example application of the Rasch model and discusses implications for the use of latent trait models for communication educators. (AEW)
Descriptors: Higher Education, Journalism Education, Latent Trait Theory, Teaching Methods

Murphy, R. J. L. – British Journal of Educational Psychology, 1982
To study sex differences in test performance, the performance of males and females on 16 General Certificate of Education exams was analyzed in England. Results show that males perform better on objective tests than females. (Author/JJD)
Descriptors: Achievement, Foreign Countries, Objective Tests, Prediction

Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989
A taxonomy of 43 rules for writing multiple-choice test items is presented, based on a consensus of 46 textbooks. These guidelines are presented as complete and authoritative, with solid consensus apparent for 33 of the rules. Four rules lack consensus, and 5 rules were cited fewer than 10 times. (SLD)
Descriptors: Classification, Interrater Reliability, Multiple Choice Tests, Objective Tests

Bell, Richard; Lumsden, James – Applied Psychological Measurement, 1980
The effect of test length on predictive validity is examined empirically. For four tests, the curve of validity against test length had a very gentle slope for the longer tests and all tests could be reduced by more than 60 percent without appreciable decreases in validity. (Author/BW)
Descriptors: Foreign Countries, High School Seniors, High Schools, Mathematical Models