Vale, C. David; And Others – 1981
A simulation study to determine appropriate linking methods for adaptive testing items was designed. Three basic data sets for responses were created. These were randomly sampled, systematically sampled, and selected data sets. The evaluative criteria used were fidelity of parameter estimation, asymptotic ability estimates, root-mean-square error…
Descriptors: Adaptive Testing, Aptitude Tests, Armed Forces, Bayesian Statistics
Hambleton, Ronald K.; Cook, Linda L. – 1978
The purpose of the present research was to study, systematically, the "goodness-of-fit" of the one-, two-, and three-parameter logistic models. We studied, using computer-simulated test data, the effects of four variables: variation in item discrimination parameters, the average value of the pseudo-chance level parameters, test length,…
Descriptors: Career Development, Difficulty Level, Goodness of Fit, Item Analysis
Dieterich, Thomas G.; Freeman, Cecilia – 1979
Part One of this guide explores issues in English proficiency testing. Tests are discussed in terms of the aspect of language tested, and of different kinds of test tasks. The following kinds of test task defects are treated: (1) tests that require literacy skills, (2) tasks that reduce to a vocabulary test, and (3) errant notions of linguistic…
Descriptors: English (Second Language), Evaluation, Item Analysis, Language Proficiency
Strathe, Marlene I. – 1978
The results of a mailed questionnaire survey are presented showing the opinions of a sample of teachers concerning the importance to them of 41 competencies related to measurement and evaluation. Out of a sample of 100 elementary school teachers, 50 junior high school teachers, and 100 senior high school teachers, 58 percent responded to the…
Descriptors: Communication Skills, Data Analysis, Elementary School Teachers, Elementary Secondary Education

Lord, Frederic M. – Applied Psychological Measurement, 1977
Under given conditions, conventional testing and computer-generated repeatable testing (CGRT) are equally effective for estimating examinee ability; CGRT is more effective for estimating the mean ability level of a group and less effective for estimating ability differences among individuals. These conclusions are drawn from domain-referenced test…
Descriptors: Career Development, Computer Assisted Testing, Difficulty Level, Group Norms

Linn, Robert L.; Drasgow, Fritz – Educational Measurement: Issues and Practice, 1987
This article discusses the application of the Golden Rule procedure to items of the Scholastic Aptitude Test. Using item response theory, the analyses indicate that the Golden Rule procedures are ineffective in detecting biased items and may undermine the reliability and validity of tests. (Author/JAZ)
Descriptors: College Entrance Examinations, Difficulty Level, Item Analysis, Latent Trait Theory

Harrison, David A. – Journal of Educational Statistics, 1986
Multidimensional item response data were created. The strength of a general factor, the number of common factors, the distribution of items loading on common factors, and the number of items in simulated tests were manipulated. LOGIST effectively recovered both item and trait parameters in nearly all of the experimental conditions. (Author/JAZ)
Descriptors: Adaptive Testing, Computer Assisted Testing, Computer Simulation, Correlation
Haley, Kathleen – 1998
A study was proposed to determine to what extent a hierarchical structure exists in music and in tests used to measure music ability. The first research question was whether items in the Watkins-Farnum Performance Scale (J. Watkins and S. Farnum, 1954) (WFPS) form a hierarchy, so that early exercises (bars played) are generally easier than later…
Descriptors: Difficulty Level, Evaluation Methods, Intermediate Grades, Junior High Schools
Takala, Sauli – 1998
This paper discusses recent developments in language testing. It begins with a review of the traditional criteria that are applied to all measurement and outlines recent emphases that derive from the expanding range of stakeholders. Drawing on Alderson's seminal work, criteria are presented for evaluating communicative language tests. Developments…
Descriptors: Alternative Assessment, Communicative Competence (Languages), Comparative Analysis, Evaluation Criteria

Gutkin, Terry B.; Reynolds, Cecil R. – Journal of Educational Psychology, 1981
To test the validity of the Wechsler Intelligence Scale for Children-Revised (WISC-R) for minority groups, factorial similarity across race was investigated with separate principal-factor analyses for White and Black children from the nationally representative WISC-R standardization sample. On every measure, the White and Black groups were highly…
Descriptors: Analysis of Variance, Black Youth, Elementary Secondary Education, Factor Analysis

Lohman, David F. – International Journal of Educational Research, 1997
A look at the history of intelligence testing suggests that those most closely allied with intelligence testing were often least able to see the larger issues. Input is needed from those who have examined broader currents in the history and sociology of ideas. New ideas must be cultivated to avoid redundancy in the field. (SLD)
Descriptors: Educational History, Educational Testing, Intelligence Tests, Political Influences

Nist, Sherrie L.; And Others – Reading Research and Instruction, 1990
Investigates the utility and predictive validity of the Learning and Study Strategies Inventory (LASSI) as a means of measuring college students' cognitive and affective growth following a study strategies course. Finds cognitive and affective growth in both regularly admitted and developmental studies students. Finds that LASSI cannot yet be used…
Descriptors: Affective Measures, Cognitive Measurement, College Students, Developmental Studies Programs

Bloom, Benjamin S. – Teaching Education, 1988
A review of the development and innovation of testing at the higher education level focuses on Ralph Tyler's testing emphasis on the educational process and the objectives of instruction, a precursor to the current trend of integrating formative evaluation into the curriculum development process. (CB)
Descriptors: Achievement Tests, Evaluation Methods, Formative Evaluation, Higher Education
Abedi, Jamal; Bruno, James – Journal of Computer-Based Instruction, 1989
Reports the results of several test-reliability experiments which compared a modified confidence weighted-admissible probability measurement (MCW-APM) with conventional forced choice or binary type (R-W) test scoring methods. Psychometric properties using G theory and conventional correlational methods are examined, and their implications for…
Descriptors: Ability Grouping, Analysis of Variance, Computer Assisted Testing, Correlation

Glaser, Robert – Educational Measurement: Issues and Practice, 1994
Some unfinished issues relating to achievement test theory that seemed implicit in the basic idea of criterion-referenced testing are reviewed, recognizing their importance in current studies of authentic assessment and performance-based tests. The future of performance-based evaluation is explored. (SLD)
Descriptors: Academic Achievement, Achievement Tests, Criterion Referenced Tests, Educational History