Lin, Miao-Hsiang; Hsiung, Chao A. – Psychometrika, 1994 (peer reviewed)
Two simple empirical approximate Bayes estimators are introduced for estimating domain scores under binomial and hypergeometric distributions, respectively. Criteria are established for preferring these estimators over their maximum likelihood counterparts. (SLD)
Descriptors: Adaptive Testing, Bayesian Statistics, Computation, Equations (Mathematics)
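The abstract above does not give the estimators themselves, but the general idea of empirical Bayes domain-score estimation under a binomial model can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes a beta-binomial setup with the prior fit by the method of moments, and all names are illustrative.

```python
import numpy as np

def eb_domain_scores(x, n):
    """Empirical Bayes domain-score estimates under a binomial model.

    x : array of correct-response counts, one per examinee
    n : common test length (number of items)

    A Beta(a, b) prior is fit to the group of observed proportions by
    the method of moments; each examinee's maximum likelihood estimate
    x/n is then shrunk toward the group mean via the posterior mean.
    """
    p = x / n                                  # MLE proportions
    mu, var = p.mean(), p.var()
    # method-of-moments prior strength; guard against zero variance
    strength = max(mu * (1 - mu) / max(var, 1e-9) - 1, 0.0)
    a, b = mu * strength, (1 - mu) * strength
    return (x + a) / (n + a + b)               # posterior mean scores

scores = eb_domain_scores(np.array([3, 7, 9, 5]), n=10)
```

The shrinkage is strongest when the group's proportions cluster tightly (small variance implies a strong prior), which is the usual empirical Bayes trade-off against the MLE.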
Hancock, Gregory R.; And Others – Educational and Psychological Measurement, 1993 (peer reviewed)
Two-option multiple-choice vocabulary test items are compared with comparably written true-false test items. Results from a study with 111 high school students suggest that multiple-choice items provide a significantly more reliable measure than the true-false format. (SLD)
Descriptors: Ability, High School Students, High Schools, Objective Tests
Hamp-Lyons, Liz; Mathias, Sheila Prochnow – Journal of Second Language Writing, 1994 (peer reviewed)
Expert judgments of prompt difficulty in essay tests were examined to discover whether they could be used at the item-writing stage of test development. Findings show that "expert judges" share considerable agreement about prompt difficulty and prompt task type, but they cannot predict which prompts will result in high or low scores for…
Descriptors: Cues, English (Second Language), Essay Tests, Language Tests
Millman, Jason – Educational Measurement: Issues and Practice, 1994 (peer reviewed)
The unfulfilled promise of criterion-referenced measurement is that it would permit valid inferences about what a student could and could not do. To come closest to achieving all that criterion-referenced testing originally promised, tests of higher item density, with more items per unit of the domain, are required. (SLD)
Descriptors: Criterion Referenced Tests, Educational History, Inferences, Norm Referenced Tests
Meijer, Rob R.; And Others – Applied Psychological Measurement, 1994 (peer reviewed)
The power of the nonparametric person-fit statistic, U3, is investigated through simulations as a function of item characteristics, test characteristics, person characteristics, and the group to which examinees belong. Results suggest conditions under which relatively short tests can be used for person-fit analysis. (SLD)
Descriptors: Difficulty Level, Group Membership, Item Response Theory, Nonparametric Statistics
Otter, Martha E.; And Others – Journal of Educational Measurement, 1995 (peer reviewed)
The ability of two components, interpretation of a question and memory, to forecast the test-retest correlation coefficients of reading test items was studied with initial samples of 916 elementary and 949 secondary school students. For both populations, both components forecast the relative sizes of the test-retest correlation coefficients. (SLD)
Descriptors: Cognitive Processes, Comprehension, Correlation, Elementary School Students
Hetter, Rebecca D.; And Others – Applied Psychological Measurement, 1994 (peer reviewed)
Effects on computerized adaptive test scores of using a paper-and-pencil (P&P) calibration to select items and estimate scores were compared with the effects of using a computer calibration. Results with 2,999 Navy recruits support the use of item parameters calibrated from either P&P or computer administrations. (SLD)
Descriptors: Adaptive Testing, Comparative Analysis, Computer Assisted Testing, Estimation (Mathematics)
Van der Ven, Ad H. G. S. – Educational and Psychological Measurement, 1992 (peer reviewed)
The dichotomous Rasch model was applied to verbal subtest scores on the Intelligence Structure Test Battery for 905 12- to 15-year-old secondary school students in the Netherlands. Results suggest that, if any factor is used to increase the difficulty of items, that factor should be used on all items. (SLD)
Descriptors: Difficulty Level, Foreign Countries, Intelligence Tests, Secondary Education
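For readers unfamiliar with the dichotomous Rasch model referenced above, its item response function has a standard closed form; the sketch below (NumPy; parameter values are illustrative, not from the study) shows how ability and item difficulty combine on the logit scale.

```python
import numpy as np

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model:

        P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b))

    where theta is person ability and b is item difficulty, both in logits.
    """
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# A harder item (larger b) yields a lower probability at the same ability.
p_easy = rasch_prob(theta=0.5, b=-1.0)   # ≈ 0.82
p_hard = rasch_prob(theta=0.5, b=1.0)    # ≈ 0.38
```

Because only the difference theta − b enters the model, any manipulation that raises difficulty shifts b uniformly; applying such a factor to only some items changes their relative difficulties and can break the model's fit, which is consistent with the abstract's conclusion.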
Vance, Booney; Sabatino, David – Diagnostique, 1991
The issues of construct validity, predictive validity, and item content bias on the Wechsler Intelligence Scale for Children-Revised (WISC-R) are examined. The review concludes that most objective data do not support claims of bias in the WISC-R when it is used with children of different ethnic backgrounds. (JDD)
Descriptors: Construct Validity, Content Validity, Elementary Secondary Education, Ethnic Groups
Sireci, Stephen G.; Geisinger, Kurt F. – Applied Psychological Measurement, 1992 (peer reviewed)
A new method for evaluating the content representation of a test is illustrated. Item similarity ratings were obtained from three content domain experts to assess whether ratings corresponded to item groupings specified in the test blueprint. Multidimensional scaling and cluster analysis provided substantial information about the test's content…
Descriptors: Cluster Analysis, Content Analysis, Multidimensional Scaling, Multiple Choice Tests
Weir, C. J.; And Others – Reading in a Foreign Language, 1990 (peer reviewed)
Presents a critical analysis of an earlier article and argues that, although the validity of the High/Low distinction is questionable, it is possible for practical testing purposes to obtain reliable judgments from properly selected and trained judges. (seven references) (GLR)
Descriptors: Evaluation Methods, Reading Comprehension, Reading Tests, Second Language Learning
Ackerman, Terry A. – Applied Measurement in Education, 1994 (peer reviewed)
When item response data do not satisfy the unidimensionality assumption, multidimensional item response theory (MIRT) should be used to model the item-examinee interaction. This article presents and discusses MIRT analyses designed to give better insight into what individual items are measuring. (SLD)
Descriptors: Evaluation Methods, Item Response Theory, Measurement Techniques, Models
Roznowski, Mary; And Others – Applied Psychological Measurement, 1991 (peer reviewed)
Three heuristic methods of assessing the dimensionality of binary item pools were evaluated in a Monte Carlo investigation. The indices were based on (1) the local independence of unidimensional tests; (2) patterns of second-factor loadings derived from simplex theory; and (3) the shape of the curve of successive eigenvalues. (SLD)
Descriptors: Comparative Analysis, Computer Simulation, Correlation, Evaluation Methods
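The third heuristic above, inspecting the curve of successive eigenvalues, can be illustrated with a short sketch. This is not the authors' procedure: it uses ordinary phi (Pearson) correlations for simplicity, where tetrachoric correlations are the more common choice for binary items, and the simulated data and names are illustrative.

```python
import numpy as np

def eigenvalue_curve(responses):
    """Successive eigenvalues of the inter-item correlation matrix of a
    binary item pool, sorted descending. A dominant first eigenvalue
    followed by a sharp drop suggests essential unidimensionality."""
    r = np.corrcoef(responses, rowvar=False)     # items in columns
    return np.sort(np.linalg.eigvalsh(r))[::-1]

# Simulated unidimensional data: one latent trait drives all five items.
rng = np.random.default_rng(0)
theta = rng.normal(size=(500, 1))                # person abilities
b = rng.normal(size=5)                           # item difficulties
probs = 1 / (1 + np.exp(-(theta - b)))           # Rasch-type response model
data = (rng.random((500, 5)) < probs).astype(int)

eigs = eigenvalue_curve(data)  # first eigenvalue dominates the rest
```

With truly unidimensional data the curve drops steeply after the first eigenvalue; multidimensional pools show several eigenvalues well above the rest before the curve flattens.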
Wainer, Howard; And Others – Journal of Educational Measurement, 1991 (peer reviewed)
A testlet is an integrated group of test items presented as a unit. The concept of testlet differential item functioning (testlet DIF) is defined, and a statistical method is presented to detect testlet DIF. Data from a testlet-based experimental version of the Scholastic Aptitude Test illustrate the methodology. (SLD)
Descriptors: College Entrance Examinations, Definitions, Graphs, Item Bias
Samejima, Fumiko – Psychometrika, 1993 (peer reviewed)
An approximation for the bias function of the maximum likelihood estimate of the latent trait or ability is developed for the general case where item responses are discrete, which includes the dichotomous response level, the graded response level, and the nominal response level. (SLD)
Descriptors: Ability, Equations (Mathematics), Estimation (Mathematics), Item Response Theory