| Publication Date | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 220 |
| Since 2022 (last 5 years) | 1089 |
| Since 2017 (last 10 years) | 2599 |
| Since 2007 (last 20 years) | 4960 |
| Audience | Records |
| --- | --- |
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
| Location | Records |
| --- | --- |
| Turkey | 226 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 66 |
| What Works Clearinghouse Rating | Records |
| --- | --- |
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |
Frey, Andreas; Carstensen, Claus H. – Measurement: Interdisciplinary Research and Perspectives, 2009
On a general level, the objective of diagnostic classification models (DCMs) lies in a classification of individuals regarding multiple latent skills. In this article, the authors show that this objective can be achieved by multidimensional adaptive testing (MAT) as well. The authors discuss whether or not the restricted applicability of DCMs can…
Descriptors: Adaptive Testing, Test Items, Classification, Psychometrics
Wainer, Howard; Thissen, David – 1992
If examinees are permitted to choose to answer a subset of the questions on a test, just knowing which questions were chosen can provide a measure of proficiency that may be as reliable as would have been obtained from the test graded traditionally. This new method of scoring is much less time consuming and expensive for both the examinee and the…
Descriptors: Adaptive Testing, Cost Effectiveness, Responses, Scoring
Kramer, Gene A. – 1995
The present study is designed to cross-validate the findings of an earlier component analysis of orthographic-projection, spatial-ability items. The earlier research identified four design components that contribute to the difficulty of orthographic-projection items. The research found that increasing Rasch item difficulties on component…
Descriptors: Difficulty Level, Item Response Theory, Spatial Ability, Test Construction
Content Characteristics of GRE Analytical Reasoning Items. GRE Board Professional Report No. 84-14P.
Chalifour, Clark; Powers, Donald E. – 1988
In actual test development practice, the number of test items that must be developed and pretested is typically greater, and sometimes much greater, than the number eventually judged suitable for use in operational test forms. This has proven to be especially true for analytical reasoning items, which currently form the bulk of the analytical…
Descriptors: Coding, Difficulty Level, Higher Education, Test Construction
Dorans, Neil J.; Lawrence, Ida M. – 1988
A procedure for checking the score equivalence of nearly identical editions of a test is described. The procedure employs the standard error of equating (SEE) and utilizes graphical representation of score conversion deviation from the identity function in standard error units. Two illustrations of the procedure involving Scholastic Aptitude Test…
Descriptors: Equated Scores, Error of Measurement, Test Construction, Test Format
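The check described in this abstract can be sketched numerically: express each score conversion's deviation from the identity function in standard-error-of-equating (SEE) units and flag score levels where the standardized deviation is large. All numbers below are hypothetical, and the two-SEE threshold is an illustrative convention, not necessarily the one used in the report.

```python
# Hypothetical sketch of an SEE-based score-equivalence check.
raw_scores = [20, 30, 40, 50, 60]          # scores on the reference edition
converted  = [20.4, 29.6, 40.9, 50.2, 61.5]  # equated scores (hypothetical)
see        = [0.5, 0.4, 0.4, 0.5, 0.6]       # SEE at each score (hypothetical)

# Deviation from the identity conversion, in SEE units.
std_dev_units = [(c - r) / s for r, c, s in zip(raw_scores, converted, see)]

# Flag score levels deviating by more than 2 SEEs from identity.
flagged = [r for r, d in zip(raw_scores, std_dev_units) if abs(d) > 2]
print(flagged)
```

In a graphical version of the procedure, `std_dev_units` would be plotted against `raw_scores` with reference bands at ±2.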
Barnette, J. Jackson – 1997
The controversy regarding reverse or negatively-worded survey stems has been around for several decades. The practice has been used to guard against acquiescent or response set behaviors. A 20-item, 5-point Likert item survey was designed and the stems and response sets were varied in a 2 by 3 design. One independent variable was type of item…
Descriptors: Likert Scales, Reliability, Responses, Statistical Analysis
van der Linden, Wim J.; Vos, Hans J. – 1994
This paper presents some Bayesian theories of simultaneous optimization of decision rules for test-based decisions. Simultaneous decision making arises when an institution has to make a series of selection, placement, or mastery decisions with respect to subjects from a population. An obvious example is the use of individualized instruction in…
Descriptors: Bayesian Statistics, Decision Making, Foreign Countries, Scores
Oshima, T. C.; Davey, T. C. – 1994
This paper evaluated multidimensional linking procedures with which multidimensional test data from two separate calibrations were put on a common scale. Data were simulated with known ability distributions varying on two factors which made linking necessary: mean vector differences and variance-covariance (v-c) matrix differences. After the…
Descriptors: Ability, Estimation (Mathematics), Evaluation Methods, Matrices
Plake, Barbara S.; Giraud, Gerald – 1998
In the traditional Angoff Standard Setting Method, experts are instructed to predict the possibility that a randomly selected, hypothetical minimally competent candidate will be able to answer each multiple choice question in the test correctly. These item performance estimates are averaged across panelists and aggregated to determine the minimum…
Descriptors: Estimation (Mathematics), Evaluators, Performance Factors, Standard Setting (Scoring)
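The aggregation step of the traditional Angoff method described in this abstract can be sketched as follows: panelists' per-item probability estimates are averaged across panelists, and the item means are summed to yield the minimum passing score. The ratings below are hypothetical.

```python
# Hypothetical Angoff ratings: per panelist, the estimated probability that
# a minimally competent candidate answers each of 4 items correctly.
ratings = {
    "panelist_1": [0.6, 0.8, 0.5, 0.9],
    "panelist_2": [0.5, 0.7, 0.6, 0.8],
    "panelist_3": [0.7, 0.9, 0.4, 0.7],
}
n_items = 4

# Average each item's estimates across panelists.
item_means = [
    sum(r[i] for r in ratings.values()) / len(ratings) for i in range(n_items)
]

# Sum item means to obtain the cut (minimum passing) score.
cut_score = sum(item_means)
print(round(cut_score, 2))  # 2.7 correct answers out of 4
```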
Lee, Guemin; Kolen, Michael J.; Frisbie, David A.; Ankenmann, Robert D. – 1998
Item response models can be applied in many test equating situations by making strong statistical assumptions. Thus, studying the robustness of the models to violations of the assumptions and investigating model-data fit are essential in all item response theory (IRT) equating applications (M. Kolen and R. Brennan, 1995). Previous studies dealing…
Descriptors: Equated Scores, Item Response Theory, Robustness (Statistics), Tables (Data)
Cohen, Allan S.; Kim, Seock-Ho; Wollack, James A. – 1998
This paper provides a review of procedures for detection of differential item functioning (DIF) for item response theory (IRT) and observed score methods for the graded response model. In addition, data from a test anxiety scale were analyzed to examine the congruence among these procedures. Data from Nasser, Takahashi, and Benson (1997) were…
Descriptors: Identification, Item Bias, Item Response Theory, Scores
Spray, Judith A.; And Others – 1990
Test data generated according to two different multidimensional item response theory (IRT) models were compared at both the item response level and the test score level to determine whether measurable differences between the models could be detected when the data sets were constrained to be equivalent in terms of item "p"-values. The…
Descriptors: Ability, Comparative Analysis, Item Response Theory, Mathematical Models
Zhang, Jinming; Chang, Hua-Hua – ETS Research Report Series, 2005
This paper compares the use of multiple pools versus a single pool with respect to test security against large-scale item sharing among some examinees in a computer-based test, under the assumption that a randomized item selection method is used. It characterizes the conditions under which employing multiple pools is better than using a single…
Descriptors: Comparative Analysis, Test Items, Item Banks, Computer Assisted Testing
Baker, Eva; Polin, Linda – 1978
The validity studies planned for the Test Design activities deal primarily with the appropriateness of items generated for a domain. Previous exploratory work in the field related to overall test content appropriateness ratings has not been satisfactory. Studies which are solely based on correlational data suffer from confounding with…
Descriptors: Questionnaires, Rating Scales, Test Construction, Test Format
Berk, Ronald A. – Educational and Psychological Measurement, 1978
Three formulae developed to correct item-total correlations for spuriousness were evaluated. Relationships among corrected, uncorrected, and item-remainder correlations were determined by computing sets of mean, minimum, and maximum deviation coefficients and Spearman rank correlations for nine test lengths. (Author/JKS)
Descriptors: Correlation, Intermediate Grades, Item Analysis, Test Construction
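One widely used part-whole correction of the kind this abstract discusses removes the item's own contribution to the total score before correlating; whether it matches any of the three formulae Berk evaluated is not stated in the snippet, so treat the version below as a generic illustration.

```python
import math

def corrected_item_total(r_it, s_i, s_t):
    """Correct an item-total correlation for spuriousness (part-whole overlap).

    r_it: raw correlation between item score and total score
    s_i:  standard deviation of the item score
    s_t:  standard deviation of the total score (item included)
    Returns the correlation of the item with the total minus the item.
    """
    return (r_it * s_t - s_i) / math.sqrt(s_i**2 + s_t**2 - 2 * r_it * s_i * s_t)

# Example: a raw item-total correlation of .50 shrinks once the
# item's own variance is removed from the total.
print(round(corrected_item_total(0.5, 1.0, 5.0), 4))
```

The correction matters most for short tests, where a single item makes up a large share of the total-score variance.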