Publication Date
| Range | Results |
| In 2026 | 0 |
| Since 2025 | 215 |
| Since 2022 (last 5 years) | 1084 |
| Since 2017 (last 10 years) | 2594 |
| Since 2007 (last 20 years) | 4955 |
Audience
| Audience | Results |
| Practitioners | 653 |
| Teachers | 563 |
| Researchers | 250 |
| Students | 201 |
| Administrators | 81 |
| Policymakers | 22 |
| Parents | 17 |
| Counselors | 8 |
| Community | 7 |
| Support Staff | 3 |
| Media Staff | 1 |
Location
| Location | Results |
| Turkey | 226 |
| Canada | 223 |
| Australia | 155 |
| Germany | 116 |
| United States | 99 |
| China | 90 |
| Florida | 86 |
| Indonesia | 82 |
| Taiwan | 78 |
| United Kingdom | 73 |
| California | 66 |
What Works Clearinghouse Rating
| Rating | Results |
| Meets WWC Standards without Reservations | 4 |
| Meets WWC Standards with or without Reservations | 4 |
| Does not meet standards | 1 |
Dimitrov, Dimiter M. – 2002
Exact formulas for classical error variance are provided for Rasch measurement with logistic distributions. An approximation formula with the normal ability distribution is also provided. With the proposed formulas, the additive contribution of individual items to the population error variance can be determined without knowledge of the other test…
Descriptors: Ability, Error of Measurement, Item Response Theory, Test Items
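The additivity the abstract describes reflects a standard Rasch fact: test information is a sum of per-item terms p(1 − p), so each item's contribution to conditional error variance can be computed without knowing the other items. A minimal sketch of the conditional SEM (not the paper's exact population-level formulas, which integrate over an ability distribution):

```python
import math

def rasch_prob(theta, b):
    """P(correct) under the Rasch model for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def conditional_sem(theta, difficulties):
    """Conditional standard error of measurement at theta.

    Test information is the sum of item informations p * (1 - p), so
    each item's contribution to the error variance 1 / I(theta) is
    additive and needs no knowledge of the other test items.
    """
    info = sum(p * (1 - p) for p in (rasch_prob(theta, b) for b in difficulties))
    return 1.0 / math.sqrt(info)

# Toy example: a 5-item test evaluated at theta = 0
items = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(round(conditional_sem(0.0, items), 3))  # → 0.948
```

Adding or removing an item simply adds or removes its p(1 − p) term from the information sum, which is the sense in which item contributions are separable.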
Deane, Paul; Sheehan, Kathleen – 2003
This paper is an exploration of the conceptual issues that have arisen in the course of building a natural language generation (NLG) system for automatic test item generation. While natural language processing techniques are applicable to general verbal items, mathematics word problems are particularly tractable targets for natural language…
Descriptors: Natural Language Processing, Semantics, Test Construction, Test Items
Lee, Guemin – 1999
Previous studies have indicated that the reliability of test scores composed of testlets is overestimated by conventional item-based reliability estimation methods (S. Sireci, D. Thissen, and H. Wainer, 1991; H. Wainer, 1995; H. Wainer and D. Thissen, 1996; G. Lee and D. Frisbie). In light of these studies, it seems reasonable to ask whether the…
Descriptors: Definitions, Error of Measurement, Estimation (Mathematics), Reliability
Lee, Guemin – 1998
The primary purpose of this study was to investigate the appropriateness and implication of incorporating a testlet definition into the estimation of the conditional standard error of measurement (SEM) for tests composed of testlets. The five conditional SEM estimation methods used in this study were classified into two categories: item-based and…
Descriptors: Definitions, Error of Measurement, Estimation (Mathematics), Reliability
Egan, Karla L.; Sireci, Stephen G.; Swaminathan, Hariharan; Sweeney, Kevin P. – 1998
The primary purpose of this study was to assess the effect of item bundling on multidimensional data. A second purpose was to compare three methods for assessing dimensionality. Eight multidimensional data sets consisting of 100 items and 1,000 examinees were simulated varying in terms of dimensionality, inter-dimensional correlation, and number…
Descriptors: Certified Public Accountants, Evaluation Methods, Licensing Examinations (Professions), Simulation
Li, Yuan H.; Lissitz, Robert W.; Yang, Yu Nu – 1999
Recent years have seen growing use of tests with mixed item formats, e.g., partly containing dichotomously scored items and partly consisting of polytomously scored items. A method of matching two test characteristic curves (CCM) for placing these mixed-format items on the same metric is described and evaluated in this paper under a common-item…
Descriptors: Equated Scores, Estimation (Mathematics), Item Response Theory, Test Format
Shen, Linjun – 1999
A multilevel approach was proposed for the assessment of differential item functioning and compared with the traditional logistic regression approach. Data from the Comprehensive Osteopathic Medical Licensing Examination for 2,300 freshman osteopathic medical students were analyzed. The multilevel approach used three-level hierarchical generalized…
Descriptors: Evaluation Methods, Higher Education, Item Bias, Medical Students
Leung, Chi-Keung; Chang, Hua-Hua; Hau, Kit-Tai – 2000
Information based item selection methods in computerized adaptive tests (CATs) tend to choose the item that provides maximum information at an examinee's estimated trait level. As a result, these methods can yield extremely skewed item exposure distributions in which items with high "a" values may be overexposed, while those with low…
Descriptors: Adaptive Testing, Computer Assisted Testing, Selection, Simulation
Shen, Linjun – 2000
This study assessed the effects of the type of medical curriculum on differential item functioning (DIF) and group differences at the test level in Level 1 of the Comprehensive Osteopathic Medical Licensing Examinations (COMLEX). The study also explored the relationship of the DIF and group differences at the test level. There are generally two…
Descriptors: Curriculum, Item Bias, Licensing Examinations (Professions), Medical Students
Deng, Hui; Chang, Hua-Hua – 2001
The purpose of this study was to compare a proposed revised a-stratified, or alpha-stratified, USTR method of test item selection with the original alpha-stratified multistage computerized adaptive testing approach (STR) and the use of maximum Fisher information (FSH) with respect to test efficiency and item pool usage using simulated computerized…
Descriptors: Adaptive Testing, Computer Assisted Testing, Item Banks, Selection
Chen, Shu-Ying; Ankenmann, Robert D.; Spray, Judith A. – 1999
This paper presents a derivation of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CAT). This relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. Implications for practice as well as future research are also…
Descriptors: Adaptive Testing, Computer Assisted Testing, Item Banks, Test Items
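The kind of exposure–overlap relationship the abstract describes can be illustrated under a simplifying assumption (a sketch, not the authors' derivation): if two examinees' tests each include item i independently with probability equal to its exposure rate r_i, the expected number of shared items is Σ r_i², so the average overlap rate is Σ r_i² / L for test length L:

```python
import random

def expected_overlap_rate(exposure_rates, test_length):
    """Average between-test overlap implied by item exposure rates.

    If two tests each include item i independently with probability
    r_i, the expected number of shared items is sum(r_i ** 2);
    dividing by test length gives the overlap rate.
    """
    return sum(r * r for r in exposure_rates) / test_length

def simulate_overlap(exposure_rates, test_length, n_pairs=20000, seed=7):
    """Monte Carlo check: draw pairs of tests (fixed-length only in
    expectation under this simplification) and count shared items."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_pairs):
        t1 = {i for i, r in enumerate(exposure_rates) if rng.random() < r}
        t2 = {i for i, r in enumerate(exposure_rates) if rng.random() < r}
        total += len(t1 & t2)
    return total / (n_pairs * test_length)

# 40-item pool, 10-item tests on average under both schemes,
# but the skewed exposure pattern inflates between-test overlap.
uniform = [0.25] * 40                 # every r_i = L/N
skewed = [0.9] * 10 + [1 / 30] * 30   # a few items carry most tests
print(expected_overlap_rate(uniform, 10))           # → 0.25
print(round(expected_overlap_rate(skewed, 10), 3))  # → 0.813
```

Uniform exposure attains the minimum possible overlap L/N; any variance in the exposure rates raises Σ r_i², which is why controlling exposure at the item level alone does not guarantee low overlap at the test level.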
Michaelides, Michalis P.; Haertel, Edward H. – Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2004
There is variability in the estimation of an equating transformation because common-item parameters are obtained from responses of samples of examinees. The most commonly used standard error of equating quantifies this source of sampling error, which decreases as the sample size of examinees used to derive the transformation increases. In a…
Descriptors: Test Items, Testing, Error Patterns, Interrater Reliability
Oosterhof, Albert C. – Journal of Educational Measurement, 1976 (peer reviewed)
The purpose of this study was to investigate the degree to which various selected test item discrimination indices reflect a common factor. The indices used include the point-biserial, biserial, phi and tetrachoric coefficients, Flanagan's approximation of the product-moment correlation, Gulliksen's item reliability index, and Findley's difference…
Descriptors: Comparative Analysis, Correlation, Factor Analysis, Mathematical Formulas
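Of the indices listed, the point-biserial is the most widely used; a sketch of its standard computing formula, (M1 − M0)/s_x · √(pq), on hypothetical toy data (not data from the study):

```python
import math

def point_biserial(item_correct, total_scores):
    """Point-biserial discrimination index: the Pearson correlation
    between a 0/1 item score and the total test score.

    Computed as (M1 - M0) / s_x * sqrt(p * q), where M1 and M0 are
    the mean totals for examinees who got the item right and wrong,
    p is the item's proportion correct, q = 1 - p, and s_x is the
    (population) standard deviation of the totals.
    """
    n = len(total_scores)
    p = sum(item_correct) / n
    q = 1 - p
    assert 0 < p < 1, "item needs both correct and incorrect responses"
    m1 = sum(t for c, t in zip(item_correct, total_scores) if c) / (p * n)
    m0 = sum(t for c, t in zip(item_correct, total_scores) if not c) / (q * n)
    mean = sum(total_scores) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in total_scores) / n)
    return (m1 - m0) / sd * math.sqrt(p * q)

# Toy data: 6 examinees, 0/1 item responses and total test scores
item = [1, 1, 1, 0, 0, 1]
totals = [9, 8, 7, 4, 5, 6]
print(round(point_biserial(item, totals), 3))  # → 0.828
```

The biserial, phi, and tetrachoric coefficients compared in the study differ mainly in whether they treat one or both variables as artificially dichotomized, which is what makes a common-factor analysis of the indices informative.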
Hicks, Marilyn Maginley – Multivariate Behavioral Research, 1981 (peer reviewed)
An empirical investigation of the statistical procedure entitled nonlinear principal components analysis was conducted on a known equation and on measurement data in order to demonstrate the procedure and examine its potential usefulness. This method was suggested by R. Gnanadesikan and based on an early paper of Karl Pearson. (Author/AL)
Descriptors: Correlation, Factor Analysis, Mastery Tests, Measurement Techniques
Vidler, Derek; Hansen, Richard – Journal of Experimental Education, 1980 (peer reviewed)
Relationships among patterns of answer changing and item characteristics on multiple-choice tests are discussed. Results obtained were similar to those found in previous studies but pointed to further relationships among these variables. (Author/GK)
Descriptors: College Students, Difficulty Level, Higher Education, Multiple Choice Tests


