[Peer reviewed] Glas, Cees A. W.; van der Linden, Wim J. – Applied Psychological Measurement, 2003
Developed a multilevel item response (IRT) model that allows for differences between the distributions of item parameters of families of item clones. Results from simulation studies based on an item pool from the Law School Admission Test illustrate the accuracy of the item pool calibration and adaptive testing procedures based on the model. (SLD)
Descriptors: Adaptive Testing, Computer Assisted Testing, Item Banks, Item Response Theory
[Peer reviewed] Yang, Chien-Lin; O'Neill, Thomas R.; Kramer, Gene A. – Journal of Applied Measurement, 2002
Studied item calibration stability in relation to response time and the levels of item difficulty between different response groups on a sample of 389 examinees responding to 6 subtest items of the Perceptual Ability Test of the Dental Admission Test. Results show that scores were equally useful for all groups, and different sources of item…
Descriptors: Ability, College Students, Dentistry, Difficulty Level
[Peer reviewed] Walter, Richard A.; Kapes, Jerome T. – Journal of Industrial Teacher Education, 2003
To identify a procedure for establishing cut scores for National Occupational Competency Testing Institute examinations in Pennsylvania, an expert panel assessed written and performance test items for minimally competent workers. Recommendations about the number, type, and training of judges used were made. (Contains 18 references.) (SK)
Descriptors: Cutting Scores, Interrater Reliability, Occupational Tests, Teacher Competency Testing
[Peer reviewed] Wainer, Howard; Lukhele, Robert – Applied Measurement in Education, 1997
The screening for flaws done for multiple-choice items is often not done for large items. Examines continuous item weighting as a way to manage the influence of differential item functioning (DIF). Data from the College Board Advanced Placement History Test are used to illustrate the method. (SLD)
Descriptors: Advanced Placement, College Entrance Examinations, History, Item Bias
[Peer reviewed] Douglas, Jeffrey A.; And Others – Journal of Educational and Behavioral Statistics, 1996
A procedure for detection of differential item functioning (DIF) is proposed that amalgamates SIBTEST and kernel-smoothed item response function estimation to assess DIF as a function of the latent trait theta that the test is designed to measure. Smoothed SIBTEST is studied through simulation and real data analysis. (SLD)
Descriptors: Ability, Equations (Mathematics), Estimation (Mathematics), Item Bias
[Peer reviewed] Nasser, Fadia; Takahashi, Tomone – Applied Measurement in Education, 2003
Examined the impact of using item parcels on ad hoc goodness-of-fit indexes in confirmatory factor analysis using the Arabic version of Sarason's Reactions to Tests scale. Data from 421 and 372 Arabic speaking students at an Israeli high school show that lower skewness and kurtosis and higher validity occur for parcels than for individual items.…
Descriptors: Arabic, Foreign Countries, Goodness of Fit, High School Students
[Peer reviewed] Zumbo, Bruno D. – Language Testing, 2003
Based on the observation that scale-level methods are sometimes exclusively used to investigate measurement invariance for test translation, describes results of a simulation study investigating whether item-level differential item functioning (DIF) manifests itself in scale-level analyses such as single and multigroup factor analyses and per…
Descriptors: Factor Analysis, Item Analysis, Language Tests, Second Language Learning
[Peer reviewed] Frantom, Catherine; Green, Kathy E.; Lam, Tony C. M. – Journal of Applied Measurement, 2002
Studied the effects of item grouping on local independence and item invariance, the characteristics of items scaled under the Rasch model that make them sample-free. Data were 107 responses to a survey of teachers' opinions about the Ontario grade 9 literacy test. Although effects of grouping and item phrasing on invariance were found, results…
Descriptors: Attitude Measures, Attitudes, Foreign Countries, Groups
[Peer reviewed] Sunathong, Surintorn; Schumacker, Randall E.; Beyerlein, Michael M. – Journal of Applied Measurement, 2000
Studied five factors that can affect the equating of scores from two tests onto a common score scale through the simulation and equating of 4,860 item data sets. Findings indicate three statistically significant two-way interactions for common item length and test length, item difficulty standard deviation and item distribution type, and item…
Descriptors: Difficulty Level, Equated Scores, Interaction, Item Response Theory
[Peer reviewed] Baker, Frank B. – Applied Psychological Measurement, 1990
The equating of results from the PC-BILOG computer program to an underlying metric was studied through simulation when a two-parameter item response theory model was used. Results are discussed in terms of the identification problem and implications for test equating. (SLD)
Descriptors: Bayesian Statistics, Computer Simulation, Equated Scores, Item Response Theory
[Peer reviewed] Gilmer, Jerry S. – Applied Psychological Measurement, 1989
The effects of test item disclosure on resulting examinee equated scores and population passing rates were studied for 5,000 examinees taking a professional licensing examination. Results suggest that the effects of disclosing depended on the nature of the released items. Specific effects on particular examinees are also discussed. (SLD)
Descriptors: Disclosure, Equated Scores, Licensing Examinations (Professions), Professional Education
[Peer reviewed] Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989
Results of 96 theoretical/empirical studies were reviewed to see if they support a taxonomy of 43 rules for writing multiple-choice test items. The taxonomy is the result of an analysis of 46 textbooks dealing with multiple-choice item writing. For nearly half of the rules, no research was found. (SLD)
Descriptors: Classification, Literature Reviews, Multiple Choice Tests, Test Construction
[Peer reviewed] Wilson, Mark – Applied Psychological Measurement, 1988
A method for detecting and interpreting disturbances of the local-independence assumption among items that share common stimulus material or other features is presented. Dichotomous and polytomous Rasch models are used to analyze structure of the learning outcome superitems. (SLD)
Descriptors: Item Analysis, Latent Trait Theory, Mathematical Models, Test Interpretation
[Peer reviewed] Reckase, Mark D.; And Others – Journal of Educational Measurement, 1988
It is demonstrated, theoretically and empirically, that item sets can be selected that meet the unidimensionality assumption of most item response theory models, even though they require more than one ability for a correct response. A method for identifying such item sets for test development purposes is presented. (SLD)
Descriptors: Computer Simulation, Item Analysis, Latent Trait Theory, Mathematical Models
[Peer reviewed] Schnipke, Deborah L.; Green, Bert F. – Journal of Educational Measurement, 1995
Two item selection algorithms, one based on maximal differentiation between examinees and one based on item response theory and maximum information for each examinee, were compared in simulated linear and adaptive tests of cognitive ability. Adaptive tests based on maximum information were clearly superior. (SLD)
Descriptors: Adaptive Testing, Algorithms, Comparative Analysis, Item Response Theory