Peer reviewed: Gerow, Joshua R. – Teaching of Psychology, 1980
Discusses a study to evaluate how test design influences student performance in elementary psychology courses. Findings indicated that the order in which test items appeared on an exam was less significant with regard to student performance than the extent to which test items were well-written and contained some measure of content validity.…
Descriptors: Academic Achievement, Difficulty Level, Higher Education, Psychology
Peer reviewed: Wilcox, Rand R. – Educational and Psychological Measurement, 1979
Wilcox has described three probability models which characterize a single test item in terms of a population of examinees (ED 156 718). This note indicates that similar models can be derived which characterize a single examinee in terms of an item domain. A numerical illustration is given. (Author/JKS)
Descriptors: Achievement Tests, Item Analysis, Mathematical Models, Probability
Peer reviewed: Li, Hsin-Hung; Stout, William – Psychometrika, 1996
A hypothesis testing and estimation procedure, Crossing SIBTEST, is presented for detecting crossing differential item functioning (DIF), which exists when the difference in probabilities of a correct answer for two examinee groups changes signs as ability level is varied. The procedure estimates the matching subtest score at which crossing…
Descriptors: Ability, Estimation (Mathematics), Hypothesis Testing, Item Bias
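Crossing DIF, as described in the abstract above, occurs when the sign of the group difference in correct-response probability changes along the ability scale. A minimal illustration (not SIBTEST itself) using hypothetical two-parameter logistic items whose unequal slopes force the curves to cross:

```python
import numpy as np

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item parameters for reference and focal groups;
# the unequal slopes make the two ICCs cross.
theta = np.linspace(-3, 3, 601)
p_ref = icc_2pl(theta, a=1.5, b=0.0)
p_foc = icc_2pl(theta, a=0.8, b=0.2)

# Crossing DIF: the sign of the group difference changes with ability.
diff = p_ref - p_foc
sign_changes = np.where(np.diff(np.sign(diff)) != 0)[0]
crossing_theta = theta[sign_changes]
```

The curves cross where a1(theta - b1) = a2(theta - b2), here at theta = -0.16/0.7, so the focal group is favored below that ability level and disfavored above it; a uniform-DIF statistic that averages over ability can miss this cancellation, which is the problem Crossing SIBTEST targets.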
Peer reviewed: Oshima, T. C.; Raju, Nambury S.; Flowers, Claudia P. – Journal of Educational Measurement, 1997
Defines and demonstrates a framework for studying differential item functioning and differential test functioning for tests that are intended to be multidimensional. The procedure, which is illustrated with simulated data, is an extension of the unidimensional differential functioning of items and tests approach (N. Raju, W. van der Linden, and P.…
Descriptors: Item Bias, Item Response Theory, Models, Simulation
Peer reviewed: Zumbo, Bruno D.; Pope, Gregory A.; Watson, Jackie E.; Hubley, Anita M. – Educational and Psychological Measurement, 1997
E. Roskam's (1985) conjecture that steeper item characteristic curve (ICC) "a" parameters (slopes) (and higher item total correlations in classical test theory) would be found with more concretely worded test items was tested with results from 925 young adults on the Eysenck Personality Questionnaire (H. Eysenck and S. Eysenck, 1975).…
Descriptors: Correlation, Personality Assessment, Personality Measures, Test Interpretation
Peer reviewed: Glas, Cees A. W.; van der Linden, Wim J. – Applied Psychological Measurement, 2003
Developed a multilevel item response theory (IRT) model that allows for differences between the distributions of item parameters of families of item clones. Results from simulation studies based on an item pool from the Law School Admission Test illustrate the accuracy of the item pool calibration and adaptive testing procedures based on the model. (SLD)
Descriptors: Adaptive Testing, Computer Assisted Testing, Item Banks, Item Response Theory
Peer reviewed: Yang, Chien-Lin; O'Neill, Thomas R.; Kramer, Gene A. – Journal of Applied Measurement, 2002
Studied item calibration stability in relation to response time and the levels of item difficulty between different response groups on a sample of 389 examinees responding to 6 subtest items of the Perceptual Ability Test of the Dental Admission Test. Results show that scores were equally useful for all groups, and different sources of item…
Descriptors: Ability, College Students, Dentistry, Difficulty Level
Peer reviewed: Walter, Richard A.; Kapes, Jerome T. – Journal of Industrial Teacher Education, 2003
To identify a procedure for establishing cut scores for National Occupational Competency Testing Institute examinations in Pennsylvania, an expert panel assessed written and performance test items for minimally competent workers. Recommendations about the number, type, and training of judges used were made. (Contains 18 references.) (SK)
Descriptors: Cutting Scores, Interrater Reliability, Occupational Tests, Teacher Competency Testing
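The abstract does not name the standard-setting method, but a common panel-based approach to cut scores is an Angoff-style procedure, in which each judge estimates the probability that a minimally competent candidate answers each item correctly and the cut score is the sum of the mean ratings. A hypothetical sketch with invented ratings:

```python
import numpy as np

# Hypothetical Angoff-style ratings: each judge estimates, per item, the
# probability that a minimally competent worker answers correctly.
# Rows = judges, columns = items.
ratings = np.array([
    [0.6, 0.7, 0.5, 0.8],
    [0.5, 0.8, 0.4, 0.9],
    [0.7, 0.6, 0.5, 0.7],
])

# Recommended raw cut score: sum over items of the mean judge rating.
cut_score = ratings.mean(axis=0).sum()
```

Interrater reliability (one of the descriptors below) is typically checked on exactly this judge-by-item matrix before the cut score is adopted.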
Peer reviewed: Wainer, Howard; Lukhele, Robert – Applied Measurement in Education, 1997
The screening for flaws that is routinely carried out for multiple-choice items is often not carried out for large items. Examines continuous item weighting as a way to manage the influence of differential item functioning (DIF). Data from the College Board Advanced Placement History Test are used to illustrate the method. (SLD)
Descriptors: Advanced Placement, College Entrance Examinations, History, Item Bias
Peer reviewed: Douglas, Jeffrey A.; And Others – Journal of Educational and Behavioral Statistics, 1996
A procedure for detection of differential item functioning (DIF) is proposed that amalgamates SIBTEST and kernel-smoothed item response function estimation to assess DIF as a function of the latent trait theta that the test is designed to measure. Smoothed SIBTEST is studied through simulation and real data analysis. (SLD)
Descriptors: Ability, Equations (Mathematics), Estimation (Mathematics), Item Bias
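The kernel-smoothed item response function estimation mentioned above can be illustrated with a generic Nadaraya-Watson estimator (a sketch of the idea, not the authors' exact procedure): the estimated probability of a correct answer at each ability level is a locally weighted proportion correct.

```python
import numpy as np

def kernel_smooth_irf(theta_hat, correct, grid, bandwidth=0.4):
    """Nadaraya-Watson (Gaussian-kernel) estimate of an item response
    function: locally weighted proportion correct at each grid point."""
    z = (grid[:, None] - theta_hat[None, :]) / bandwidth
    w = np.exp(-0.5 * z ** 2)  # kernel weights, examinees near each grid point
    return (w * correct[None, :]).sum(axis=1) / w.sum(axis=1)

# Hypothetical examinees: ability estimates and 0/1 item scores.
theta_hat = np.linspace(-2.0, 2.0, 9)
correct = (theta_hat > 0).astype(float)
grid = np.array([-2.0, 0.0, 2.0])
irf = kernel_smooth_irf(theta_hat, correct, grid)
```

Estimating such curves separately for reference and focal groups and comparing them as functions of theta is what lets a smoothed procedure display DIF across the whole ability range rather than as a single summary number.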
Peer reviewed: Nasser, Fadia; Takahashi, Tomone – Applied Measurement in Education, 2003
Examined the impact of using item parcels on ad hoc goodness-of-fit indexes in confirmatory factor analysis using the Arabic version of Sarason's Reactions to Tests scale. Data from 421 and 372 Arabic-speaking students at an Israeli high school show that lower skewness and kurtosis and higher validity occur for parcels than for individual items.…
Descriptors: Arabic, Foreign Countries, Goodness of Fit, High School Students
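Item parcels are simply small sums of individual items that are then analyzed as units in the factor model. A toy sketch of the mechanics only (the dataset is invented, and the abstract's skewness comparison is not reproduced here):

```python
import numpy as np

# Hypothetical 0-4 Likert responses: 8 persons x 6 items (deterministic toy data).
responses = np.arange(48).reshape(8, 6) % 5

# Form three 2-item parcels per person by summing adjacent items;
# the parcel scores, not the item scores, enter the factor analysis.
parcels = responses.reshape(8, 3, 2).sum(axis=2)
```

Because each parcel aggregates several items, its score distribution is closer to continuous and typically less skewed than any single item's, which is why parcel-level confirmatory factor analysis tends to show better ad hoc fit.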
Peer reviewed: Zumbo, Bruno D. – Language Testing, 2003
Based on the observation that scale-level methods are sometimes exclusively used to investigate measurement invariance for test translation, describes results of a simulation study investigating whether item-level differential item functioning (DIF) manifests itself in scale-level analyses such as single and multigroup factor analyses and per…
Descriptors: Factor Analysis, Item Analysis, Language Tests, Second Language Learning
Peer reviewed: Frantom, Catherine; Green, Kathy E.; Lam, Tony C. M. – Journal of Applied Measurement, 2002
Studied the effects of item grouping on local independence and item invariance, the characteristics of items scaled under the Rasch model that make them sample-free. Data were 107 responses to a survey of teachers' opinions about the Ontario grade 9 literacy test. Although effects of grouping and item phrasing on invariance were found, results…
Descriptors: Attitude Measures, Attitudes, Foreign Countries, Groups
Peer reviewed: Sunathong, Surintorn; Schumacker, Randall E.; Beyerlein, Michael M. – Journal of Applied Measurement, 2000
Studied five factors that can affect the equating of scores from two tests onto a common score scale through the simulation and equating of 4,860 item data sets. Findings indicate three statistically significant two-way interactions for common item length and test length, item difficulty standard deviation and item distribution type, and item…
Descriptors: Difficulty Level, Equated Scores, Interaction, Item Response Theory
Peer reviewed: Baker, Frank B. – Applied Psychological Measurement, 1990
Simulation was used to study the equating of results from the PC-BILOG computer program to an underlying metric under a two-parameter item response theory model. Results are discussed in terms of the identification problem and implications for test equating. (SLD)
Descriptors: Bayesian Statistics, Computer Simulation, Equated Scores, Item Response Theory
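The identification problem in the last two abstracts arises because separately calibrated IRT parameters are only determined up to a linear transformation of the ability metric; placing two forms on a common scale is often done with linking constants. A minimal mean/sigma sketch with invented difficulty values (not the PC-BILOG procedure itself):

```python
import numpy as np

def mean_sigma_link(b_new, b_ref):
    """Mean/sigma linking constants A, B that map the new-form metric
    onto the reference metric: b* = A * b_new + B.
    Slopes transform inversely: a* = a_new / A."""
    A = np.std(b_ref, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_ref) - A * np.mean(b_new)
    return A, B

# Hypothetical difficulty estimates for the same common items,
# calibrated separately on two forms (the new form's metric is shifted).
b_ref = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_new = np.array([-1.0, -0.2, 0.3, 1.0, 1.7])

A, B = mean_sigma_link(b_new, b_ref)
b_transformed = A * b_new + B
```

Here the invented new-form difficulties are the reference values shifted by 0.2, so the recovered constants are A = 1 and B = -0.2 and the transformed values land back on the reference metric; with real estimates the constants absorb both a scale and a location difference.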


