Showing all 13 results
Peer reviewed
Goodman, Joshua T.; Dallas, Andrew D.; Fan, Fen – Applied Measurement in Education, 2020
Recent research has suggested that re-setting the standard for each administration of a small-sample examination, beyond its high cost, does not adequately maintain similar performance expectations from year to year. Small-sample equating methods have shown promise with samples between 20 and 30. For groups that have fewer than 20 students,…
Descriptors: Equated Scores, Sample Size, Sampling, Weighted Scores
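To make the small-sample equating idea concrete, here is a minimal Python sketch of mean equating, one of the simpler methods typically considered when samples are this small. All scores below are invented, not drawn from the study.

    import numpy as np

    rng = np.random.default_rng(0)
    # Invented raw scores from two small administrations (n = 18 each).
    new_form = rng.normal(24, 5, size=18)
    old_form = rng.normal(26, 5, size=18)

    # Mean equating: shift new-form scores by the difference in group means.
    # With very small samples this is often more stable than linear or
    # equipercentile equating, which need more data points to estimate.
    shift = old_form.mean() - new_form.mean()
    equated = new_form + shift
    print(round(shift, 2), round(equated.mean(), 2))
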
Peer reviewed
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
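A minimal sketch of checking item parameter drift between two small administrations, substituting a crude logit approximation for full Rasch estimation; the response data and the 0.5-logit flagging threshold are invented for illustration.

    import numpy as np

    def prox_difficulty(responses):
        # Crude Rasch-style difficulty: logit of the proportion incorrect
        # (a PROX-like approximation, ignoring the ability-spread correction).
        p = responses.mean(axis=0).clip(0.05, 0.95)
        return np.log((1 - p) / p)

    rng = np.random.default_rng(1)
    year1 = rng.binomial(1, [0.8, 0.6, 0.5, 0.3], size=(25, 4))  # 25 examinees
    year2 = rng.binomial(1, [0.8, 0.6, 0.4, 0.3], size=(25, 4))

    # Flag items whose estimated difficulty drifted more than 0.5 logits.
    drift = prox_difficulty(year2) - prox_difficulty(year1)
    print(np.flatnonzero(np.abs(drift) > 0.5))
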
Peer reviewed
Diao, Hongyu; Keller, Lisa – Applied Measurement in Education, 2020
Examinees who attempt the same test multiple times are often referred to as "repeaters." Previous studies suggested that repeaters should be excluded from the total sample before equating because repeater groups are distinguishable from non-repeater groups. In addition, repeaters might memorize anchor items, causing item drift under a…
Descriptors: Licensing Examinations (Professions), College Entrance Examinations, Repetition, Testing Problems
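A small illustration of the repeater-exclusion step described above, assuming hypothetical candidate IDs and scores: keep only each candidate's first attempt before equating.

    import numpy as np

    # Hypothetical records: candidate id and raw score, in testing order;
    # an id seen earlier in the array marks a repeat attempt.
    ids    = np.array([101, 102, 103, 101, 104, 102, 105])
    scores = np.array([ 55,  60,  48,  62,  71,  66,  59])

    # Keep each candidate's first attempt only, mirroring the practice of
    # excluding repeaters from the equating sample.
    _, first_idx = np.unique(ids, return_index=True)
    first_attempts = scores[np.sort(first_idx)]
    print(first_attempts)
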
Peer reviewed
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time-consuming when a test contains a large number of items. In this study, a G-theory framework was used to determine whether a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
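A quick sketch of the Angoff computation and the item-subset question, with invented panelist ratings: the cut score is the sum of mean item ratings, and a subset-based cut can be rescaled to full test length for comparison.

    import numpy as np

    rng = np.random.default_rng(2)
    # Hypothetical Angoff ratings: 8 panelists x 40 items, each a judged
    # probability that a minimally competent candidate answers correctly.
    ratings = rng.uniform(0.3, 0.9, size=(8, 40))

    # Full-test Angoff cut score: sum of the mean item ratings.
    full_cut = ratings.mean(axis=0).sum()

    # Cut score from a random 20-item subset, rescaled to test length.
    subset = rng.choice(40, size=20, replace=False)
    subset_cut = ratings[:, subset].mean(axis=0).sum() * (40 / 20)
    print(round(full_cut, 1), round(subset_cut, 1))
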
Peer reviewed
Steedle, Jeffrey T. – Applied Measurement in Education, 2014
Possible lack of motivation is a perpetual concern when tests have no stakes attached to performance. Specifically, the validity of test score interpretations may be compromised when examinees are unmotivated to exert their best efforts. Motivation filtering, a procedure that filters out apparently unmotivated examinees, was applied to the…
Descriptors: College Outcomes Assessment, Student Motivation, Sampling, Validity
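A minimal sketch of motivation filtering on simulated scores with a hypothetical effort index; the cutoff of 3.0 is arbitrary, chosen only to show how the filter changes the group mean.

    import numpy as np

    rng = np.random.default_rng(3)
    scores = rng.normal(500, 80, size=200)
    # Hypothetical effort measure on a 1-5 scale (e.g., self-report or a
    # response-time-effort index rescaled to that range).
    effort = rng.uniform(1, 5, size=200)

    # Motivation filtering: drop examinees below the effort threshold,
    # then compare group means with and without the filter.
    keep = effort >= 3.0
    print(round(scores.mean(), 1), round(scores[keep].mean(), 1))
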
Peer reviewed
Rutkowski, Leslie – Applied Measurement in Education, 2014
Large-scale assessment programs such as the National Assessment of Educational Progress (NAEP), Trends in International Mathematics and Science Study (TIMSS), and Programme for International Student Assessment (PISA) use a sophisticated assessment administration design called matrix sampling that minimizes the testing burden on individual…
Descriptors: Measurement, Testing, Item Sampling, Computation
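A toy illustration of matrix sampling: rotating item blocks into booklets so that no examinee takes the whole item pool. The 7-block, 3-per-booklet cyclic layout is invented for illustration, not the design of any particular program.

    # Matrix sampling sketch: 7 item blocks rotated into 7 booklets, each
    # booklet carrying only 3 blocks, so every block still appears in
    # several booklets while no student sees the full pool.
    n_blocks, per_booklet = 7, 3
    booklets = [[(b + k) % n_blocks for k in range(per_booklet)]
                for b in range(n_blocks)]
    for i, blocks in enumerate(booklets):
        print(f"booklet {i}: blocks {blocks}")
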
Peer reviewed
Michaelides, Michalis P.; Haertel, Edward H. – Applied Measurement in Education, 2014
The standard error of equating quantifies the variability in the estimation of an equating function. Because common items for deriving equated scores are treated as fixed, the only source of variability typically considered arises from the estimation of common-item parameters from responses of samples of examinees. Use of alternative, equally…
Descriptors: Equated Scores, Test Items, Sampling, Statistical Inference
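A rough sketch of the premise that the common-item set itself can be treated as sampled: bootstrapping over anchor items shows how much a simple mean-shift linking constant varies with the particular anchor drawn. The item difficulties below are simulated.

    import numpy as np

    rng = np.random.default_rng(4)
    # Hypothetical anchor-item difficulties estimated in old and new groups.
    old_b = rng.normal(0.0, 1.0, size=15)
    new_b = old_b + rng.normal(0.1, 0.15, size=15)

    # Resample the common-item set with replacement and recompute the
    # mean-shift linking constant each time; the spread of the constants
    # is the item-sampling contribution to the SE of equating.
    shifts = []
    for _ in range(2000):
        idx = rng.integers(0, 15, size=15)
        shifts.append((new_b[idx] - old_b[idx]).mean())
    print(round(np.std(shifts), 3))
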
Peer reviewed
Kim, Sooyeon; Walker, Michael – Applied Measurement in Education, 2012
This study examined the appropriateness of the anchor composition in a mixed-format test, which includes both multiple-choice (MC) and constructed-response (CR) items, using subpopulation invariance indices. Linking functions were derived in the nonequivalent groups with anchor test (NEAT) design using two types of anchor sets: (a) MC only and (b)…
Descriptors: Multiple Choice Tests, Test Format, Test Items, Equated Scores
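A minimal sketch of a subpopulation invariance check in the spirit of the NEAT comparison above, using mean-sigma linear linking on invented anchor scores for two subgroups; a well-behaved anchor should yield similar linking functions in each. This is a simplification, not the study's own procedure.

    import numpy as np

    def linear_link(x_anchor, y_anchor):
        # Mean-sigma linear linking from anchor-score summary statistics.
        slope = y_anchor.std() / x_anchor.std()
        intercept = y_anchor.mean() - slope * x_anchor.mean()
        return round(slope, 3), round(intercept, 3)

    rng = np.random.default_rng(5)
    # Hypothetical anchor scores for two subpopulations under one anchor set.
    grp1_x, grp1_y = rng.normal(20, 4, 300), rng.normal(21, 4, 300)
    grp2_x, grp2_y = rng.normal(18, 4, 300), rng.normal(20, 4, 300)

    # Divergent slopes/intercepts across subgroups would signal a lack of
    # subpopulation invariance for this anchor composition.
    print(linear_link(grp1_x, grp1_y))
    print(linear_link(grp2_x, grp2_y))
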
Peer reviewed
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
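The design-effect arithmetic at issue is compact enough to show directly; the cluster size, intraclass correlation, and score SD below are hypothetical.

    import numpy as np

    # Design effect for a cluster sample: deff = 1 + (m - 1) * rho,
    # where m is the cluster size and rho the intraclass correlation.
    m, rho = 25, 0.2               # hypothetical classroom size and ICC
    deff = 1 + (m - 1) * rho       # 5.8: variances are ~5.8x the SRS value

    n = 2500                       # nominal sample size
    n_eff = n / deff               # effective sample size after clustering
    srs_se = 80 / np.sqrt(n)       # naive SE assuming simple random sampling
    true_se = srs_se * np.sqrt(deff)  # SE once clustering is accounted for
    print(round(n_eff), round(srs_se, 2), round(true_se, 2))
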
Peer reviewed
Eignor, Daniel R.; And Others – Applied Measurement in Education, 1990
Two independent replications of a sequence of simulations were conducted to aid in the diagnosis and interpretation of equating differences found between representative (random) and matched (nonrandom) samples for three commonly used conventional observed-score equating procedures and one item-response-theory-based equating procedure. (SLD)
Descriptors: Equated Scores, Item Response Theory, Sampling, Simulation
Peer reviewed
Gao, Xiaohong; Brennan, Robert L. – Applied Measurement in Education, 2001
Studied the sampling variability of estimated variance components using data collected over several years for a listening and writing performance assessment and evaluated the stability of estimated measurement precision. Results indicate that the estimated variance components varied from one year to another and suggest that the measurement…
Descriptors: Estimation (Mathematics), Generalizability Theory, Listening Comprehension Tests, Performance Based Assessment
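A minimal sketch of estimating variance components for a persons x tasks G-study by the usual ANOVA method, on simulated single-year data; the study itself compares such estimates across years to gauge their stability.

    import numpy as np

    rng = np.random.default_rng(6)
    # Simulated persons x tasks score matrix for one year: a person effect
    # plus residual noise, with no systematic task effect built in.
    X = rng.normal(0, 1, size=(50, 6)) + rng.normal(0, 1, size=(50, 1))

    n_p, n_t = X.shape
    ms_p = n_t * X.mean(axis=1).var(ddof=1)   # persons mean square
    ms_t = n_p * X.mean(axis=0).var(ddof=1)   # tasks mean square
    ss_res = ((X - X.mean()) ** 2).sum() - ms_p * (n_p - 1) - ms_t * (n_t - 1)
    ms_pt = ss_res / ((n_p - 1) * (n_t - 1))  # residual (pt interaction + error)

    # ANOVA estimates of the G-study variance components (these can dip
    # below zero by chance, itself a symptom of sampling variability).
    print("var_p :", round((ms_p - ms_pt) / n_t, 3))
    print("var_t :", round((ms_t - ms_pt) / n_p, 3))
    print("var_pt:", round(ms_pt, 3))
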
Peer reviewed
Livingston, Samuel A.; And Others – Applied Measurement in Education, 1990
Combinations of five methods of equating test scores and two methods of selecting samples of students for equating were compared for accuracy, using data from the administration of the Scholastic Aptitude Test to more than 115,000 students. Implications for research and practice are discussed. (SLD)
Descriptors: College Entrance Examinations, Equated Scores, Evaluation Methods, High School Students
Peer reviewed
Gao, Xiaohong; And Others – Applied Measurement in Education, 1994
This study provides empirical evidence about the sampling variability and generalizability (reliability) of a statewide performance assessment for grade six. Results for 600 students at individual and school levels indicate that task-sampling variability was the major source of measurement error. Rater-sampling variability was negligible. (SLD)
Descriptors: Achievement Tests, Educational Assessment, Elementary School Students, Error of Measurement
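A short sketch of how facet shares of relative error variance fall out of a D-study computation; the variance components and facet sample sizes below are invented, chosen so that task sampling dominates and rater sampling is negligible, echoing the study's finding.

    # Hypothetical G-study variance components for a p x t x r design
    # (persons crossed with tasks and raters).
    var = {"pt": 0.30, "pr": 0.01, "ptr,e": 0.12}
    n_t, n_r = 4, 2   # D-study: 4 tasks, 2 raters per examinee

    # Relative error variance and each facet's share of it.
    rel_err = var["pt"] / n_t + var["pr"] / n_r + var["ptr,e"] / (n_t * n_r)
    task_share = (var["pt"] / n_t) / rel_err
    print(round(rel_err, 3), f"{task_share:.0%} of relative error from tasks")
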