Publication Date
  In 2025: 0
  Since 2024: 0
  Since 2021 (last 5 years): 4
  Since 2016 (last 10 years): 10
  Since 2006 (last 20 years): 27
Descriptor
  Sampling: 30
  Computation: 17
  Error of Measurement: 10
  National Competency Tests: 10
  Statistical Analysis: 10
  Comparative Analysis: 9
  Equated Scores: 8
  Item Response Theory: 8
  Sample Size: 8
  Test Items: 8
  Probability: 7
Source
  ETS Research Report Series: 30
Author
  Qian, Jiahe: 8
  Oranje, Andreas: 4
  Haberman, Shelby J.: 3
  Kim, Sooyeon: 3
  Livingston, Samuel A.: 3
  Braun, Henry: 2
  Dorans, Neil J.: 2
  Guo, Hongwen: 2
  Haberman, Shelby: 2
  Li, Shuhong: 2
  Lu, Ru: 2
Publication Type
  Journal Articles: 30
  Reports - Research: 30
  Numerical/Quantitative Data: 1
Education Level
  Elementary Education: 8
  Secondary Education: 8
  Junior High Schools: 7
  Middle Schools: 7
  Grade 8: 6
  Grade 4: 4
  Higher Education: 4
  Intermediate Grades: 4
  Postsecondary Education: 4
Location
  Australia: 1
  California: 1
  Nevada: 1
  New Jersey: 1
  United States: 1
Laws, Policies, & Programs
  No Child Left Behind Act 2001: 1
Assessments and Surveys
  National Assessment of…: 10
  Graduate Record Examinations: 1
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
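A rescoring check of this kind can be illustrated with a small sketch: given the original Time A scores and the Time B rescores of the same papers, compute exact and adjacent agreement rates. This is a generic illustration of the comparison the abstract describes, not the procedure from the report itself.

```python
import numpy as np

def rescore_agreement(time_a, time_b):
    """Compare original Time A scores with Time B rescores of the same
    papers: exact agreement and adjacent (within one point) agreement."""
    a = np.asarray(time_a)
    b = np.asarray(time_b)
    exact = np.mean(a == b)
    adjacent = np.mean(np.abs(a - b) <= 1)
    return exact, adjacent
```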
Qian, Jiahe; Gu, Lixiong; Li, Shuhong – ETS Research Report Series, 2019
In assembling testlets (i.e., test forms) from a pool of new and used item blocks, test security is one of the main issues of concern. Strict constraints are often imposed on repeated usage of the same item blocks. Nevertheless, for an assessment administering multiple testlets, a goal is to select as large a sample of testlets as possible. In…
Descriptors: Test Construction, Sampling, Test Items, Mathematics
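The tension between block-reuse constraints and testlet yield can be seen in a toy greedy selection. The function below is a hypothetical sketch: the block labels, the blocks-per-testlet count, and the max_reuse cap are illustrative assumptions, not the assembly rules studied in the report.

```python
from itertools import combinations

def assemble_testlets(blocks, blocks_per_testlet, max_reuse):
    """Greedily build testlets from item blocks, never letting any block
    appear in more than max_reuse assembled testlets."""
    usage = {b: 0 for b in blocks}
    testlets = []
    for combo in combinations(blocks, blocks_per_testlet):
        if all(usage[b] < max_reuse for b in combo):
            testlets.append(combo)
            for b in combo:
                usage[b] += 1
    return testlets

# e.g. assemble_testlets(["B1", "B2", "B3", "B4"], 2, max_reuse=2)
```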
Kim, Sooyeon; Walker, Michael E. – ETS Research Report Series, 2021
Equating the scores from different forms of a test requires collecting data that link the forms. Problems arise when the test forms to be linked are given to groups that are not equivalent and the forms share no common items by which to measure or adjust for this group nonequivalence. We compared three approaches to adjusting for group…
Descriptors: Equated Scores, Weighted Scores, Sampling, Multiple Choice Tests
Qian, Jiahe; Li, Shuhong – ETS Research Report Series, 2021
In recent years, harmonic regression models have been applied to implement quality control for educational assessment data consisting of multiple administrations and displaying seasonality. As with other types of regression models, it is imperative that model adequacy checking and model fit be appropriately conducted. However, there has been no…
Descriptors: Models, Regression (Statistics), Language Tests, Quality Control
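Harmonic regression models seasonality with sine and cosine terms at known periods. A minimal sketch, assuming a 12-administration seasonal cycle (an illustrative choice, not taken from the report), fits such a model by ordinary least squares and returns the residuals that model-adequacy checks would examine.

```python
import numpy as np

def fit_harmonic(y, period=12, n_harmonics=2):
    """OLS fit of y on a linear trend plus sine/cosine seasonal terms."""
    y = np.asarray(y, float)
    t = np.arange(len(y), dtype=float)
    cols = [np.ones_like(y), t]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * t / period))
        cols.append(np.sin(2 * np.pi * k * t / period))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta  # coefficients and residuals for fit checks
```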
Yao, Lili; Haberman, Shelby; McCaffrey, Daniel F.; Lockwood, J. R. – ETS Research Report Series, 2020
Minimum discriminant information adjustment (MDIA), an approach to weighting samples to conform to known population information, provides a generalization of raking and poststratification. In the case of simple random sampling with replacement with uniform sampling weights, large-sample properties are available for MDIA estimates of population…
Descriptors: Discriminant Analysis, Sampling, Sample Size, Scores
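In its exponential-tilting form, MDIA-style weighting chooses weights w_i proportional to exp(x_i'λ) so that weighted sample moments match known population values. The sketch below solves the moment equations numerically; it illustrates that general idea under these assumptions and is not the report's estimator.

```python
import numpy as np
from scipy.optimize import fsolve

def mdia_weights(X, pop_means):
    """Exponential-tilting weights w_i ∝ exp(x_i @ lam), with lam chosen
    so the weighted sample means of the columns of X match pop_means."""
    X = np.asarray(X, float)
    target = np.asarray(pop_means, float)

    def moment_gap(lam):
        w = np.exp(X @ lam)
        w /= w.sum()
        return X.T @ w - target

    lam = fsolve(moment_gap, np.zeros(X.shape[1]))
    w = np.exp(X @ lam)
    return w / w.sum()
```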
Qian, Jiahe – ETS Research Report Series, 2020
The finite population correction (FPC) factor is often used to adjust variance estimators for survey data sampled from a finite population without replacement. As a replicated resampling approach, the jackknife approach is usually implemented without the FPC factor incorporated in its variance estimates. A paradigm is proposed to compare the…
Descriptors: Computation, Sampling, Data, Statistical Analysis
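The basic contrast the abstract raises can be written down directly: a delete-one jackknife variance estimate, with the FPC factor (1 − n/N) optionally applied when the population size N is known. A minimal sketch for the sample mean:

```python
import numpy as np

def jackknife_variance(y, N=None):
    """Delete-one jackknife variance of the sample mean; if the finite
    population size N is given, apply the FPC factor (1 - n/N)."""
    y = np.asarray(y, float)
    n = len(y)
    theta_full = y.mean()
    theta_jack = np.array([np.delete(y, i).mean() for i in range(n)])
    v = (n - 1) / n * np.sum((theta_jack - theta_full) ** 2)
    if N is not None:
        v *= 1.0 - n / N  # correction for sampling without replacement
    return v
```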
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
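For the observed-score family, the Mantel-Haenszel procedure pools 2×2 tables across matched score strata. The sketch below computes the MH common odds ratio and its ETS delta-metric transform (−2.35 ln α); the input arrays are hypothetical per-stratum counts, assumed nonzero.

```python
import numpy as np

def mh_ddif(ref_right, ref_wrong, foc_right, foc_wrong):
    """Mantel-Haenszel D-DIF from per-stratum counts for the reference
    and focal groups (strata assumed nonempty)."""
    R1, R0 = np.asarray(ref_right, float), np.asarray(ref_wrong, float)
    F1, F0 = np.asarray(foc_right, float), np.asarray(foc_wrong, float)
    n = R1 + R0 + F1 + F0
    alpha = np.sum(R1 * F0 / n) / np.sum(R0 * F1 / n)  # common odds ratio
    return -2.35 * np.log(alpha)  # ETS delta metric
```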
Jewsbury, Paul A. – ETS Research Report Series, 2019
When an assessment undergoes changes to the administration or instrument, bridge studies are typically used to try to ensure comparability of scores before and after the change. Among the most common and powerful is the common population linking design, with the use of a linear transformation to link scores to the metric of the original…
Descriptors: Evaluation Research, Scores, Error Patterns, Error of Measurement
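Under a common population linking design, the linear transformation can be estimated from the two score distributions. A minimal mean-sigma sketch, one standard choice and not necessarily the method analyzed in the report:

```python
import numpy as np

def linear_link(x_scores, y_scores):
    """Mean-sigma linear linking: returns (a, b) such that y ≈ a*x + b
    matches the means and standard deviations of the two distributions."""
    a = np.std(y_scores, ddof=1) / np.std(x_scores, ddof=1)
    b = np.mean(y_scores) - a * np.mean(x_scores)
    return a, b
```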
Qian, Jiahe – ETS Research Report Series, 2017
The variance formula derived for a two-stage sampling design without replacement employs the joint inclusion probabilities in the first-stage selection of clusters. One of the difficulties encountered in data analysis is the lack of information about such joint inclusion probabilities. One way to solve this issue is by applying Hájek's…
Descriptors: Mathematical Formulas, Computation, Sampling, Research Design
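One common form of Hájek's approximation replaces the unknown joint inclusion probabilities with π_ij ≈ π_i π_j [1 − (1 − π_i)(1 − π_j)/d], where d = Σ π_k(1 − π_k). The sketch below implements that form; which variant the report applies is an assumption on my part.

```python
import numpy as np

def hajek_joint_inclusion(pi):
    """Approximate joint inclusion probabilities pi_ij from first-order
    inclusion probabilities pi, using one common Hájek-type formula."""
    pi = np.asarray(pi, float)
    d = np.sum(pi * (1.0 - pi))
    pij = np.outer(pi, pi) * (1.0 - np.outer(1.0 - pi, 1.0 - pi) / d)
    np.fill_diagonal(pij, pi)  # by convention pi_ii = pi_i
    return pij
```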
Lu, Ru; Haberman, Shelby; Guo, Hongwen; Liu, Jinghua – ETS Research Report Series, 2015
In this study, we apply jackknifing to anchor items to evaluate the impact of anchor selection on equating stability. In an ideal world, the choice of anchor items should have little impact on equating results. When this ideal does not correspond to reality, selection of anchor items can strongly influence equating results. This influence does not…
Descriptors: Test Construction, Equated Scores, Test Items, Sampling
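Jackknifing anchor items means redoing the linking with each anchor deleted in turn and examining how much the results move. The sketch below applies this to a simple mean-sigma link on the anchor scores; the equating method in the report may differ.

```python
import numpy as np

def anchor_jackknife(anchor_x, anchor_y):
    """Leave-one-anchor-out mean-sigma linking; the spread of the linking
    constants across deletions gauges sensitivity to anchor selection."""
    slopes, intercepts = [], []
    for i in range(len(anchor_x)):
        x = np.delete(anchor_x, i)
        y = np.delete(anchor_y, i)
        a = np.std(y, ddof=1) / np.std(x, ddof=1)
        slopes.append(a)
        intercepts.append(np.mean(y) - a * np.mean(x))
    return np.std(slopes, ddof=1), np.std(intercepts, ddof=1)
```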
Puhan, Gautam – ETS Research Report Series, 2013
The purpose of this study was to demonstrate that the choice of sample weights when defining the target population under poststratification equating can be a critical factor in determining the accuracy of the equating results under a unique equating scenario, known as "rater comparability scoring and equating." The nature of data…
Descriptors: Scoring, Equated Scores, Sampling, Accuracy
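Poststratification weights are the ratio of each stratum's population share to its sample share. The sketch below computes per-case weights from that ratio; it is a generic illustration of the weighting choice at issue, not the rater comparability design itself, and assumes every sample stratum appears in pop_props.

```python
import numpy as np

def poststrat_weights(strata, pop_props):
    """Weight each case by (population proportion / sample proportion) of
    its stratum; pop_props maps stratum label -> population share."""
    strata = np.asarray(strata)
    n = len(strata)
    weights = np.empty(n, float)
    for s, P in pop_props.items():
        mask = strata == s
        weights[mask] = P / (mask.sum() / n)
    return weights
```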
Fife, James H.; James, Kofi; Peters, Stephanie – ETS Research Report Series, 2020
The concept of variability is central to statistics. In this research report, we review mathematics education research on variability and, based on that review and on feedback from an expert panel, propose a learning progression (LP) for variability. The structure of the proposed LP consists of 5 levels of sophistication in understanding…
Descriptors: Mathematics Education, Statistics Education, Feedback (Response), Research Reports
Zhang, Mo – ETS Research Report Series, 2013
Many testing programs use automated scoring to grade essays. One issue in automated essay scoring that has not been examined adequately is population invariance and its causes. The primary purpose of this study was to investigate the impact of sampling in model calibration on population invariance of automated scores. This study analyzed scores…
Descriptors: Automation, Scoring, Essay Tests, Sampling
Qian, Jiahe; Jiang, Yanming; von Davier, Alina A. – ETS Research Report Series, 2013
Several factors could cause variability in item response theory (IRT) linking and equating procedures, such as the variability across examinee samples and/or test items, seasonality, regional differences, native language diversity, gender, and other demographic variables. Hence, the following question arises: Is it possible to select optimal…
Descriptors: Item Response Theory, Test Items, Sampling, True Scores
Livingston, Samuel A.; Kim, Sooyeon – ETS Research Report Series, 2010
A series of resampling studies investigated the accuracy of equating by four different methods in a random groups equating design with samples of 400, 200, 100, and 50 test takers taking each form. Six pairs of forms were constructed. Each pair was constructed by assigning items from an existing test taken by 9,000 or more test takers. The…
Descriptors: Equated Scores, Accuracy, Sample Size, Sampling
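A resampling study of this kind can be mimicked in a few lines: treat the full group as the population, draw repeated samples of each size, equate within each sample, and measure deviation from the full-group result. The sketch below uses mean equating for brevity, whereas the report compares four methods.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_equating_error(x, y, n, n_reps=500):
    """RMSE of the sample mean-equating constant (mean(y) - mean(x))
    around its full-group value, for samples of size n per form."""
    full = np.mean(y) - np.mean(x)
    reps = [np.mean(rng.choice(y, n)) - np.mean(rng.choice(x, n))
            for _ in range(n_reps)]
    return np.sqrt(np.mean((np.array(reps) - full) ** 2))
```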