Publication Date
  In 2025: 0
  Since 2024: 0
  Since 2021 (last 5 years): 4
  Since 2016 (last 10 years): 10
  Since 2006 (last 20 years): 27
Descriptor
  Sampling: 30
  Computation: 17
  Error of Measurement: 10
  National Competency Tests: 10
  Statistical Analysis: 10
  Comparative Analysis: 9
  Equated Scores: 8
  Item Response Theory: 8
  Sample Size: 8
  Test Items: 8
  Probability: 7
Source
  ETS Research Report Series: 30
Author
  Qian, Jiahe: 8
  Oranje, Andreas: 4
  Haberman, Shelby J.: 3
  Kim, Sooyeon: 3
  Livingston, Samuel A.: 3
  Braun, Henry: 2
  Dorans, Neil J.: 2
  Guo, Hongwen: 2
  Haberman, Shelby: 2
  Li, Shuhong: 2
  Lu, Ru: 2
Publication Type
  Journal Articles: 30
  Reports - Research: 30
  Numerical/Quantitative Data: 1
Education Level
  Elementary Education: 8
  Secondary Education: 8
  Junior High Schools: 7
  Middle Schools: 7
  Grade 8: 6
  Grade 4: 4
  Higher Education: 4
  Intermediate Grades: 4
  Postsecondary Education: 4
Location
  Australia: 1
  California: 1
  Nevada: 1
  New Jersey: 1
  United States: 1
Laws, Policies, & Programs
  No Child Left Behind Act 2001: 1
Assessments and Surveys
  National Assessment of…: 10
  Graduate Record Examinations: 1
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
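A rescoring check of this kind can be illustrated with a small sketch: given the original Time A scores and the Time B rescores of the same papers, compute exact and adjacent agreement rates. This is a generic illustration of the comparison the abstract describes, not the procedure from the report itself.

```python
import numpy as np

def rescore_agreement(time_a, time_b):
    """Compare original Time A scores with Time B rescores of the same
    papers: exact agreement and adjacent (within one point) agreement."""
    a = np.asarray(time_a)
    b = np.asarray(time_b)
    exact = np.mean(a == b)
    adjacent = np.mean(np.abs(a - b) <= 1)
    return exact, adjacent
```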
Qian, Jiahe; Gu, Lixiong; Li, Shuhong – ETS Research Report Series, 2019
In assembling testlets (i.e., test forms) from a pool of new and used item blocks, test security is one of the main issues of concern. Strict constraints are often imposed on repeated usage of the same item blocks. Nevertheless, for an assessment administering multiple testlets, a goal is to select as large a sample of testlets as possible. In…
Descriptors: Test Construction, Sampling, Test Items, Mathematics
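The tension between block-reuse constraints and testlet yield can be seen in a toy greedy selection. The function below is a hypothetical sketch: the block labels, the blocks-per-testlet count, and the max_reuse cap are illustrative assumptions, not the assembly rules studied in the report.

```python
from itertools import combinations

def assemble_testlets(blocks, blocks_per_testlet, max_reuse):
    """Greedily build testlets from item blocks, never letting any block
    appear in more than max_reuse assembled testlets."""
    usage = {b: 0 for b in blocks}
    testlets = []
    for combo in combinations(blocks, blocks_per_testlet):
        if all(usage[b] < max_reuse for b in combo):
            testlets.append(combo)
            for b in combo:
                usage[b] += 1
    return testlets

# e.g. assemble_testlets(["B1", "B2", "B3", "B4"], 2, max_reuse=2)
```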
Kim, Sooyeon; Walker, Michael E. – ETS Research Report Series, 2021
Equating the scores from different forms of a test requires collecting data that link the forms. Problems arise when the test forms to be linked are given to groups that are not equivalent and the forms share no common items by which to measure or adjust for this group nonequivalence. We compared three approaches to adjusting for group…
Descriptors: Equated Scores, Weighted Scores, Sampling, Multiple Choice Tests
Qian, Jiahe; Li, Shuhong – ETS Research Report Series, 2021
In recent years, harmonic regression models have been applied to implement quality control for educational assessment data consisting of multiple administrations and displaying seasonality. As with other types of regression models, it is imperative that model adequacy checking and model fit be appropriately conducted. However, there has been no…
Descriptors: Models, Regression (Statistics), Language Tests, Quality Control
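Harmonic regression models seasonality with sine and cosine terms at known periods. A minimal sketch, assuming a 12-administration seasonal cycle (an illustrative choice, not taken from the report), fits such a model by ordinary least squares and returns the residuals that model-adequacy checks would examine.

```python
import numpy as np

def fit_harmonic(y, period=12, n_harmonics=2):
    """OLS fit of y on a linear trend plus sine/cosine seasonal terms."""
    y = np.asarray(y, float)
    t = np.arange(len(y), dtype=float)
    cols = [np.ones_like(y), t]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * t / period))
        cols.append(np.sin(2 * np.pi * k * t / period))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta  # coefficients and residuals for fit checks
```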
Yao, Lili; Haberman, Shelby; McCaffrey, Daniel F.; Lockwood, J. R. – ETS Research Report Series, 2020
Minimum discriminant information adjustment (MDIA), an approach to weighting samples to conform to known population information, provides a generalization of raking and poststratification. In the case of simple random sampling with replacement with uniform sampling weights, large-sample properties are available for MDIA estimates of population…
Descriptors: Discriminant Analysis, Sampling, Sample Size, Scores
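In its exponential-tilting form, MDIA-style weighting chooses weights w_i proportional to exp(x_i'λ) so that weighted sample moments match known population values. The sketch below solves the moment equations numerically; it illustrates that general idea under these assumptions and is not the report's estimator.

```python
import numpy as np
from scipy.optimize import fsolve

def mdia_weights(X, pop_means):
    """Exponential-tilting weights w_i ∝ exp(x_i @ lam), with lam chosen
    so the weighted sample means of the columns of X match pop_means."""
    X = np.asarray(X, float)
    target = np.asarray(pop_means, float)

    def moment_gap(lam):
        w = np.exp(X @ lam)
        w /= w.sum()
        return X.T @ w - target

    lam = fsolve(moment_gap, np.zeros(X.shape[1]))
    w = np.exp(X @ lam)
    return w / w.sum()
```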
Qian, Jiahe – ETS Research Report Series, 2020
The finite population correction (FPC) factor is often used to adjust variance estimators for survey data sampled from a finite population without replacement. As a replicated resampling approach, the jackknife approach is usually implemented without the FPC factor incorporated in its variance estimates. A paradigm is proposed to compare the…
Descriptors: Computation, Sampling, Data, Statistical Analysis
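The basic contrast the abstract raises can be written down directly: a delete-one jackknife variance estimate, with the FPC factor (1 − n/N) optionally applied when the population size N is known. A minimal sketch for the sample mean:

```python
import numpy as np

def jackknife_variance(y, N=None):
    """Delete-one jackknife variance of the sample mean; if the finite
    population size N is given, apply the FPC factor (1 - n/N)."""
    y = np.asarray(y, float)
    n = len(y)
    theta_full = y.mean()
    theta_jack = np.array([np.delete(y, i).mean() for i in range(n)])
    v = (n - 1) / n * np.sum((theta_jack - theta_full) ** 2)
    if N is not None:
        v *= 1.0 - n / N  # correction for sampling without replacement
    return v
```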
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
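For the observed-score family, the Mantel-Haenszel procedure pools 2×2 tables across matched score strata. The sketch below computes the MH common odds ratio and its ETS delta-metric transform (−2.35 ln α); the input arrays are hypothetical per-stratum counts, assumed nonzero.

```python
import numpy as np

def mh_ddif(ref_right, ref_wrong, foc_right, foc_wrong):
    """Mantel-Haenszel D-DIF from per-stratum counts for the reference
    and focal groups (strata assumed nonempty)."""
    R1, R0 = np.asarray(ref_right, float), np.asarray(ref_wrong, float)
    F1, F0 = np.asarray(foc_right, float), np.asarray(foc_wrong, float)
    n = R1 + R0 + F1 + F0
    alpha = np.sum(R1 * F0 / n) / np.sum(R0 * F1 / n)  # common odds ratio
    return -2.35 * np.log(alpha)  # ETS delta metric
```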
Jewsbury, Paul A. – ETS Research Report Series, 2019
When an assessment undergoes changes to the administration or instrument, bridge studies are typically used to try to ensure comparability of scores before and after the change. Among the most common and powerful is the common population linking design, with the use of a linear transformation to link scores to the metric of the original…
Descriptors: Evaluation Research, Scores, Error Patterns, Error of Measurement
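Under a common population linking design, the linear transformation can be estimated from the two score distributions. A minimal mean-sigma sketch, one standard choice and not necessarily the method analyzed in the report:

```python
import numpy as np

def linear_link(x_scores, y_scores):
    """Mean-sigma linear linking: returns (a, b) such that y ≈ a*x + b
    matches the means and standard deviations of the two distributions."""
    a = np.std(y_scores, ddof=1) / np.std(x_scores, ddof=1)
    b = np.mean(y_scores) - a * np.mean(x_scores)
    return a, b
```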
Qian, Jiahe – ETS Research Report Series, 2017
The variance formula derived for a two-stage sampling design without replacement employs the joint inclusion probabilities in the first-stage selection of clusters. One of the difficulties encountered in data analysis is the lack of information about such joint inclusion probabilities. One way to solve this issue is by applying Hájek's…
Descriptors: Mathematical Formulas, Computation, Sampling, Research Design
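One common form of Hájek's approximation replaces the unknown joint inclusion probabilities with π_ij ≈ π_i π_j [1 − (1 − π_i)(1 − π_j)/d], where d = Σ π_k(1 − π_k). The sketch below implements that form; which variant the report applies is an assumption on my part.

```python
import numpy as np

def hajek_joint_inclusion(pi):
    """Approximate joint inclusion probabilities pi_ij from first-order
    inclusion probabilities pi, using one common Hájek-type formula."""
    pi = np.asarray(pi, float)
    d = np.sum(pi * (1.0 - pi))
    pij = np.outer(pi, pi) * (1.0 - np.outer(1.0 - pi, 1.0 - pi) / d)
    np.fill_diagonal(pij, pi)  # by convention pi_ii = pi_i
    return pij
```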
Lu, Ru; Haberman, Shelby; Guo, Hongwen; Liu, Jinghua – ETS Research Report Series, 2015
In this study, we apply jackknifing to anchor items to evaluate the impact of anchor selection on equating stability. In an ideal world, the choice of anchor items should have little impact on equating results. When this ideal does not correspond to reality, selection of anchor items can strongly influence equating results. This influence does not…
Descriptors: Test Construction, Equated Scores, Test Items, Sampling
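Jackknifing anchor items means redoing the linking with each anchor deleted in turn and examining how much the results move. The sketch below applies this to a simple mean-sigma link on the anchor scores; the equating method in the report may differ.

```python
import numpy as np

def anchor_jackknife(anchor_x, anchor_y):
    """Leave-one-anchor-out mean-sigma linking; the spread of the linking
    constants across deletions gauges sensitivity to anchor selection."""
    slopes, intercepts = [], []
    for i in range(len(anchor_x)):
        x = np.delete(anchor_x, i)
        y = np.delete(anchor_y, i)
        a = np.std(y, ddof=1) / np.std(x, ddof=1)
        slopes.append(a)
        intercepts.append(np.mean(y) - a * np.mean(x))
    return np.std(slopes, ddof=1), np.std(intercepts, ddof=1)
```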
Puhan, Gautam – ETS Research Report Series, 2013
The purpose of this study was to demonstrate that the choice of sample weights when defining the target population under poststratification equating can be a critical factor in determining the accuracy of the equating results under a unique equating scenario, known as "rater comparability scoring and equating." The nature of data…
Descriptors: Scoring, Equated Scores, Sampling, Accuracy
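Poststratification weights are the ratio of each stratum's population share to its sample share. The sketch below computes per-case weights from that ratio; it is a generic illustration of the weighting choice at issue, not the rater comparability design itself, and assumes every sample stratum appears in pop_props.

```python
import numpy as np

def poststrat_weights(strata, pop_props):
    """Weight each case by (population proportion / sample proportion) of
    its stratum; pop_props maps stratum label -> population share."""
    strata = np.asarray(strata)
    n = len(strata)
    weights = np.empty(n, float)
    for s, P in pop_props.items():
        mask = strata == s
        weights[mask] = P / (mask.sum() / n)
    return weights
```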
Fife, James H.; James, Kofi; Peters, Stephanie – ETS Research Report Series, 2020
The concept of variability is central to statistics. In this research report, we review mathematics education research on variability and, based on that review and on feedback from an expert panel, propose a learning progression (LP) for variability. The structure of the proposed LP consists of 5 levels of sophistication in understanding…
Descriptors: Mathematics Education, Statistics Education, Feedback (Response), Research Reports
Zhang, Mo – ETS Research Report Series, 2013
Many testing programs use automated scoring to grade essays. One issue in automated essay scoring that has not been examined adequately is population invariance and its causes. The primary purpose of this study was to investigate the impact of sampling in model calibration on population invariance of automated scores. This study analyzed scores…
Descriptors: Automation, Scoring, Essay Tests, Sampling
Qian, Jiahe; Jiang, Yanming; von Davier, Alina A. – ETS Research Report Series, 2013
Several factors could cause variability in item response theory (IRT) linking and equating procedures, such as the variability across examinee samples and/or test items, seasonality, regional differences, native language diversity, gender, and other demographic variables. Hence, the following question arises: Is it possible to select optimal…
Descriptors: Item Response Theory, Test Items, Sampling, True Scores
Livingston, Samuel A.; Kim, Sooyeon – ETS Research Report Series, 2010
A series of resampling studies investigated the accuracy of equating by four different methods in a random groups equating design with samples of 400, 200, 100, and 50 test takers taking each form. Six pairs of forms were constructed. Each pair was constructed by assigning items from an existing test taken by 9,000 or more test takers. The…
Descriptors: Equated Scores, Accuracy, Sample Size, Sampling
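A resampling study of this kind can be mimicked in a few lines: treat the full group as the population, draw repeated samples of each size, equate within each sample, and measure deviation from the full-group result. The sketch below uses mean equating for brevity, whereas the report compares four methods.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_equating_error(x, y, n, n_reps=500):
    """RMSE of the sample mean-equating constant (mean(y) - mean(x))
    around its full-group value, for samples of size n per form."""
    full = np.mean(y) - np.mean(x)
    reps = [np.mean(rng.choice(y, n)) - np.mean(rng.choice(x, n))
            for _ in range(n_reps)]
    return np.sqrt(np.mean((np.array(reps) - full) ** 2))
```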