ERIC - Search Results

Showing all 11 results

Model Adequacy Checking for Applying Harmonic Regression to Assessment Quality Control. Research Report. ETS RR-21-13

Peer reviewed

Qian, Jiahe; Li, Shuhong – ETS Research Report Series, 2021

In recent years, harmonic regression models have been applied to implement quality control for educational assessment data consisting of multiple administrations and displaying seasonality. As with other types of regression models, it is imperative that model adequacy checking and model fit be appropriately conducted. However, there has been no…

Descriptors: Models, Regression (Statistics), Language Tests, Quality Control

Variance Estimation with Complex Data and Finite Population Correction--A Paradigm for Comparing Jackknife and Formula-Based Methods for Variance Estimation. Research Report. ETS RR-20-11

Peer reviewed

Qian, Jiahe – ETS Research Report Series, 2020

The finite population correction (FPC) factor is often used to adjust variance estimators for survey data sampled from a finite population without replacement. As a replicated resampling approach, the jackknife approach is usually implemented without the FPC factor incorporated in its variance estimates. A paradigm is proposed to compare the…

Descriptors: Computation, Sampling, Data, Statistical Analysis
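
As a rough illustration of the contrast this abstract draws, the sketch below compares a formula-based variance of a sample mean, with and without the FPC factor (1 - n/N), against a delete-1 jackknife estimate that carries no FPC. It assumes simple random sampling without replacement and an unweighted mean. The function names and the simulated data are hypothetical, and the report's own comparison paradigm for complex samples is not reproduced here.

```python
import random
import statistics


def srs_variance_of_mean(sample, population_size=None):
    """Formula-based variance of the sample mean under simple random sampling.

    If population_size is given, the FPC factor (1 - n/N) is applied.
    """
    n = len(sample)
    var = statistics.variance(sample) / n  # s^2 / n
    if population_size is not None:
        var *= 1 - n / population_size  # finite population correction
    return var


def jackknife_variance_of_mean(sample):
    """Delete-1 jackknife variance of the sample mean (no FPC applied)."""
    n = len(sample)
    total = sum(sample)
    full_mean = total / n
    # Each replicate is the mean with one observation left out.
    replicates = [(total - x) / (n - 1) for x in sample]
    return (n - 1) / n * sum((r - full_mean) ** 2 for r in replicates)


# Hypothetical data: a finite population of 200 scores, sample of 40.
random.seed(0)
population = [random.gauss(50, 10) for _ in range(200)]
sample = random.sample(population, 40)

jk = jackknife_variance_of_mean(sample)
plain = srs_variance_of_mean(sample)
with_fpc = srs_variance_of_mean(sample, population_size=len(population))
print(jk, plain, with_fpc)
```

For the sample mean, the delete-1 jackknife reproduces s²/n exactly, so it matches the formula without the FPC and exceeds the FPC-adjusted value by the factor 1/(1 - n/N).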

Robustness of Weighted Differential Item Functioning (DIF) Analysis: The Case of Mantel-Haenszel DIF Statistics. Research Report. ETS RR-21-12

Peer reviewed

Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021

Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…

Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
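
For readers unfamiliar with the observed-score family mentioned in this abstract, here is a minimal sketch of the standard, unweighted Mantel-Haenszel common odds ratio and its transformation to the delta metric (MH D-DIF = -2.35 ln α). The function name, the tuple layout, and the counts are hypothetical. The weighted variant studied in the report is not shown.

```python
import math


def mantel_haenszel_dif(strata):
    """Return (alpha_MH, MH D-DIF) for one item.

    strata: one tuple per matched score level, laid out as
    (ref_right, ref_wrong, focal_right, focal_wrong).
    """
    num = 0.0
    den = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # skip empty score levels
        num += ref_right * focal_wrong / n
        den += ref_wrong * focal_right / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio is undefined for these counts")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


# Hypothetical counts: equal odds in both groups at every score level.
no_dif = [(30, 10, 15, 5), (20, 20, 10, 10)]
alpha_none, d_none = mantel_haenszel_dif(no_dif)

# Hypothetical counts: the reference group has higher odds of success.
favors_ref = [(40, 10, 20, 20)]
alpha_ref, d_ref = mantel_haenszel_dif(favors_ref)
print(alpha_none, d_none, alpha_ref, d_ref)
```

An odds ratio of 1 gives a D-DIF of 0. Values of α above 1 give negative D-DIF, which by convention indicates an item that is relatively harder for the focal group.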

Error Variance in Common Population Linking Bridge Studies. Research Report. ETS RR-19-42

Peer reviewed

Jewsbury, Paul A. – ETS Research Report Series, 2019

When an assessment undergoes changes to the administration or instrument, bridge studies are typically used to try to ensure comparability of scores before and after the change. Among the most common and powerful is the common population linking design, with the use of a linear transformation to link scores to the metric of the original…

Descriptors: Evaluation Research, Scores, Error Patterns, Error of Measurement
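
The linear transformation mentioned in this abstract can be sketched generically as a mean-sigma link: scores from the changed assessment are rescaled so that their mean and standard deviation match those observed on the original metric. This is an assumption about the general form only. The function name and the scores are hypothetical, and the report's error-variance derivation is not reproduced.

```python
import statistics


def linear_link(new_scores, old_scores):
    """Return (slope, intercept) mapping new-metric scores to the old metric.

    Assumes both score sets come from randomly equivalent samples of the
    same population, as in a common population design.
    """
    mu_new = statistics.mean(new_scores)
    sd_new = statistics.stdev(new_scores)
    mu_old = statistics.mean(old_scores)
    sd_old = statistics.stdev(old_scores)
    slope = sd_old / sd_new
    intercept = mu_old - slope * mu_new
    return slope, intercept


# Hypothetical scores from two randomly equivalent samples.
new_scores = [10, 12, 14, 16, 18]
old_scores = [20, 24, 28, 32, 36]

slope, intercept = linear_link(new_scores, old_scores)
linked = [slope * x + intercept for x in new_scores]
print(slope, intercept, linked)
```

Because the slope and intercept are estimated from samples, the linked scores inherit sampling error from both groups, which is the kind of error variance a bridge study has to account for.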

Evaluation of Methods to Compute Complex Sample Standard Errors in Latent Regression Models. Research Report. ETS RR-09-49

Peer reviewed

Oranje, Andreas; Li, Deping; Kandathil, Mathew – ETS Research Report Series, 2009

Several complex sample standard error estimators based on linearization and resampling for the latent regression model of the National Assessment of Educational Progress (NAEP) are studied with respect to design choices such as number of items, number of regressors, and the efficiency of the sample. This paper provides an evaluation of the extent…

Descriptors: Error of Measurement, Computation, Regression (Statistics), National Competency Tests

Methods of Linking with Small Samples in a Common-Item Design: An Empirical Comparison. Research Report. ETS RR-09-38

Peer reviewed

Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2009

A series of resampling studies was conducted to compare the accuracy of equating in a common item design using four different methods: chained equipercentile equating of smoothed distributions, chained linear equating, chained mean equating, and the circle-arc method. Four operational test forms, each containing more than 100 items, were used for…

Descriptors: Sampling, Sample Size, Accuracy, Test Items

Theoretical and Empirical Standard Errors for Two Population Invariance Measures in the Linear Equating Case. Research Report. ETS RR-08-24

Peer reviewed

von Davier, Alina A.; Manalo, Jonathan R.; Rijmen, Frank – ETS Research Report Series, 2008

The standard errors of the 2 most widely used population-invariance measures of equating functions, root mean square difference (RMSD) and root expected mean square difference (REMSD), are not derived for common equating methods such as linear equating. Consequently, it is unknown how much noise is contained in these estimates. This paper…

Descriptors: Equated Scores, Error of Measurement, Statistical Analysis, Sampling

An Alternative Data Collection Design for Equating with Very Small Samples. Research Report. ETS RR-08-11

Peer reviewed

Puhan, Gautam; Moses, Tim; Grant, Mary; McHale, Fred – ETS Research Report Series, 2008

A single group (SG) equating design with nearly equivalent test forms (SiGNET) design was developed by Grant (2006) to equate small volume tests. The basis of this design is that examinees take two largely overlapping test forms within a single administration. The scored items for the operational form are divided into mini-tests called testlets.…

Descriptors: Data Collection, Equated Scores, Item Sampling, Sample Size

Mapping State Standards to the NAEP Scale. Research Report. ETS RR-08-57

Peer reviewed

Braun, Henry; Qian, Jiahe – ETS Research Report Series, 2008

This report describes the derivation and evaluation of a method for comparing the performance standards for public school students set by different states. It is based on an approach proposed by McLaughlin and associates, which constituted an innovative attempt to resolve the confusion and concern that occurs when very different proportions of…

Descriptors: State Standards, Comparative Analysis, Public Schools, National Competency Tests

Disclosure Risk in Educational Surveys: An Application to the National Assessment of Educational Progress. Research Report. ETS RR-07-24

Peer reviewed

Oranje, Andreas; Freund, David; Lin, Mei-jang; Tang, Yuxin – ETS Research Report Series, 2007

In this paper, a data perturbation method for minimizing the possibility of disclosure of participants' identities on a survey is described in the context of the National Assessment of Educational Progress (NAEP). The method distinguishes itself from most approaches because of the presence of cognitive tasks. Hence, a data edit should have minimal…

Descriptors: Student Surveys, Risk, National Competency Tests, Data Analysis

Weighting Procedures and the Cluster Forming Algorithm for Delete-k Jackknife Variance Estimation for Institutional Surveys. Research Report. ETS RR-06-15

Peer reviewed

Qian, Jiahe – ETS Research Report Series, 2006

Weighting and variance estimation are two statistical issues involved in survey data analysis for large-scale assessment programs such as the Higher Education Information and Communication Technology (ICT) Literacy Assessment. Because survey data are always acquired by probability sampling, to draw unbiased or almost unbiased inferences for the…

Descriptors: Weighted Scores, Sampling, Statistical Analysis, Higher Education
