ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	2
Since 2017 (last 10 years)	4
Since 2007 (last 20 years)	17

Descriptor

Error of Measurement	19
Probability	19
Sample Size	19
Statistical Analysis	10
Sampling	7
Effect Size	6
Comparative Analysis	4
Equated Scores	4
Item Response Theory	4
Simulation	4
Educational Research	3
Evaluation Methods	3
Item Analysis	3
Meta Analysis	3
Models	3
Regression (Statistics)	3
Statistical Bias	3
Statistics	3
Test Items	3
Classification	2
Computation	2
Correlation	2
Data	2
Data Analysis	2
Error Correction	2

Source

Educational and Psychological…	3
Journal of Educational…	2
Practical Assessment,…	2
Psychometrika	2
Applied Measurement in…	1
ETS Research Report Series	1
Gifted Child Quarterly	1
Journal of Experimental…	1
Language Teaching Research	1
MathAMATYC Educator	1
Research Synthesis Methods	1
Society for Research on…	1
Teaching Statistics: An…	1

Publication Type

Journal Articles	17
Reports - Research	12
Reports - Evaluative	4
Reports - Descriptive	3
Information Analyses	1
Speeches/Meeting Papers	1

Education Level

Elementary Education	1
Grade 8	1
Higher Education	1
Junior High Schools	1
Middle Schools	1
Secondary Education	1

Audience

Teachers

Assessments and Surveys

National Assessment of…

Showing 1 to 15 of 19 results

How to Obtain the Most Error-Free Estimate of Reliability? Eight Sources of Deflation in the Estimates of Reliability to Avoid

Peer reviewed

Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022

The reliability of a test score is usually underestimated, and the deflation may be profound: 0.40 to 0.60 units of reliability, or 46% to 71%. Eight root sources of the deflation are discussed and quantified by a simulation with 1,440 real-world datasets: (1) errors in the measurement modelling, (2) inefficiency in the estimator of reliability within…

Descriptors: Test Reliability, Scores, Test Items, Correlation

Estimation of Heterogeneity Variance Based on a Generalized "Q" Statistic in Meta-Analysis of Log-Odds-Ratio

Peer reviewed

Kulinskaya, Elena; Hoaglin, David C. – Research Synthesis Methods, 2023

For estimation of heterogeneity variance T² in meta-analysis of log-odds-ratio, we derive new mean- and median-unbiased point estimators and new interval estimators based on a generalized Q statistic, Q_F, in which the weights depend on only the studies' effective sample sizes. We compare them with familiar estimators…

Descriptors: Q Methodology, Statistical Analysis, Meta Analysis, Intervals
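To make the idea of a Q-based heterogeneity estimate concrete, here is a minimal Python sketch of the generic method-of-moments estimator of T² built on a generalized Q statistic with fixed weights. It is not the authors' mean- or median-unbiased estimator; it only shows how a Q with fixed weights w_i becomes a point estimate by equating Q with its expectation and truncating at zero. The helper names, the Woolf variance for the log-odds-ratio, and the definition of effective sample size as n_T·n_C/(n_T + n_C) are assumptions made for illustration.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log-odds-ratio and its Woolf variance from a 2x2 table.
    a, b = events / non-events (treatment); c, d = events / non-events (control).
    No continuity correction: all cells must be non-zero."""
    y = math.log((a * d) / (b * c))
    v = 1 / a + 1 / b + 1 / c + 1 / d
    return y, v

def effective_sample_size(n_t, n_c):
    # Assumed definition of a study's effective sample size
    return n_t * n_c / (n_t + n_c)

def tau2_moment(y, v, w):
    """Moment estimate of the heterogeneity variance from Q = sum w_i (y_i - y_bar_w)^2.
    Uses E[Q] = sum(w v) - sum(w^2 v)/W + tau^2 * (W - sum(w^2)/W)."""
    W = sum(w)
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / W
    q = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    expected_if_zero = (sum(wi * vi for wi, vi in zip(w, v))
                        - sum(wi ** 2 * vi for wi, vi in zip(w, v)) / W)
    slope = W - sum(wi ** 2 for wi in w) / W
    return max(0.0, (q - expected_if_zero) / slope)  # truncate negative estimates

y = [0.1, 0.3, 0.5, 0.7]
v = [0.01] * 4
print(tau2_moment(y, v, [1.0] * 4))
```

Setting w_i = 1/v_i reproduces the familiar DerSimonian-Laird estimator; weights built from effective sample sizes give a Q_F-style statistic whose weights do not depend on the estimated within-study variances.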

Variance Estimation with Complex Data and Finite Population Correction--A Paradigm for Comparing Jackknife and Formula-Based Methods for Variance Estimation. Research Report. ETS RR-20-11

Peer reviewed

Qian, Jiahe – ETS Research Report Series, 2020

The finite population correction (FPC) factor is often used to adjust variance estimators for survey data sampled from a finite population without replacement. As a replicated resampling approach, the jackknife approach is usually implemented without the FPC factor incorporated in its variance estimates. A paradigm is proposed to compare the…

Descriptors: Computation, Sampling, Data, Statistical Analysis
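A small Python sketch of the two kinds of variance estimator being compared, for the simplest case of a sample mean under simple random sampling without replacement. The function names are hypothetical, and this is not the paradigm proposed in the report: it only illustrates that the delete-one jackknife reproduces s²/n for a mean, so the FPC factor (1 − n/N) has to be applied explicitly if the jackknife is to match the finite-population formula.

```python
import statistics

def formula_variance_of_mean(sample, population_size=None):
    """Formula-based variance of the sample mean: s^2 / n, optionally times the FPC (1 - n/N)."""
    n = len(sample)
    v = statistics.variance(sample) / n   # s^2 uses the n - 1 divisor
    if population_size is not None:
        v *= 1 - n / population_size      # finite population correction
    return v

def jackknife_variance_of_mean(sample, population_size=None):
    """Delete-one jackknife variance of the sample mean, optionally times the FPC."""
    n = len(sample)
    total = sum(sample)
    replicates = [(total - x) / (n - 1) for x in sample]  # mean with unit i removed
    centre = sum(replicates) / n
    v = (n - 1) / n * sum((r - centre) ** 2 for r in replicates)
    if population_size is not None:
        v *= 1 - n / population_size
    return v

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(formula_variance_of_mean(sample), jackknife_variance_of_mean(sample))
print(formula_variance_of_mean(sample, 40), jackknife_variance_of_mean(sample, 40))
```

Without the correction the two estimators coincide (4/7 for this sample); with a hypothetical population of N = 40 both shrink by the factor 0.8, but only because the FPC was multiplied in by hand.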

Biased Sampling Activity: An Investigation to Promote Discussion

Peer reviewed

White, Simon R.; Bonnett, Laura J. – Teaching Statistics: An International Journal for Teachers, 2019

The statistical concept of sampling is often given little direct attention, typically reduced to the mantra "take a random sample". This low-resource, adaptable activity demonstrates sampling and explores the issues that arise from biased sampling.

Descriptors: Statistical Bias, Sampling, Statistical Analysis, Learning Activities
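As a companion to an activity of this kind, a short Python simulation can show how size-biased selection distorts an estimate. The population of "object sizes" 1 to 1000 and the rule that selection probability is proportional to size are hypothetical choices for illustration, not details taken from the article.

```python
import random
import statistics

def size_biased_sample(population, k, rng):
    # Selection probability proportional to size (with replacement),
    # mimicking "picking the items that catch the eye".
    return rng.choices(population, weights=population, k=k)

def simple_random_sample(population, k, rng):
    # Every unit equally likely, drawn without replacement
    return rng.sample(population, k)

rng = random.Random(2019)            # fixed seed so the demonstration is repeatable
population = list(range(1, 1001))    # hypothetical object sizes
true_mean = statistics.mean(population)
srs_mean = statistics.mean(simple_random_sample(population, 200, rng))
biased_mean = statistics.mean(size_biased_sample(population, 200, rng))
print(true_mean, round(srs_mean, 1), round(biased_mean, 1))
```

Under size-biased selection the expected sample mean is Σx²/Σx = (2N + 1)/3 ≈ 667 for N = 1000, well above the true mean of 500.5, while the simple random sample stays close to the truth.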

Measurement Error and Equating Error in Power Analysis

Peer reviewed

Phillips, Gary W.; Jiang, Tao – Practical Assessment, Research & Evaluation, 2016

Power analysis is a fundamental prerequisite for conducting scientific research. Without power analysis, the researcher has no way of knowing whether the sample size is large enough to detect the effect he or she is looking for. This paper demonstrates how psychometric factors such as measurement error and equating error affect the power of…

Descriptors: Error of Measurement, Statistical Analysis, Equated Scores, Sample Size
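The basic mechanism can be sketched in Python under two assumptions that are not taken from the paper: a two-group comparison using the usual normal-approximation sample-size formula, and classical test theory, under which measurement error shrinks a standardized effect by a factor of √(reliability). Equating error is left out, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

def attenuated_d(true_d, reliability):
    """Observed standardized effect when scores contain measurement error (CTT assumption)."""
    return true_d * math.sqrt(reliability)

print(n_per_group(0.5))                       # error-free scores
print(n_per_group(attenuated_d(0.5, 0.80)))   # reliability 0.80
```

For a true effect of d = 0.5 at α = 0.05 and 80% power, the sketch asks for 63 per group; with a reliability of 0.80 the observed effect falls to about 0.447 and the requirement rises to 79. The normal approximation runs slightly below t-based calculations.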

Asymptotic Standard Errors of Observed-Score Equating with Polytomous IRT Models

Peer reviewed

Andersson, Björn – Journal of Educational Measurement, 2016

In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…

Descriptors: Equated Scores, Item Response Theory, Error of Measurement, Tests

Comparisons of Improvement-Over-Chance Effect Sizes for Two Groups under Variance Heterogeneity and Prior Probabilities

Peer reviewed

Henson, Robin K.; Natesan, Prathiba; Axelson, Erika D. – Journal of Experimental Education, 2014

The authors examined the distributional properties of three improvement-over-chance (I) effect sizes, derived from linear predictive discriminant analysis, quadratic predictive discriminant analysis, and logistic regression analysis, for 2-group univariate classification. These three classification methods (3 levels) were studied under varying levels of data conditions,…

Descriptors: Effect Size, Probability, Comparative Analysis, Classification
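For readers unfamiliar with the index, the improvement-over-chance effect size is commonly defined as I = (H_o − H_e)/(1 − H_e), where H_o is the observed hit rate and H_e the hit rate expected by chance. The Python sketch below assumes the proportional chance criterion H_e = Σ p_k² (p_k being the group proportions) as the chance rate; the article's handling of prior probabilities may differ, and the function names are hypothetical.

```python
def hit_rate(actual, predicted):
    """Proportion of cases classified into their actual group (H_o)."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)

def proportional_chance(group_sizes):
    """Assumed chance hit rate H_e: sum of squared group proportions."""
    n = sum(group_sizes)
    return sum((g / n) ** 2 for g in group_sizes)

def improvement_over_chance(h_obs, h_chance):
    """I = (H_o - H_e) / (1 - H_e): share of the possible gain over chance achieved."""
    return (h_obs - h_chance) / (1 - h_chance)

h_e = proportional_chance([60, 40])
print(round(h_e, 2), round(improvement_over_chance(0.76, h_e), 2))
```

With groups of 60 and 40, H_e = 0.52, so a classifier with a hit rate of 0.76 gives I = 0.24/0.48 = 0.5, that is, half of the possible improvement over chance.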

Synthesizing Results from Replication Studies Using Robust Variance Estimation: Corrections When the Number of Studies Is Small

Peer reviewed

Tipton, Elizabeth – Society for Research on Educational Effectiveness, 2014

Replication studies allow for making comparisons and generalizations regarding the effectiveness of an intervention across different populations, versions of a treatment, settings and contexts, and outcomes. One method for making these comparisons across many replication studies is through the use of meta-analysis. A recent innovation in…

Descriptors: Replication (Evaluation), Robustness (Statistics), Meta Analysis, Regression (Statistics)

Inferential Statistics in "Language Teaching Research": A Review and Ways Forward

Peer reviewed

Lindstromberg, Seth – Language Teaching Research, 2016

This article reviews all (quasi)experimental studies appearing in the first 19 volumes (1997-2015) of "Language Teaching Research" (LTR). Specifically, it provides an overview of how statistical analyses were conducted in these studies and of how the analyses were reported. The overall conclusion is that there has been a tight adherence…

Descriptors: Meta Analysis, Second Language Learning, Second Language Instruction, Guidelines

Standard Error of Linear Observed-Score Equating for the NEAT Design with Nonnormally Distributed Data

Peer reviewed

Zu, Jiyun; Yuan, Ke-Hai – Journal of Educational Measurement, 2012

In the nonequivalent groups with anchor test (NEAT) design, the standard error of linear observed-score equating is commonly estimated by an estimator derived assuming multivariate normality. However, real data are seldom normally distributed, causing this normal estimator to be inconsistent. A general estimator, which does not rely on the…

Descriptors: Sample Size, Equated Scores, Test Items, Error of Measurement

Evaluation of Two Types of Differential Item Functioning in Factor Mixture Models with Binary Outcomes

Peer reviewed

Lee, HwaYoung; Beretvas, S. Natasha – Educational and Psychological Measurement, 2014

Conventional differential item functioning (DIF) detection methods (e.g., the Mantel-Haenszel test) can be used to detect DIF only across observed groups, such as gender or ethnicity. However, research has found that DIF is not typically fully explained by an observed variable. True sources of DIF may include unobserved, latent variables, such as…

Descriptors: Item Analysis, Factor Structure, Bayesian Statistics, Goodness of Fit

Impact of Design Effects in Large-Scale District and State Assessments

Peer reviewed

Phillips, Gary W. – Applied Measurement in Education, 2015

This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…

Descriptors: State Programs, Sampling, Research Design, Error of Measurement
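One common way to quantify a design effect is Kish's approximation for cluster samples, deff = 1 + (m − 1)ρ, where m is the cluster size and ρ the intraclass correlation. The Python sketch below assumes equal-sized clusters and a single ρ; it is not necessarily the method used in the article, and the numbers in the usage note are hypothetical.

```python
import math

def kish_design_effect(cluster_size, icc):
    """Kish approximation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, deff):
    """Number of independent observations the clustered sample is worth."""
    return n / deff

def se_proportion(p, n, deff=1.0):
    """Standard error of a proportion: the SRS value inflated by sqrt(deff)."""
    return math.sqrt(deff * p * (1 - p) / n)

deff = kish_design_effect(25, 0.2)
print(deff, round(effective_sample_size(2000, deff)))
print(round(se_proportion(0.5, 2000), 4), round(se_proportion(0.5, 2000, deff), 4))
```

Testing 2,000 students in classes of 25 with ρ = 0.2 gives deff = 5.8, an effective sample size of about 345, and a standard error for a proportion of 0.5 about 2.4 times the simple-random-sampling value (0.0269 rather than 0.0112). Ignoring the design effect would therefore understate the sampling error considerably.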

Sample Size Determination for Rasch Model Tests

Peer reviewed

Draxler, Clemens – Psychometrika, 2010

This paper is concerned with supplementing statistical tests for the Rasch model so that, in addition to the probability of the error of the first kind (Type I probability), the probability of the error of the second kind (Type II probability) can be controlled at a predetermined level by basing the test on the appropriate number of observations.…

Descriptors: Statistical Analysis, Probability, Sample Size, Error of Measurement

How Large Should a Statistical Sample Be?

Peer reviewed

Menil, Violeta C.; Ye, Ruili – MathAMATYC Educator, 2012

This study serves as a teaching aid for teachers of introductory statistics. The aim of this study was limited to determining various sample sizes when estimating a population proportion. Tables of sample sizes were generated using a C++ program, which depends on population size, degree of precision or error level, and confidence…

Descriptors: Sample Size, Probability, Statistics, Sampling
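The calculation behind such tables can be sketched with the standard formula for estimating a proportion, n₀ = z²p(1 − p)/e², followed by Cochran's finite population correction n = n₀/(1 + (n₀ − 1)/N). This Python version is an illustrative stand-in for the authors' C++ program; whether their program uses exactly this correction is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_proportion(margin, confidence=0.95, p=0.5, population_size=None):
    """Sample size to estimate a proportion within +/- margin at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n0 = z ** 2 * p * (1 - p) / margin ** 2             # infinite-population size
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)      # Cochran's finite population correction
    return math.ceil(n0)                                # round up to whole respondents

print(sample_size_proportion(0.05))                        # very large population
print(sample_size_proportion(0.05, population_size=1000))  # N = 1000
```

At 95% confidence, a margin of error of 0.05, and the conservative p = 0.5, the sketch returns 385 for a very large population and 278 for N = 1000, which shows how strongly population size matters for small populations.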

Assessing Goodness of Fit in Item Response Theory with Nonparametric Models: A Comparison of Posterior Probabilities and Kernel-Smoothing Approaches

Peer reviewed

Sueiro, Manuel J.; Abad, Francisco J. – Educational and Psychological Measurement, 2011

The distance between nonparametric and parametric item characteristic curves has been proposed as an index of goodness of fit in item response theory in the form of a root integrated squared error index. This article proposes to use the posterior distribution of the latent trait as the nonparametric model and compares the performance of an index…

Descriptors: Goodness of Fit, Item Response Theory, Nonparametric Statistics, Probability

Author

Phillips, Gary W.	2
Abad, Francisco J.	1
Andersson, Björn	1
Axelson, Erika D.	1
Barr, James	1
Beretvas, S. Natasha	1
Bonnett, Laura J.	1
Draxler, Clemens	1
Fan, Xitao	1
Haberman, Shelby J.	1
Henson, Robin K.	1
Hoaglin, David C.	1
Jiang, Tao	1
Kulinskaya, Elena	1
Lee, HwaYoung	1
Lindstromberg, Seth	1
Liu, Chih-Yu	1
Menil, Violeta C.	1
Metsämuuronen, Jari	1
Natesan, Prathiba	1
Nowell, Dana L.	1
Qian, Jiahe	1
Rasor, Richard E.	1
Sueiro, Manuel J.	1
Tipton, Elizabeth	1