Publication Date
In 2025 | 0
Since 2024 | 0
Since 2021 (last 5 years) | 0
Since 2016 (last 10 years) | 2
Since 2006 (last 20 years) | 14
Descriptor
Error of Measurement | 20
Testing Programs | 20
Equated Scores | 7
Statistical Analysis | 5
Comparative Analysis | 4
Item Response Theory | 4
Reliability | 4
Sampling | 4
Scores | 4
State Programs | 4
Academic Achievement | 3
Author
Guo, Hongwen | 2
Haberman, Shelby J. | 2
Kolen, Michael J. | 2
Almond, Patricia | 1
Barford, Sean W. | 1
Cai, Li | 1
Chen, Wen-Hung | 1
Dombrowski, Stefan C. | 1
Ferrara, Steve | 1
Gao, Xiaohong | 1
Gutierrez Arvizu, Maria Nelly | 1
Publication Type
Journal Articles | 20
Reports - Research | 13
Reports - Evaluative | 6
Reports - Descriptive | 1
Speeches/Meeting Papers | 1
Education Level
Higher Education | 4
Elementary Secondary Education | 3
Postsecondary Education | 2
Adult Education | 1
Elementary Education | 1
Grade 2 | 1
Grade 3 | 1
Grade 8 | 1
Junior High Schools | 1
Middle Schools | 1
Secondary Education | 1
Location
Georgia | 1
Haberman, Shelby J. – ETS Research Report Series, 2020
Best linear prediction (BLP) and penalized best linear prediction (PBLP) are techniques for combining sources of information to produce task scores, section scores, and composite test scores. The report examines issues to consider in operational implementation of BLP and PBLP in testing programs administered by ETS [Educational Testing Service].
Descriptors: Prediction, Scores, Tests, Testing Programs
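As a rough illustration of best linear prediction (not the report's operational procedure), the sketch below combines two simulated section scores into a composite via the normal equations, with a ridge-style penalty standing in for PBLP; all variables and the penalty value are invented.

```python
# Hypothetical BLP sketch: weights solve Cov(X) beta = Cov(X, y).
import numpy as np

rng = np.random.default_rng(0)
n = 2000
ability = rng.normal(0, 1, n)
s1 = ability + rng.normal(0, 0.6, n)      # simulated section 1 score
s2 = ability + rng.normal(0, 0.9, n)      # simulated section 2 score (noisier)
y = ability + rng.normal(0, 0.3, n)       # criterion the composite predicts

X = np.column_stack([s1, s2])
Xc = X - X.mean(axis=0)
yc = y - y.mean()

cov_xx = Xc.T @ Xc / n                    # section-score covariance matrix
cov_xy = Xc.T @ yc / n                    # covariances with the criterion
beta_blp = np.linalg.solve(cov_xx, cov_xy)

lam = 0.1                                 # arbitrary ridge-style penalty
beta_pen = np.linalg.solve(cov_xx + lam * np.eye(2), cov_xy)

print("BLP weights:      ", beta_blp)     # noisier section gets less weight
print("penalized weights:", beta_pen)     # shrunk toward zero
```

The less reliable section receives the smaller weight, which is the practical appeal of BLP for composite scores.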
LaFlair, Geoffrey T.; Isbell, Daniel; May, L. D. Nicolas; Gutierrez Arvizu, Maria Nelly; Jamieson, Joan – Language Testing, 2017
Language programs need multiple test forms for secure administrations and effective placement decisions, but can they have confidence that scores on alternate test forms have the same meaning? In large-scale testing programs, various equating methods are available to ensure the comparability of forms. The choice of equating method is informed by…
Descriptors: Language Tests, Equated Scores, Testing Programs, Comparative Analysis
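For concreteness, here is a sketch of the simplest of the equating methods the abstract alludes to, linear equating, which maps form-X scores onto the form-Y scale by matching means and standard deviations; the score distributions are simulated, not from the study.

```python
# Linear equating sketch: y* = a*x + b, matching first two moments.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(28.0, 6.0, 500)   # simulated scores on new form X
y = rng.normal(30.0, 5.0, 500)   # simulated scores on reference form Y

a = y.std(ddof=1) / x.std(ddof=1)
b = y.mean() - a * x.mean()

def equate_linear(score_x: float) -> float:
    """Map a form-X score onto the form-Y scale."""
    return a * score_x + b

print(f"form-X raw 28 -> form-Y equivalent {equate_linear(28.0):.2f}")
```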
Wang, Ze – Educational Psychology, 2015
Using data from the Trends in International Mathematics and Science Study (TIMSS) 2007, this study examined the big-fish-little-pond effects (BFLPEs) in 49 countries. In this study, the effect of math ability on math self-concept was decomposed into within- and between-level components using implicit mean centring and the complex data…
Descriptors: Nonverbal Ability, Mathematics, Self Concept, Hierarchical Linear Modeling
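The decomposition the abstract describes can be illustrated outside a full multilevel model: split each student's ability into the school mean and the deviation from it, and regress self-concept on both. The OLS sketch below uses simulated data; the study itself fit a multilevel model with implicit mean centring.

```python
# Within/between decomposition behind the BFLPE, on simulated data.
import numpy as np

rng = np.random.default_rng(2)
n_schools, n_per = 50, 30
school = np.repeat(np.arange(n_schools), n_per)
school_mean_true = rng.normal(0, 1, n_schools)
ability = school_mean_true[school] + rng.normal(0, 1, school.size)

# BFLPE pattern: self-concept rises with own ability but falls with the
# ability level of one's school.
self_concept = (0.5 * ability - 0.3 * school_mean_true[school]
                + rng.normal(0, 0.5, school.size))

xbar = np.array([ability[school == j].mean() for j in range(n_schools)])
within = ability - xbar[school]        # deviation from school mean
between = xbar[school]                 # observed school mean

X = np.column_stack([np.ones(school.size), within, between])
coef, *_ = np.linalg.lstsq(X, self_concept, rcond=None)
print("within slope: ", coef[1])       # ~0.5
print("between slope:", coef[2])       # ~0.2; contextual (BFLP) effect is
                                       # between minus within, ~ -0.3 here
```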
Livingston, Samuel A. – ETS Research Report Series, 2014
In this study, I investigated 2 procedures intended to create test-taker groups of equal ability by poststratifying on a composite variable created from demographic information. In one procedure, the stratifying variable was the composite variable that best predicted the test score. In the other procedure, the stratifying variable was the…
Descriptors: Demography, Equated Scores, Cluster Grouping, Ability Grouping
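A hypothetical sketch of the first procedure: build the composite as the least-squares prediction of the test score from demographic variables, stratify on its quintiles, and reweight one group to the other's stratum distribution. The variables, quintile strata, and data below are all invented.

```python
# Poststratification on a predictive demographic composite (illustrative).
import numpy as np

rng = np.random.default_rng(3)
n = 4000
demo = rng.normal(0, 1, (n, 3))              # three coded demographic vars
score = demo @ np.array([4.0, 2.0, 1.0]) + rng.normal(0, 8, n) + 50
group = rng.integers(0, 2, n)                # two test-taker groups

# Composite = least-squares prediction of the score from demographics.
X = np.column_stack([np.ones(n), demo])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
composite = X @ beta

# Stratify on composite quintiles; reweight group 1 to group 0's strata.
edges = np.quantile(composite, [0.2, 0.4, 0.6, 0.8])
stratum = np.digitize(composite, edges)
w = np.zeros(n)
for s in range(5):
    p_target = np.mean(stratum[group == 0] == s)
    p_source = np.mean(stratum[group == 1] == s)
    w[(group == 1) & (stratum == s)] = p_target / p_source

raw = score[group == 1].mean()
adj = np.average(score[group == 1], weights=w[group == 1])
print(f"group 1 mean: raw {raw:.2f}, poststratified {adj:.2f}")
```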
Yang, Ji Seung; Cai, Li – Journal of Educational and Behavioral Statistics, 2014
The main purpose of this study is to improve estimation efficiency in obtaining maximum marginal likelihood estimates of contextual effects in the framework of a nonlinear multilevel latent variable model by adopting the Metropolis-Hastings Robbins-Monro algorithm (MH-RM). Results indicate that the MH-RM algorithm can produce estimates and standard…
Descriptors: Computation, Hierarchical Linear Modeling, Mathematics, Context Effect
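A full MH-RM implementation is well beyond a short sketch, but the Robbins-Monro half of the algorithm can be shown in isolation: a stochastic approximation that drives a noisy score toward zero using decreasing gains. The toy target below (estimating a mean from one noisy draw per step) is invented for illustration; in MH-RM the noisy quantity would come from Metropolis-Hastings draws of the latent variables.

```python
# Robbins-Monro stochastic approximation on a toy problem.
import numpy as np

rng = np.random.default_rng(4)
true_mean = 2.5
theta = 0.0
for k in range(1, 5001):
    y = rng.normal(true_mean, 1.0)   # one noisy observation per step
    gain = 1.0 / k                   # RM gains: sum diverges,
    theta += gain * (y - theta)      # sum of squares converges
print("RM estimate:", theta)         # converges to true_mean
```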
Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013
The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…
Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation
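One concrete instance of the IRT linking step whose precision the study examines is the mean/sigma method, which rescales form-X difficulties onto the form-Y metric from the anchor items alone; the anchor parameters below are simulated, not the study's.

```python
# Mean/sigma linking from anchor-item difficulty (b) parameters.
import numpy as np

rng = np.random.default_rng(5)
n_anchor = 20
b_y = rng.normal(0.0, 1.0, n_anchor)                      # form-Y metric
b_x = (b_y - 0.3) / 1.1 + rng.normal(0, 0.05, n_anchor)   # form-X estimates

# Transformation theta_Y = A * theta_X + B.
A = b_y.std(ddof=1) / b_x.std(ddof=1)
B = b_y.mean() - A * b_x.mean()
print(f"A = {A:.3f}, B = {B:.3f}")   # recovers ~1.1 and ~0.3

# A shorter anchor estimates A and B from fewer items, so their sampling
# error grows: the effect the study quantifies.
```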
McBee, Matthew T.; Peters, Scott J.; Waterman, Craig – Gifted Child Quarterly, 2014
Best practice in gifted and talented identification procedures involves making decisions on the basis of multiple measures. However, very little research has investigated the impact of different methods of combining multiple measures. This article examines the consequences of the conjunctive ("and"), disjunctive/complementary…
Descriptors: Best Practices, Ability Identification, Academically Gifted, Correlation
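The conjunctive and disjunctive rules named in the abstract are easy to contrast directly: with two correlated measures and a 90th-percentile cutoff, the "and" rule identifies far fewer students than the "or" rule. The correlation and cutoff below are invented.

```python
# Conjunctive ("and") vs disjunctive ("or") identification rules.
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
rho = 0.6                                  # correlation between measures
z1 = rng.normal(0, 1, n)
z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.normal(0, 1, n)

cut = np.quantile(z1, 0.90)                # 90th-percentile cutoff
conj = (z1 > cut) & (z2 > cut)             # must exceed cut on both
disj = (z1 > cut) | (z2 > cut)             # must exceed cut on either
print(f"conjunctive rate: {conj.mean():.3f}")   # well below 10%
print(f"disjunctive rate: {disj.mean():.3f}")   # well above 10%
```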
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
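The mechanism at issue can be shown with the standard cluster-sampling design effect, DEFF = 1 + (m - 1) * ICC: clustering shrinks the effective sample size and, if ignored, understates standard errors. The cluster size and ICC below are illustrative, not values from the article.

```python
# Design effect for cluster sampling and its effect on sample size.
m = 25          # students sampled per school
icc = 0.20      # intraclass correlation of scores within schools
n = 2500        # total sample size

deff = 1 + (m - 1) * icc
n_effective = n / deff
print(f"design effect: {deff:.1f}")                  # 5.8
print(f"effective sample size: {n_effective:.0f}")   # ~431 of 2500
# Ignoring DEFF understates standard errors by sqrt(DEFF), about 2.4 here.
```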
Guo, Hongwen – Psychometrika, 2010
After many equatings have been conducted in a testing program, equating errors can accumulate to a degree that is not negligible compared to the standard error of measurement. In this paper, the author investigates the asymptotic accumulative standard error of equating (ASEE) for linear equating methods, including chained linear, Tucker, and…
Descriptors: Testing Programs, Testing, Error of Measurement, Equated Scores
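The accumulation the paper quantifies analytically can be checked by simulation under a simplifying assumption: if each link in an equating chain adds independent N(0, se^2) error, the accumulated standard error grows like sqrt(n) * se. The chain length and per-link error below are invented.

```python
# Error accumulation along a chain of equatings (simplified).
import numpy as np

rng = np.random.default_rng(7)
n_links, se_link, reps = 8, 0.5, 20_000

# Each link adds independent N(0, se_link^2) error; real equating error
# also enters through the slope, which is ignored here.
errors = rng.normal(0, se_link, (reps, n_links)).sum(axis=1)
print("simulated accumulated SE:", errors.std(ddof=1))   # ~1.41
print("analytic sqrt(n) * se:   ", np.sqrt(n_links) * se_link)
```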
Guo, Hongwen; Sinharay, Sandip – Journal of Educational and Behavioral Statistics, 2011
Nonparametric or kernel regression estimation of item response curves (IRCs) is often used in item analysis in testing programs. These estimates are biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. Accuracy of this estimation is a concern theoretically and operationally.…
Descriptors: Testing Programs, Measurement, Item Analysis, Error of Measurement
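A minimal sketch of the estimator under discussion: a Nadaraya-Watson kernel regression of item correctness on an error-prone observed score. Because the regressor contains measurement error, the estimated curve is flatter than the true item response curve, which is the bias the authors analyze. The item, bandwidth, and data are all simulated.

```python
# Kernel (Nadaraya-Watson) estimate of an item response curve.
import numpy as np

rng = np.random.default_rng(8)
n = 5000
theta = rng.normal(0, 1, n)                      # true ability
p_true = 1 / (1 + np.exp(-(theta - 0.5)))        # true IRC, difficulty 0.5
y = rng.binomial(1, p_true)                      # 0/1 item responses
observed = theta + rng.normal(0, 0.6, n)         # error-prone score

def irc_hat(x0, h=0.3):
    """Nadaraya-Watson estimate of P(correct | observed = x0)."""
    w = np.exp(-0.5 * ((observed - x0) / h) ** 2)   # Gaussian kernel
    return np.sum(w * y) / np.sum(w)

for x0 in (-1.0, 0.0, 1.0):
    clean = 1 / (1 + np.exp(-(x0 - 0.5)))        # error-free IRC at x0
    print(f"at {x0:+.1f}: kernel {irc_hat(x0):.3f} vs true {clean:.3f}")
# The kernel curve is attenuated: too high at low scores, too low at high.
```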
Mrazik, Martin; Janzen, Troy M.; Dombrowski, Stefan C.; Barford, Sean W.; Krawchuk, Lindsey L. – Canadian Journal of School Psychology, 2012
A total of 19 graduate students enrolled in a graduate course conducted 6 consecutive administrations of the Wechsler Intelligence Scale for Children, 4th edition (WISC-IV, Canadian version). Test protocols were examined to obtain data describing the frequency of examiner errors, including administration and scoring errors. Results identified 511…
Descriptors: Intelligence Tests, Intelligence, Statistical Analysis, Scoring
Haberman, Shelby J. – Journal of Educational and Behavioral Statistics, 2008
In educational tests, subscores are often generated from a portion of the items in a larger test. Guidelines based on mean squared error are proposed to indicate whether subscores are worth reporting. Alternatives considered are direct reports of subscores, estimates of subscores based on total score, combined estimates based on subscores and…
Descriptors: Testing Programs, Regression (Statistics), Scores, Student Evaluation
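The mean-squared-error comparison can be sketched by simulation, where the true subscore is known by construction (operationally it is not, and the paper derives the comparison from classical test theory quantities): predict the true subscore linearly from the observed subscore and from the total score, and compare MSEs. All variances below are invented.

```python
# MSE comparison of subscore predictors, in the spirit of the guidelines.
import numpy as np

rng = np.random.default_rng(9)
n = 20_000
general = rng.normal(0, 1, n)                 # ability shared with the test
specific = rng.normal(0, 0.8, n)              # distinct subdomain component
true_sub = general + specific                 # known here by construction
sub_obs = true_sub + rng.normal(0, 0.4, n)    # observed (noisy) subscore
total = 3 * general + specific + rng.normal(0, 1.0, n)   # total score

def lin_mse(x, target):
    """MSE of the best linear predictor of target from x."""
    b = np.mean((x - x.mean()) * (target - target.mean())) / x.var()
    pred = target.mean() + b * (x - x.mean())
    return np.mean((target - pred) ** 2)

print("MSE from observed subscore:", lin_mse(sub_obs, true_sub))  # smaller
print("MSE from total score:      ", lin_mse(total, true_sub))
# Here the subscore taps a distinct dimension reliably enough to beat the
# total score, i.e., it would be worth reporting by this criterion.
```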
Lee, Guemin; Lewis, Daniel M. – Educational and Psychological Measurement, 2008
The bookmark standard-setting procedure is an item response theory-based method that is widely implemented in state testing programs. This study estimates standard errors for cut scores resulting from bookmark standard settings under a generalizability theory model and investigates the effects of different universes of generalization and error…
Descriptors: Generalizability Theory, Testing Programs, Error of Measurement, Cutting Scores
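In the simplest one-facet version of the idea, each panelist's bookmark cut is treated as a draw from a universe of panelists, and the standard error of the panel-mean cut is sqrt(var_panelists / n_panelists); the study's fuller designs with rounds and alternative universes of generalization are not reproduced. Data are simulated.

```python
# One-facet sketch of a cut-score standard error across panelists.
import numpy as np

rng = np.random.default_rng(10)
n_panelists = 15
cuts = rng.normal(42.0, 3.0, n_panelists)   # one bookmark cut per panelist

mean_cut = cuts.mean()
se_cut = cuts.std(ddof=1) / np.sqrt(n_panelists)
print(f"panel cut score: {mean_cut:.2f} (SE {se_cut:.2f})")
# Doubling the panel would shrink the SE by sqrt(2): the kind of design
# question a generalizability study answers.
```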
Wu, Margaret – Educational Measurement: Issues and Practice, 2010
In large-scale assessments, such as state-wide testing programs, national sample-based assessments, and international comparative studies, there are many steps involved in the measurement and reporting of student achievement. There are always sources of inaccuracies in each of the steps. It is of interest to identify the source and magnitude of…
Descriptors: Testing Programs, Educational Assessment, Measures (Individuals), Program Effectiveness
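The point about sources and magnitudes can be made with a toy error budget: independent error components combine in quadrature, so the total is dominated by the largest source. The magnitudes below are invented.

```python
# Toy error budget: independent sources combine in quadrature.
se_measurement = 3.0   # score points
se_sampling = 1.5
se_equating = 0.8

se_total = (se_measurement**2 + se_sampling**2 + se_equating**2) ** 0.5
print(f"total SE: {se_total:.2f}")   # ~3.45, dominated by measurement
```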

Harris, Deborah J.; Kolen, Michael J. – Educational and Psychological Measurement, 1988
Three methods of estimating point-biserial correlation coefficient standard errors were compared: (1) assuming normality; (2) not assuming normality; and (3) bootstrapping. Although errors estimated assuming normality were biased, such estimates were less variable and easier to compute, suggesting that this might be the method of choice in some…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Analysis, Statistical Analysis
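Two of the three approaches are easy to sketch side by side: a normal-theory large-sample standard error for a correlation, (1 - r^2)/sqrt(n - 1), versus a nonparametric bootstrap. The normal-theory form here is the generic one, not necessarily the 1988 study's exact formula, and the data are simulated.

```python
# Point-biserial correlation SE: normal-theory vs bootstrap.
import numpy as np

rng = np.random.default_rng(11)
n = 300
theta = rng.normal(0, 1, n)
item = rng.binomial(1, 1 / (1 + np.exp(-theta)))   # dichotomous item
total = theta + rng.normal(0, 0.5, n)              # continuous total score

r = np.corrcoef(item, total)[0, 1]                 # point-biserial r
se_normal = (1 - r**2) / np.sqrt(n - 1)            # normal-theory estimate

boot = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, n, n)                    # resample with replacement
    boot[b] = np.corrcoef(item[idx], total[idx])[0, 1]
se_boot = boot.std(ddof=1)

print(f"r = {r:.3f}, normal-theory SE = {se_normal:.4f}, "
      f"bootstrap SE = {se_boot:.4f}")
```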