Publication Date
In 2025 | 0
Since 2024 | 0
Since 2021 (last 5 years) | 0
Since 2016 (last 10 years) | 2
Since 2006 (last 20 years) | 14
Descriptor
Error of Measurement | 20
Testing Programs | 20
Equated Scores | 7
Statistical Analysis | 5
Comparative Analysis | 4
Item Response Theory | 4
Reliability | 4
Sampling | 4
Scores | 4
State Programs | 4
Academic Achievement | 3
Author
Guo, Hongwen | 2
Haberman, Shelby J. | 2
Kolen, Michael J. | 2
Almond, Patricia | 1
Barford, Sean W. | 1
Cai, Li | 1
Chen, Wen-Hung | 1
Dombrowski, Stefan C. | 1
Ferrara, Steve | 1
Gao, Xiaohong | 1
Gutierrez Arvizu, Maria Nelly | 1
Publication Type
Journal Articles | 20
Reports - Research | 13
Reports - Evaluative | 6
Reports - Descriptive | 1
Speeches/Meeting Papers | 1
Education Level
Higher Education | 4
Elementary Secondary Education | 3
Postsecondary Education | 2
Adult Education | 1
Elementary Education | 1
Grade 2 | 1
Grade 3 | 1
Grade 8 | 1
Junior High Schools | 1
Middle Schools | 1
Secondary Education | 1
Location
Georgia | 1
Haberman, Shelby J. – ETS Research Report Series, 2020
Best linear prediction (BLP) and penalized best linear prediction (PBLP) are techniques for combining sources of information to produce task scores, section scores, and composite test scores. The report examines issues to consider in operational implementation of BLP and PBLP in testing programs administered by ETS [Educational Testing Service].
Descriptors: Prediction, Scores, Tests, Testing Programs
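As a rough illustration of best linear prediction (not the report's operational procedure), the sketch below combines two simulated section scores into a composite via the normal equations, with a ridge-style penalty standing in for PBLP; all variables and the penalty value are invented.

```python
# Hypothetical BLP sketch: weights solve Cov(X) beta = Cov(X, y).
import numpy as np

rng = np.random.default_rng(0)
n = 2000
ability = rng.normal(0, 1, n)
s1 = ability + rng.normal(0, 0.6, n)      # simulated section 1 score
s2 = ability + rng.normal(0, 0.9, n)      # simulated section 2 score (noisier)
y = ability + rng.normal(0, 0.3, n)       # criterion the composite predicts

X = np.column_stack([s1, s2])
Xc = X - X.mean(axis=0)
yc = y - y.mean()

cov_xx = Xc.T @ Xc / n                    # section-score covariance matrix
cov_xy = Xc.T @ yc / n                    # covariances with the criterion
beta_blp = np.linalg.solve(cov_xx, cov_xy)

lam = 0.1                                 # arbitrary ridge-style penalty
beta_pen = np.linalg.solve(cov_xx + lam * np.eye(2), cov_xy)

print("BLP weights:      ", beta_blp)     # noisier section gets less weight
print("penalized weights:", beta_pen)     # shrunk toward zero
```

The less reliable section receives the smaller weight, which is the practical appeal of BLP for composite scores.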
LaFlair, Geoffrey T.; Isbell, Daniel; May, L. D. Nicolas; Gutierrez Arvizu, Maria Nelly; Jamieson, Joan – Language Testing, 2017
Language programs need multiple test forms for secure administrations and effective placement decisions, but can they have confidence that scores on alternate test forms have the same meaning? In large-scale testing programs, various equating methods are available to ensure the comparability of forms. The choice of equating method is informed by…
Descriptors: Language Tests, Equated Scores, Testing Programs, Comparative Analysis
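For concreteness, here is a sketch of the simplest of the equating methods the abstract alludes to, linear equating, which maps form-X scores onto the form-Y scale by matching means and standard deviations; the score distributions are simulated, not from the study.

```python
# Linear equating sketch: y* = a*x + b, matching first two moments.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(28.0, 6.0, 500)   # simulated scores on new form X
y = rng.normal(30.0, 5.0, 500)   # simulated scores on reference form Y

a = y.std(ddof=1) / x.std(ddof=1)
b = y.mean() - a * x.mean()

def equate_linear(score_x: float) -> float:
    """Map a form-X score onto the form-Y scale."""
    return a * score_x + b

print(f"form-X raw 28 -> form-Y equivalent {equate_linear(28.0):.2f}")
```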
Wang, Ze – Educational Psychology, 2015
Using data from the Trends in International Mathematics and Science Study (TIMSS) 2007, this study examined the big-fish-little-pond effects (BFLPEs) in 49 countries. In this study, the effect of math ability on math self-concept was decomposed into within- and between-level components using implicit mean centring and the complex data…
Descriptors: Nonverbal Ability, Mathematics, Self Concept, Hierarchical Linear Modeling
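The decomposition the abstract describes can be illustrated outside a full multilevel model: split each student's ability into the school mean and the deviation from it, and regress self-concept on both. The OLS sketch below uses simulated data; the study itself fit a multilevel model with implicit mean centring.

```python
# Within/between decomposition behind the BFLPE, on simulated data.
import numpy as np

rng = np.random.default_rng(2)
n_schools, n_per = 50, 30
school = np.repeat(np.arange(n_schools), n_per)
school_mean_true = rng.normal(0, 1, n_schools)
ability = school_mean_true[school] + rng.normal(0, 1, school.size)

# BFLPE pattern: self-concept rises with own ability but falls with the
# ability level of one's school.
self_concept = (0.5 * ability - 0.3 * school_mean_true[school]
                + rng.normal(0, 0.5, school.size))

xbar = np.array([ability[school == j].mean() for j in range(n_schools)])
within = ability - xbar[school]        # deviation from school mean
between = xbar[school]                 # observed school mean

X = np.column_stack([np.ones(school.size), within, between])
coef, *_ = np.linalg.lstsq(X, self_concept, rcond=None)
print("within slope: ", coef[1])       # ~0.5
print("between slope:", coef[2])       # ~0.2; contextual (BFLP) effect is
                                       # between minus within, ~ -0.3 here
```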
Livingston, Samuel A. – ETS Research Report Series, 2014
In this study, I investigated 2 procedures intended to create test-taker groups of equal ability by poststratifying on a composite variable created from demographic information. In one procedure, the stratifying variable was the composite variable that best predicted the test score. In the other procedure, the stratifying variable was the…
Descriptors: Demography, Equated Scores, Cluster Grouping, Ability Grouping
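A hypothetical sketch of the first procedure: build the composite as the least-squares prediction of the test score from demographic variables, stratify on its quintiles, and reweight one group to the other's stratum distribution. The variables, quintile strata, and data below are all invented.

```python
# Poststratification on a predictive demographic composite (illustrative).
import numpy as np

rng = np.random.default_rng(3)
n = 4000
demo = rng.normal(0, 1, (n, 3))              # three coded demographic vars
score = demo @ np.array([4.0, 2.0, 1.0]) + rng.normal(0, 8, n) + 50
group = rng.integers(0, 2, n)                # two test-taker groups

# Composite = least-squares prediction of the score from demographics.
X = np.column_stack([np.ones(n), demo])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
composite = X @ beta

# Stratify on composite quintiles; reweight group 1 to group 0's strata.
edges = np.quantile(composite, [0.2, 0.4, 0.6, 0.8])
stratum = np.digitize(composite, edges)
w = np.zeros(n)
for s in range(5):
    p_target = np.mean(stratum[group == 0] == s)
    p_source = np.mean(stratum[group == 1] == s)
    w[(group == 1) & (stratum == s)] = p_target / p_source

raw = score[group == 1].mean()
adj = np.average(score[group == 1], weights=w[group == 1])
print(f"group 1 mean: raw {raw:.2f}, poststratified {adj:.2f}")
```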
Yang, Ji Seung; Cai, Li – Journal of Educational and Behavioral Statistics, 2014
The main purpose of this study is to improve estimation efficiency in obtaining maximum marginal likelihood estimates of contextual effects in the framework of a nonlinear multilevel latent variable model by adopting the Metropolis-Hastings Robbins-Monro algorithm (MH-RM). Results indicate that the MH-RM algorithm can produce estimates and standard…
Descriptors: Computation, Hierarchical Linear Modeling, Mathematics, Context Effect
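A full MH-RM implementation is well beyond a short sketch, but the Robbins-Monro half of the algorithm can be shown in isolation: a stochastic approximation that drives a noisy score toward zero using decreasing gains. The toy target below (estimating a mean from one noisy draw per step) is invented for illustration; in MH-RM the noisy quantity would come from Metropolis-Hastings draws of the latent variables.

```python
# Robbins-Monro stochastic approximation on a toy problem.
import numpy as np

rng = np.random.default_rng(4)
true_mean = 2.5
theta = 0.0
for k in range(1, 5001):
    y = rng.normal(true_mean, 1.0)   # one noisy observation per step
    gain = 1.0 / k                   # RM gains: sum diverges,
    theta += gain * (y - theta)      # sum of squares converges
print("RM estimate:", theta)         # converges to true_mean
```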
Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013
The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…
Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation
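One concrete instance of the IRT linking step whose precision the study examines is the mean/sigma method, which rescales form-X difficulties onto the form-Y metric from the anchor items alone; the anchor parameters below are simulated, not the study's.

```python
# Mean/sigma linking from anchor-item difficulty (b) parameters.
import numpy as np

rng = np.random.default_rng(5)
n_anchor = 20
b_y = rng.normal(0.0, 1.0, n_anchor)                      # form-Y metric
b_x = (b_y - 0.3) / 1.1 + rng.normal(0, 0.05, n_anchor)   # form-X estimates

# Transformation theta_Y = A * theta_X + B.
A = b_y.std(ddof=1) / b_x.std(ddof=1)
B = b_y.mean() - A * b_x.mean()
print(f"A = {A:.3f}, B = {B:.3f}")   # recovers ~1.1 and ~0.3

# A shorter anchor estimates A and B from fewer items, so their sampling
# error grows: the effect the study quantifies.
```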
McBee, Matthew T.; Peters, Scott J.; Waterman, Craig – Gifted Child Quarterly, 2014
Best practice in gifted and talented identification procedures involves making decisions on the basis of multiple measures. However, very little research has investigated the impact of different methods of combining multiple measures. This article examines the consequences of the conjunctive ("and"), disjunctive/complementary…
Descriptors: Best Practices, Ability Identification, Academically Gifted, Correlation
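The conjunctive and disjunctive rules named in the abstract are easy to contrast directly: with two correlated measures and a 90th-percentile cutoff, the "and" rule identifies far fewer students than the "or" rule. The correlation and cutoff below are invented.

```python
# Conjunctive ("and") vs disjunctive ("or") identification rules.
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
rho = 0.6                                  # correlation between measures
z1 = rng.normal(0, 1, n)
z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.normal(0, 1, n)

cut = np.quantile(z1, 0.90)                # 90th-percentile cutoff
conj = (z1 > cut) & (z2 > cut)             # must exceed cut on both
disj = (z1 > cut) | (z2 > cut)             # must exceed cut on either
print(f"conjunctive rate: {conj.mean():.3f}")   # well below 10%
print(f"disjunctive rate: {disj.mean():.3f}")   # well above 10%
```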
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
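The mechanism at issue can be shown with the standard cluster-sampling design effect, DEFF = 1 + (m - 1) * ICC: clustering shrinks the effective sample size and, if ignored, understates standard errors. The cluster size and ICC below are illustrative, not values from the article.

```python
# Design effect for cluster sampling and its effect on sample size.
m = 25          # students sampled per school
icc = 0.20      # intraclass correlation of scores within schools
n = 2500        # total sample size

deff = 1 + (m - 1) * icc
n_effective = n / deff
print(f"design effect: {deff:.1f}")                  # 5.8
print(f"effective sample size: {n_effective:.0f}")   # ~431 of 2500
# Ignoring DEFF understates standard errors by sqrt(DEFF), about 2.4 here.
```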
Guo, Hongwen – Psychometrika, 2010
After many equatings have been conducted in a testing program, equating errors can accumulate to a degree that is not negligible compared to the standard error of measurement. In this paper, the author investigates the asymptotic accumulative standard error of equating (ASEE) for linear equating methods, including chained linear, Tucker, and…
Descriptors: Testing Programs, Testing, Error of Measurement, Equated Scores
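The accumulation the paper quantifies analytically can be checked by simulation under a simplifying assumption: if each link in an equating chain adds independent N(0, se^2) error, the accumulated standard error grows like sqrt(n) * se. The chain length and per-link error below are invented.

```python
# Error accumulation along a chain of equatings (simplified).
import numpy as np

rng = np.random.default_rng(7)
n_links, se_link, reps = 8, 0.5, 20_000

# Each link adds independent N(0, se_link^2) error; real equating error
# also enters through the slope, which is ignored here.
errors = rng.normal(0, se_link, (reps, n_links)).sum(axis=1)
print("simulated accumulated SE:", errors.std(ddof=1))   # ~1.41
print("analytic sqrt(n) * se:   ", np.sqrt(n_links) * se_link)
```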
Guo, Hongwen; Sinharay, Sandip – Journal of Educational and Behavioral Statistics, 2011
Nonparametric or kernel regression estimation of item response curves (IRCs) is often used in item analysis in testing programs. These estimates are biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. Accuracy of this estimation is a concern theoretically and operationally.…
Descriptors: Testing Programs, Measurement, Item Analysis, Error of Measurement
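A minimal sketch of the estimator under discussion: a Nadaraya-Watson kernel regression of item correctness on an error-prone observed score. Because the regressor contains measurement error, the estimated curve is flatter than the true item response curve, which is the bias the authors analyze. The item, bandwidth, and data are all simulated.

```python
# Kernel (Nadaraya-Watson) estimate of an item response curve.
import numpy as np

rng = np.random.default_rng(8)
n = 5000
theta = rng.normal(0, 1, n)                      # true ability
p_true = 1 / (1 + np.exp(-(theta - 0.5)))        # true IRC, difficulty 0.5
y = rng.binomial(1, p_true)                      # 0/1 item responses
observed = theta + rng.normal(0, 0.6, n)         # error-prone score

def irc_hat(x0, h=0.3):
    """Nadaraya-Watson estimate of P(correct | observed = x0)."""
    w = np.exp(-0.5 * ((observed - x0) / h) ** 2)   # Gaussian kernel
    return np.sum(w * y) / np.sum(w)

for x0 in (-1.0, 0.0, 1.0):
    clean = 1 / (1 + np.exp(-(x0 - 0.5)))        # error-free IRC at x0
    print(f"at {x0:+.1f}: kernel {irc_hat(x0):.3f} vs true {clean:.3f}")
# The kernel curve is attenuated: too high at low scores, too low at high.
```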
Mrazik, Martin; Janzen, Troy M.; Dombrowski, Stefan C.; Barford, Sean W.; Krawchuk, Lindsey L. – Canadian Journal of School Psychology, 2012
A total of 19 graduate students enrolled in a graduate course conducted 6 consecutive administrations of the Wechsler Intelligence Scale for Children, 4th edition (WISC-IV, Canadian version). Test protocols were examined to obtain data describing the frequency of examiner errors, including administration and scoring errors. Results identified 511…
Descriptors: Intelligence Tests, Intelligence, Statistical Analysis, Scoring
Haberman, Shelby J. – Journal of Educational and Behavioral Statistics, 2008
In educational tests, subscores are often generated from a portion of the items in a larger test. Guidelines based on mean squared error are proposed to indicate whether subscores are worth reporting. Alternatives considered are direct reports of subscores, estimates of subscores based on total score, combined estimates based on subscores and…
Descriptors: Testing Programs, Regression (Statistics), Scores, Student Evaluation
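The mean-squared-error comparison can be sketched by simulation, where the true subscore is known by construction (operationally it is not, and the paper derives the comparison from classical test theory quantities): predict the true subscore linearly from the observed subscore and from the total score, and compare MSEs. All variances below are invented.

```python
# MSE comparison of subscore predictors, in the spirit of the guidelines.
import numpy as np

rng = np.random.default_rng(9)
n = 20_000
general = rng.normal(0, 1, n)                 # ability shared with the test
specific = rng.normal(0, 0.8, n)              # distinct subdomain component
true_sub = general + specific                 # known here by construction
sub_obs = true_sub + rng.normal(0, 0.4, n)    # observed (noisy) subscore
total = 3 * general + specific + rng.normal(0, 1.0, n)   # total score

def lin_mse(x, target):
    """MSE of the best linear predictor of target from x."""
    b = np.mean((x - x.mean()) * (target - target.mean())) / x.var()
    pred = target.mean() + b * (x - x.mean())
    return np.mean((target - pred) ** 2)

print("MSE from observed subscore:", lin_mse(sub_obs, true_sub))  # smaller
print("MSE from total score:      ", lin_mse(total, true_sub))
# Here the subscore taps a distinct dimension reliably enough to beat the
# total score, i.e., it would be worth reporting by this criterion.
```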
Lee, Guemin; Lewis, Daniel M. – Educational and Psychological Measurement, 2008
The bookmark standard-setting procedure is an item response theory-based method that is widely implemented in state testing programs. This study estimates standard errors for cut scores resulting from bookmark standard settings under a generalizability theory model and investigates the effects of different universes of generalization and error…
Descriptors: Generalizability Theory, Testing Programs, Error of Measurement, Cutting Scores
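In the simplest one-facet version of the idea, each panelist's bookmark cut is treated as a draw from a universe of panelists, and the standard error of the panel-mean cut is sqrt(var_panelists / n_panelists); the study's fuller designs with rounds and alternative universes of generalization are not reproduced. Data are simulated.

```python
# One-facet sketch of a cut-score standard error across panelists.
import numpy as np

rng = np.random.default_rng(10)
n_panelists = 15
cuts = rng.normal(42.0, 3.0, n_panelists)   # one bookmark cut per panelist

mean_cut = cuts.mean()
se_cut = cuts.std(ddof=1) / np.sqrt(n_panelists)
print(f"panel cut score: {mean_cut:.2f} (SE {se_cut:.2f})")
# Doubling the panel would shrink the SE by sqrt(2): the kind of design
# question a generalizability study answers.
```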
Wu, Margaret – Educational Measurement: Issues and Practice, 2010
In large-scale assessments, such as state-wide testing programs, national sample-based assessments, and international comparative studies, there are many steps involved in the measurement and reporting of student achievement. There are always sources of inaccuracies in each of the steps. It is of interest to identify the source and magnitude of…
Descriptors: Testing Programs, Educational Assessment, Measures (Individuals), Program Effectiveness
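The point about sources and magnitudes can be made with a toy error budget: independent error components combine in quadrature, so the total is dominated by the largest source. The magnitudes below are invented.

```python
# Toy error budget: independent sources combine in quadrature.
se_measurement = 3.0   # score points
se_sampling = 1.5
se_equating = 0.8

se_total = (se_measurement**2 + se_sampling**2 + se_equating**2) ** 0.5
print(f"total SE: {se_total:.2f}")   # ~3.45, dominated by measurement
```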

Harris, Deborah J.; Kolen, Michael J. – Educational and Psychological Measurement, 1988
Three methods of estimating point-biserial correlation coefficient standard errors were compared: (1) assuming normality; (2) not assuming normality; and (3) bootstrapping. Although errors estimated assuming normality were biased, such estimates were less variable and easier to compute, suggesting that this might be the method of choice in some…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Analysis, Statistical Analysis
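Two of the three approaches are easy to sketch side by side: a normal-theory large-sample standard error for a correlation, (1 - r^2)/sqrt(n - 1), versus a nonparametric bootstrap. The normal-theory form here is the generic one, not necessarily the 1988 study's exact formula, and the data are simulated.

```python
# Point-biserial correlation SE: normal-theory vs bootstrap.
import numpy as np

rng = np.random.default_rng(11)
n = 300
theta = rng.normal(0, 1, n)
item = rng.binomial(1, 1 / (1 + np.exp(-theta)))   # dichotomous item
total = theta + rng.normal(0, 0.5, n)              # continuous total score

r = np.corrcoef(item, total)[0, 1]                 # point-biserial r
se_normal = (1 - r**2) / np.sqrt(n - 1)            # normal-theory estimate

boot = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, n, n)                    # resample with replacement
    boot[b] = np.corrcoef(item[idx], total[idx])[0, 1]
se_boot = boot.std(ddof=1)

print(f"r = {r:.3f}, normal-theory SE = {se_normal:.4f}, "
      f"bootstrap SE = {se_boot:.4f}")
```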