ERIC - Search Results

Showing all 11 results

Estimation of Contextual Effects through Nonlinear Multilevel Latent Variable Modeling with a Metropolis-Hastings Robbins-Monro Algorithm

Peer reviewed

Yang, Ji Seung; Cai, Li – Journal of Educational and Behavioral Statistics, 2014

The main purpose of this study is to improve estimation efficiency in obtaining maximum marginal likelihood estimates of contextual effects in the framework of nonlinear multilevel latent variable model by adopting the Metropolis-Hastings Robbins-Monro algorithm (MH-RM). Results indicate that the MH-RM algorithm can produce estimates and standard…

Descriptors: Computation, Hierarchical Linear Modeling, Mathematics, Context Effect

Impact of Design Effects in Large-Scale District and State Assessments

Peer reviewed

Phillips, Gary W. – Applied Measurement in Education, 2015

This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…

Descriptors: State Programs, Sampling, Research Design, Error of Measurement

Measurement, Sampling, and Equating Errors in Large-Scale Assessments

Peer reviewed

Wu, Margaret – Educational Measurement: Issues and Practice, 2010

In large-scale assessments, such as state-wide testing programs, national sample-based assessments, and international comparative studies, there are many steps involved in the measurement and reporting of student achievement. There are always sources of inaccuracies in each of the steps. It is of interest to identify the source and magnitude of…

Descriptors: Testing Programs, Educational Assessment, Measures (Individuals), Program Effectiveness

A Discussion of Population Invariance

Peer reviewed

Brennan, Robert L. – Applied Psychological Measurement, 2008

The discussion here covers five articles that are linked in the sense that they all treat population invariance. This discussion of population invariance is a somewhat broader treatment of the subject than simply a discussion of these five articles. In particular, occasional reference is made to publications other than those in this issue. The…

Descriptors: Advanced Placement, Law Schools, Science Achievement, Achievement Tests

Infeasibility in Automated Test Assembly Models: A Comparison Study of Different Methods

Peer reviewed

Huitzing, Hiddo A.; Veldkamp, Bernard P.; Verschoor, Angela J. – Journal of Educational Measurement, 2005

Several techniques exist to automatically put together a test meeting a number of specifications. In an item bank, the items are stored with their characteristics. A test is constructed by selecting a set of items that fulfills the specifications set by the test assembler. Test assembly problems are often formulated in terms of a model consisting…

Descriptors: Testing Programs, Programming, Mathematics, Item Sampling

A Discussion of Population Invariance of Equating

Peer reviewed

Petersen, Nancy S. – Applied Psychological Measurement, 2008

This article discusses the five studies included in this issue. Each article addressed the same topic, population invariance of equating. They all used data from major standardized testing programs, and they all used essentially the same statistics to evaluate their results, namely, the root mean square difference and root expected mean square…

Descriptors: Testing Programs, Standardized Tests, Equated Scores, Evaluation Methods

Applicability of Multiple Matrix Sampling to Estimating Effectiveness of Educational Programs.

Peer reviewed

Shoemaker, David M.; Shoemaker, Judith Sauls – Evaluation and Program Planning: An International Journal, 1981

When evaluating the effectiveness of an educational program, multiple matrix sampling is particularly effective and efficient when the goal of the evaluation is estimating group (as opposed to individual) performance. The technique is described in some detail, with its advantages and disadvantages, and examples of its application are given.…

Descriptors: Educational Assessment, Evaluation Methods, Item Sampling, Program Effectiveness

Comparing State SAT Scores: Problems, Biases, and Corrections.

Peer reviewed

Gohmann, Stephen F. – Journal of Educational Measurement, 1988

One method to correct for selection bias in comparing Scholastic Aptitude Test (SAT) scores among states is presented, which is a modification of J. J. Heckman's Selection Bias Correction (1976, 1979). Empirical results suggest that sample selection bias is present in SAT score regressions. (SLD)

Descriptors: Regression (Statistics), Sampling, Scoring, Selection

Generalizability of Large-Scale Performance Assessments in Science: Promises and Problems.

Peer reviewed

Gao, Xiaohong; And Others – Applied Measurement in Education, 1994

This study provides empirical evidence about the sampling variability and generalizability (reliability) of a statewide performance assessment for grade six. Results for 600 students at individual and school levels indicate that task-sampling variability was the major source of measurement error. Rater-sampling variability was negligible. (SLD)

Descriptors: Achievement Tests, Educational Assessment, Elementary School Students, Error of Measurement

Purposes of Assessment.

Peer reviewed

Shepard, Lorrie – Studies in Educational Evaluation, 1979

Assessment generally refers to large-scale, system-wide measurement programs for pupil diagnosis; pupil certification; program evaluation; research; accountability; resource allocations; or teacher evaluation. The purpose of assessment should determine the test content, construction, administration, and examinees sampled. Assessment methods for…

Descriptors: Accountability, Diagnostic Tests, Educational Assessment, Educational Research

Measuring Differences among Non-Randomized Groups: An Epidemiological Model for Identifying Successful School Programs.

Peer reviewed

Marascuilo, Leonard A. – Journal of Experimental Education, 1979

The utility of the biomedical model of adjusted statistics is demonstrated. The model is recommended for use by educational researchers to randomize subjects for a more accurate estimate of school programs' success or failure when compared across classrooms or other units. (Author/MH)

Descriptors: Academic Achievement, Analysis of Variance, Comparative Analysis, Criterion Referenced Tests
