Showing all 12 results
Peer reviewed
Manna, Venessa F.; Gu, Lixiong – ETS Research Report Series, 2019
When using the Rasch model, equating under a nonequivalent groups anchor test design is commonly achieved by adjusting new-form item difficulties with an additive equating constant. Using simulated 5-year data, this report compares 4 approaches to calculating the equating constants and the subsequent impact on equating results. The 4 approaches…
Descriptors: Item Response Theory, Test Items, Test Construction, Sample Size
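To make the technique in this abstract concrete, here is a minimal sketch of the simplest such calculation, a mean-difference (mean-mean) constant, using made-up anchor difficulty values; it is not code from the report, which compares four variants of this computation.

```python
import numpy as np

# Hypothetical Rasch difficulty estimates for the common anchor items,
# calibrated separately on the reference form and the new form.
b_anchor_ref = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_anchor_new = np.array([-1.0, -0.3, 0.3, 0.9, 1.7])

# Additive equating constant: the average shift that places the
# new-form difficulties on the reference-form scale.
c = np.mean(b_anchor_ref) - np.mean(b_anchor_new)

# Adjust all new-form item difficulties by the constant.
b_new_form = np.array([-0.8, 0.0, 0.5, 1.1])
b_new_on_ref_scale = b_new_form + c
print(c, b_new_on_ref_scale)
```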
Peer reviewed
Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013
The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…
Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation
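As background on what IRT-based linking typically involves, the sketch below shows a mean/sigma transformation estimated from anchor-item difficulties; the numbers are invented, and the study's own linking method may differ. Shrinking the equating sample or the anchor makes the estimated slope and intercept noisier, which is the effect the study quantifies.

```python
import numpy as np

# Hypothetical anchor-item difficulty estimates from separate calibrations.
b_ref = np.array([-1.5, -0.6, 0.0, 0.7, 1.4, 2.0])
b_new = np.array([-1.2, -0.4, 0.2, 0.9, 1.6, 2.3])

# Mean/sigma linking: choose A and B so that A*b_new + B matches b_ref
# in mean and standard deviation; the ability scale maps as A*theta + B.
A = np.std(b_ref, ddof=1) / np.std(b_new, ddof=1)
B = np.mean(b_ref) - A * np.mean(b_new)

b_new_linked = A * b_new + B
print(A, B, b_new_linked)
```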
Peer reviewed
Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2009
A series of resampling studies was conducted to compare the accuracy of equating in a common item design using four different methods: chained equipercentile equating of smoothed distributions, chained linear equating, chained mean equating, and the circle-arc method. Four operational test forms, each containing more than 100 items, were used for…
Descriptors: Sampling, Sample Size, Accuracy, Test Items
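For readers unfamiliar with the chained methods compared here, a minimal sketch of chained linear equating follows, using invented summary statistics rather than anything from the four operational forms:

```python
def linear_equate(mu_from, sd_from, mu_to, sd_to):
    """Linear function mapping one score scale onto another."""
    return lambda x: mu_to + (sd_to / sd_from) * (x - mu_from)

# Hypothetical moments: group 1 took new form X plus anchor V,
# group 2 took old form Y plus the same anchor V.
x_to_v = linear_equate(55.0, 10.0, 20.0, 4.0)   # X -> V, within group 1
v_to_y = linear_equate(21.0, 4.5, 60.0, 11.0)   # V -> Y, within group 2

# Chained equating composes the two links: X -> V -> Y.
def x_to_y(x):
    return v_to_y(x_to_v(x))

print([round(x_to_y(x), 2) for x in (40.0, 55.0, 70.0)])
```

Chained mean equating is the same composition with both slopes fixed at 1, and chained equipercentile replaces each linear link with a percentile-to-percentile map of (smoothed) score distributions.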
Peer reviewed
Puhan, Gautam; Moses, Tim; Grant, Mary; McHale, Fred – ETS Research Report Series, 2008
A single group (SG) equating design with nearly equivalent test forms (SiGNET) was developed by Grant (2006) to equate small-volume tests. The basis of this design is that examinees take two largely overlapping test forms within a single administration. The scored items for the operational form are divided into mini-tests called testlets.…
Descriptors: Data Collection, Equated Scores, Item Sampling, Sample Size
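Going only by the abstract's description (the counts and layout below are illustrative, not Grant's specification), the overlap can be pictured as consecutive forms assembled from mostly shared testlets:

```python
# Illustrative only: scored operational items divided into testlets,
# with two consecutive forms sharing most of them.
testlets = {f"T{i}": list(range(10 * i, 10 * i + 10)) for i in range(1, 8)}

form_a = ["T1", "T2", "T3", "T4", "T5", "T6"]   # current operational form
form_b = ["T2", "T3", "T4", "T5", "T6", "T7"]   # next form, one testlet rotated

shared = [t for t in form_a if t in form_b]
print(f"{len(shared)} of {len(form_a)} testlets shared between forms")
```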
Rodriguez, Maximo – 1997
Norm-referenced tests yield information regarding a student's performance in comparison to a norm or average of performance by similar students. Norms are statistics that describe the test performance of a well-defined population. The process of constructing norms, called norming, is explored briefly in this paper. Some of the most widely reported…
Descriptors: Data Collection, Error of Measurement, Identification, Norm Referenced Tests
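As a minimal illustration of what a norm encodes (the scores below are invented), a percentile-rank norm table maps each raw score to the percentage of the norm group scoring below it:

```python
import numpy as np

# Hypothetical raw scores from a norm group of similar students.
norm_scores = np.array([12, 15, 15, 18, 20, 21, 21, 23, 25, 28])

def percentile_rank(score, norms):
    """Percent scoring below, plus half of those tied at the score."""
    below = np.mean(norms < score)
    tied = np.mean(norms == score)
    return 100.0 * (below + tied / 2.0)

print(percentile_rank(21, norm_scores))  # -> 60.0 for this norm group
```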
Cleary, T. A.; Linn, Robert L. – 1967
The purpose of this research was to study the effect of error of measurement upon the power of statistical tests. Attention was focused on the F-test of the single-factor analysis of variance. Formulas were derived to show the relationship between the noncentrality parameters for analyses using true scores and those using observed scores. The…
Descriptors: Analysis of Variance, Error of Measurement, Measurement Techniques, Psychological Testing
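The relationship at the heart of this abstract has a standard classical-test-theory form, stated below for the balanced one-way case with homogeneous error variance; the report's own derivations may be more general.

```latex
% Observed scores X = T + E with reliability \rho_{XX'} = \sigma_T^2/\sigma_X^2.
% For J groups of size n, replacing true scores by observed scores shrinks
% the noncentrality parameter of the F-test by the reliability:
\lambda_{\mathrm{obs}}
  = \frac{n \sum_{j=1}^{J} (\mu_j - \bar{\mu})^2}{\sigma_X^2}
  = \rho_{XX'} \cdot \frac{n \sum_{j=1}^{J} (\mu_j - \bar{\mu})^2}{\sigma_T^2}
  = \rho_{XX'} \, \lambda_{\mathrm{true}} .
```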
Peer reviewed
Hambleton, Ronald K.; And Others – Journal of Educational Measurement, 1993
Item parameter estimation errors in test development are highlighted. The problem is illustrated with several simulated data sets, and a conservative solution is offered for addressing the problem in item response theory test development practice. Steps that reduce the problem of capitalizing on chance in item selections are suggested. (SLD)
Descriptors: Computer Simulation, Error of Measurement, Estimation (Mathematics), Item Banks
Hambleton, Ronald K.; Jones, Russell W. – 1993
Errors in item parameter estimates have a negative impact on the accuracy of item and test information functions. The estimation errors may be random, but because items with higher levels of discriminating power are more likely to be selected for a test, and these items are most apt to contain positive errors, the result is that item information…
Descriptors: Computer Simulation, Error of Measurement, Estimation (Mathematics), Item Banks
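The mechanism described in these two Hambleton entries is easy to demonstrate by simulation; everything below (bank size, error model, parameter ranges) is an assumption for illustration, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical item bank: true discriminations plus random calibration error.
a_true = rng.uniform(0.5, 1.5, size=500)
a_est = a_true + rng.normal(0.0, 0.2, size=500)

# Selecting the most discriminating items by their ESTIMATES favors items
# whose errors happen to be positive (capitalization on chance).
top = np.argsort(a_est)[-40:]

print("mean estimated a of selected items:", round(a_est[top].mean(), 3))
print("mean true a of selected items:     ", round(a_true[top].mean(), 3))
# The gap between the two means is the optimistic bias that carries into
# the item and test information functions computed from the estimates.
```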
Sullins, Walter L. – 1971
Five hundred dichotomously scored response patterns were generated with sequentially independent (SI) items and 500 with sequentially dependent (SD) items for each of thirty-six combinations of sampling parameters (i.e., three test lengths, three sample sizes, and four item difficulty distributions). KR-20, KR-21, and Split-Half (S-H) reliabilities were…
Descriptors: Comparative Analysis, Correlation, Error of Measurement, Item Analysis
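For reference, the statistics compared in this study can be computed as follows; the response matrix is invented, not one of the generated data sets. Note that KR-21, which assumes equally difficult items, never exceeds KR-20 on the same data.

```python
import numpy as np

# Hypothetical 0/1 response matrix: rows are examinees, columns are items.
X = np.array([[1, 1, 0, 1, 0],
              [1, 0, 0, 1, 1],
              [0, 1, 1, 1, 0],
              [1, 1, 1, 1, 1],
              [0, 0, 0, 1, 0]])

k = X.shape[1]                       # test length
p = X.mean(axis=0)                   # per-item proportion correct
totals = X.sum(axis=1)
var_total = totals.var(ddof=1)       # variance of total scores

# KR-20 uses each item's variance p*(1-p).
kr20 = (k / (k - 1)) * (1 - np.sum(p * (1 - p)) / var_total)

# KR-21 replaces item variances with a single value from the mean score.
m = totals.mean()
kr21 = (k / (k - 1)) * (1 - m * (k - m) / (k * var_total))

print(round(kr20, 3), round(kr21, 3))
```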
Mills, Craig N.; Simon, Robert – 1981
When criterion-referenced tests are used to assign examinees to states reflecting their performance level on a test, the better-known methods for determining test length, which consider relationships among domain scores and errors of measurement, have their limitations. The purpose of this paper is to present a computer system named TESTLEN, which…
Descriptors: Computer Assisted Testing, Criterion Referenced Tests, Cutting Scores, Error of Measurement
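Without access to TESTLEN itself, the kind of computation such a system performs can be sketched with a simple binomial error model: for candidate test lengths, estimate the chance that an examinee with a given domain score is assigned to the wrong state relative to a cut score. The cut, domain score, and lengths below are assumptions.

```python
from math import ceil
from scipy.stats import binom

def misclassification_prob(domain_score, n_items, cut=0.8):
    """Chance an examinee with true domain score `domain_score` lands on
    the wrong side of the cut, treating each item as an independent
    Bernoulli(domain_score) trial."""
    threshold = ceil(cut * n_items)            # minimum correct to pass
    p_fail = binom.cdf(threshold - 1, n_items, domain_score)
    return p_fail if domain_score >= cut else 1.0 - p_fail

# How long a test keeps the error rate for a .85 examinee under 10%?
for n in (10, 20, 40, 80):
    print(n, round(misclassification_prob(0.85, n), 3))
```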
Fink, Arlene – 1995
The nine-volume Survey Kit is designed to help readers prepare and conduct surveys and become better users of survey results. All the books in the series contain instructional objectives, exercises and answers, examples of surveys in use, illustrations of survey questions, guidelines for action, checklists of "dos and don'ts," and…
Descriptors: Costs, Data Collection, Educational Research, Error of Measurement
Macpherson, Colin R.; Rowley, Glenn L. – 1986
Teacher-made mastery tests were administered to a classroom-sized sample to study their decision consistency. Decision consistency of criterion-referenced tests is usually defined in terms of the proportion of examinees who are classified in the same way after two test administrations. Single-administration estimates of decision consistency were…
Descriptors: Classroom Research, Comparative Testing, Criterion Referenced Tests, Cutting Scores
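A common single-administration estimate in this family follows Subkoviak's binomial approach: regress each observed proportion-correct toward the group mean, then compute the chance that two parallel administrations would classify the examinee the same way. The sketch below is in that spirit, with invented scores and an assumed reliability; it is not necessarily the specific estimator the paper evaluates.

```python
import numpy as np
from scipy.stats import binom

def decision_consistency(scores, n_items, cut=0.8, reliability=0.85):
    """Average P(same classification on two parallel administrations)."""
    threshold = int(np.ceil(cut * n_items))        # min correct for mastery
    obs_p = scores / n_items
    # Kelley-style regression of observed proportions toward the mean.
    true_p = reliability * obs_p + (1 - reliability) * obs_p.mean()
    p_pass = 1 - binom.cdf(threshold - 1, n_items, true_p)
    return float(np.mean(p_pass**2 + (1 - p_pass)**2))

# Hypothetical mastery-test scores (out of 20 items) for a small class.
scores = np.array([12, 14, 15, 16, 16, 17, 18, 18, 19, 20])
print(round(decision_consistency(scores, n_items=20), 3))
```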