Showing all 14 results
Peer reviewed
Qian, Jiahe; Jiang, Yanming; von Davier, Alina A. – ETS Research Report Series, 2013
Several factors could cause variability in item response theory (IRT) linking and equating procedures, such as the variability across examinee samples and/or test items, seasonality, regional differences, native language diversity, gender, and other demographic variables. Hence, the following question arises: Is it possible to select optimal…
Descriptors: Item Response Theory, Test Items, Sampling, True Scores
Peer reviewed
Garcia-Perez, Miguel A. – Journal of Educational and Behavioral Statistics, 2010
A recent comparative analysis of alternative interval estimation approaches and procedures has shown that confidence intervals (CIs) for true raw scores determined with the Score method--which uses the normal approximation to the binomial distribution--have actual coverage probabilities that are closest to their nominal level. It has also recently…
Descriptors: Computation, Statistical Analysis, True Scores, Raw Scores
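The Score method referred to above is the Wilson score interval for a binomial proportion, which back-solves the normal approximation rather than centering the interval on the observed proportion. A minimal sketch (the function name and the 32-of-40 example are invented for illustration, not taken from the article):

```python
import math

def wilson_ci(correct, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion.

    For true-score estimation, `correct` is the number-correct score
    out of `n` items; z = 1.96 gives a nominal 95% interval.
    """
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

lo, hi = wilson_ci(32, 40)  # 32 of 40 items correct
```

Unlike the naive Wald interval, the Wilson interval never extends outside [0, 1], which matters for the extreme proportion-correct scores common in testing.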
Peer reviewed
von Davier, Alina A.; Fournier-Zajac, Stephanie; Holland, Paul W. – ETS Research Report Series, 2007
In the nonequivalent groups with anchor test (NEAT) design, there are several ways to use the information provided by the anchor in the equating process. One of the NEAT-design equating methods is the linear observed-score Levine method (Kolen & Brennan, 2004). It is based on a classical test theory model of the true scores on the test forms…
Descriptors: Equated Scores, Statistical Analysis, Test Items, Test Theory
Peer reviewed
Moses, Tim; Kim, Sooyeon – ETS Research Report Series, 2007
This study evaluated the impact of unequal reliability on test equating methods in the nonequivalent groups with anchor test (NEAT) design. Classical true score-based models were compared in terms of their assumptions about how reliability impacts test scores. These models were related to treatment of population ability differences by different…
Descriptors: Reliability, Equated Scores, Test Items, Statistical Analysis
Lord, Frederic M.; Wingersky, Marilyn S. – 1983
Two methods of 'equating' tests using item response theory (IRT) are compared, one using true scores, the other using the estimated distribution of observed scores. On the data studied, they yield almost indistinguishable results. This is a reassuring result for users of IRT equating methods. (Author)
Descriptors: Comparative Analysis, Equated Scores, Estimation (Mathematics), Latent Trait Theory
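IRT true-score equating, the first of the two methods compared, maps a true score on form X through the inverse of X's test characteristic curve (TCC) to an ability value, then through form Y's TCC to an equated true score. A minimal sketch assuming 3PL item parameters; the item values and function names are illustrative, not from the report:

```python
import math

def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct true score."""
    return sum(p3pl(theta, *item) for item in items)

def true_score_equate(tau_x, items_x, items_y, lo=-6.0, hi=6.0):
    """Equate true score tau_x on form X to form Y by inverting X's TCC
    with bisection (tau_x must lie between sum of c's and test length)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if tcc(mid, items_x) < tau_x:
            lo = mid
        else:
            hi = mid
    return tcc((lo + hi) / 2, items_y)

# Invented (a, b, c) parameters; identical forms should equate to itself
items = [(1.0, 0.0, 0.2), (1.2, -0.5, 0.2), (0.8, 0.5, 0.2)]
eq = true_score_equate(1.8, items, items)
```

When forms X and Y are the same, the mapping is the identity, which makes a convenient sanity check on the bisection.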
Wilcox, Rand R. – 1978
Two fundamental problems in mental test theory are to estimate true score and to estimate the amount of error when testing an examinee. In this report, three probability models which characterize a single test item in terms of a population of examinees are described. How these models may be modified to characterize a single examinee in terms of an…
Descriptors: Achievement Tests, Comparative Analysis, Error of Measurement, Mathematical Models
Sullins, Walter L. – 1971
Five hundred dichotomously scored response patterns were generated with sequentially independent (SI) items and 500 with dependent (SD) items for each of thirty-six combinations of sampling parameters (i.e., three test lengths, three sample sizes, and four item difficulty distributions). KR-20, KR-21, and Split-Half (S-H) reliabilities were…
Descriptors: Comparative Analysis, Correlation, Error of Measurement, Item Analysis
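KR-20 and KR-21 can both be computed directly from a 0/1 item-response matrix; KR-21 additionally assumes equal item difficulties, so it is a lower bound on KR-20. A sketch (the small example matrix is invented for illustration):

```python
import numpy as np

def kr20(X):
    """Kuder-Richardson Formula 20 for a 0/1 item matrix X
    (rows = examinees, columns = items)."""
    k = X.shape[1]
    p = X.mean(axis=0)            # item difficulties (proportion correct)
    var_t = X.sum(axis=1).var()   # population variance of total scores
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / var_t)

def kr21(X):
    """KR-21: replaces per-item variances with one based on the mean
    score, assuming all items are equally difficult."""
    k = X.shape[1]
    total = X.sum(axis=1)
    m, var_t = total.mean(), total.var()
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var_t))

# Invented 5-examinee x 4-item data
X = np.array([[1,1,1,0], [1,1,0,0], [1,0,0,0], [1,1,1,1], [0,0,0,0]])
r20, r21 = kr20(X), kr21(X)
```

On this toy matrix the items differ in difficulty, so KR-21 comes out below KR-20, as expected.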
O'Connor, Edward F., Jr. – 1970
The problem of the comparability of change scores is investigated. Change quotients and residual change scores are evaluated as alternative approaches; methods for estimating true change and the true-score residual, the reliability of change scores and residuals, and procedures for constructing confidence intervals for residuals are explored.…
Descriptors: Comparative Analysis, Correlation, Equated Scores, Evaluation Methods
Kristof, Walter – 1971
We concern ourselves with the hypothesis that two variables have a perfect disattenuated correlation, hence measure the same trait except for errors of measurement. This hypothesis is equivalent to saying, within the adopted model, that true scores of two psychological tests satisfy a linear relation. Statistical tests of this hypothesis are…
Descriptors: Analysis of Covariance, Comparative Analysis, Correlation, Error of Measurement
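The disattenuated correlation referred to here is the observed correlation divided by the geometric mean of the two reliabilities; Kristof's hypothesis is that this ratio equals 1, i.e., that the two tests' true scores are linearly related. A sketch with invented numbers:

```python
import math

def disattenuated_r(r_xy, r_xx, r_yy):
    """Correlation between true scores: the observed correlation r_xy
    corrected for the unreliability of each test (r_xx, r_yy)."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Observed r of .72 between tests with reliabilities .81 and .80
rho = disattenuated_r(0.72, 0.81, 0.80)
```

In practice sampling error can push the corrected value above 1, which is one reason a formal statistical test of the hypothesis, rather than inspection of the point estimate, is needed.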
Peer reviewed
Mellenbergh, Gideon J.; van der Linden, Wim J. – Applied Psychological Measurement, 1979
For six tests, coefficient delta as an index for internal optimality is computed. Internal optimality is defined as the magnitude of risk of the decision procedure with respect to the true score. Results are compared with an alternative index (coefficient kappa) for assessing the consistency of decisions. (Author/JKS)
Descriptors: Classification, Comparative Analysis, Decision Making, Error of Measurement
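Coefficient kappa, the comparison index, measures chance-corrected consistency of classification decisions, e.g., mastery/non-mastery calls on two test administrations. A minimal sketch (the toy decision vectors are invented):

```python
def cohens_kappa(a, b):
    """Chance-corrected agreement between two classification vectors
    (e.g., pass/fail decisions from two administrations of a test)."""
    n = len(a)
    categories = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_chance = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (p_obs - p_chance) / (1 - p_chance)

# 1 = mastery decision, 0 = non-mastery, over six examinees
k = cohens_kappa([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance, which is what makes it a natural benchmark for a risk-based index like coefficient delta.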
Donlon, Thomas F. – 1975
This study empirically determined the optimizing weight to be applied to the Wrongs Total Score in scoring formulas of the general form S = R - kW, where S is the Score, R the Rights Total, k the weight, and W the Wrongs Total, if reliability is to be maximized. As is well known, the traditional formula score rests on a theoretical framework which is…
Descriptors: Achievement Tests, Comparative Analysis, Guessing (Tests), Multiple Choice Tests
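The traditional formula score sets k = 1/(A - 1) for A-option items, the value that makes a blind guesser's expected score zero; Donlon's question is whether some other k maximizes reliability. A sketch of the traditional scoring rule with illustrative numbers:

```python
def formula_score(rights, wrongs, n_options):
    """Traditional correction-for-guessing score S = R - kW, with
    k = 1/(A - 1) for A-option multiple-choice items. A random guess
    then contributes +1 with probability 1/A and -k otherwise, for an
    expected contribution of zero."""
    k = 1 / (n_options - 1)
    return rights - k * wrongs

s = formula_score(rights=30, wrongs=8, n_options=5)  # k = 0.25
```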
Smith, Donald M. – 1976
The Kuder-Richardson 20 formula is shown to be a special case, where each examinee is given sufficient time to answer each item, of a more general formula where each examinee may not be allowed the necessary time. The formula is extended to allow two scores, knowledge and speed, to be extracted from each examinee's test score. Using a sample of 82…
Descriptors: Career Development, Comparative Analysis, Grade Point Average, Predictive Measurement
Marston, Paul T.; Borich, Gary D. – 1977
The four main approaches to measuring treatment effects in schools (raw gain, residual gain, covariance, and true scores) were compared. A simulation study showed that true-score analysis produced a large number of Type I errors. When corrected for this error, the method showed the least power of the four. This outcome was clearly the result of the…
Descriptors: Achievement Gains, Analysis of Covariance, Comparative Analysis, Error of Measurement
Takalkar, Pradnya; And Others – 1993
This study compared 4,594 student responses from three different surveys of incoming students at the University of South Florida (USF) with data from Florida's State University System (SUS) admissions files to determine what proportion of error occurs in the survey responses. Specifically, the study investigated the amount of measurement error in…
Descriptors: College Admission, College Applicants, College Bound Students, Comparative Analysis