Rim, Eui-Do; Bresler, Samuel – 1974
Livingston's reliability coefficients and Harris' indices of efficiency were computed along with the classical internal consistency coefficients, KR-20's (Kuder-Richardson internal consistency coefficient), for 678 criterion-referenced tests in the A through E levels of an individualized mathematics program. The coefficients were carefully studied…
Descriptors: Academic Achievement, Correlation, Criterion Referenced Tests, Elementary School Mathematics
Stanley, Julian C.; Livingston, Samuel A. – 1971
Besides the ubiquitous Pearson product-moment r, there are a number of other measures of relationship that are attenuated by errors of measurement and for which the relationship between true measures can be estimated. Among these are the correlation ratio (eta squared), Kelley's unbiased correlation ratio (epsilon squared), Hays' omega squared,…
Descriptors: Analysis of Variance, Cluster Grouping, Correlation, Data Analysis
Epstein, Kenneth I. – 1975
Since the primary purpose of classical testing is to rank order examinees consistently, the absolute value of the true score has been relatively unimportant. However, the major purpose of criterion referenced testing is to estimate the true capabilities of examinees to perform specific tasks. Hence, the problems of true score determination assume…
Descriptors: Bayesian Statistics, Criterion Referenced Tests, Mathematical Models, Military Personnel
Camilli, Gregory; Wang, Ming-mei; Fesq, Jaqueline – 1992
The Law School Admission Test (LSAT) was examined to see if the items on a form could be divided into different subgroups in which items looked statistically similar within the subgroups but statistically different between subgroups. If such subgrouping can be detected, it is likely that the subgroups of items measure different abilities, and the…
Descriptors: Admission (School), College Entrance Examinations, Factor Analysis, Item Response Theory
Peer reviewed: Horn, John L. – Educational and Psychological Measurement, 1971
Descriptors: Analysis of Variance, Error of Measurement, Hypothesis Testing, Mathematical Models
Peer reviewed: Joreskog, K. G. – Psychometrika, 1971
Descriptors: Correlation, Factor Analysis, Goodness of Fit, Mathematical Models
Peer reviewed: Werts, C. E.; And Others – Educational and Psychological Measurement, 1980
Test-retest correlations can lead to biased reliability estimates when there is instability of true scores and/or when measurement errors are correlated. Using three administrations of the Test of Standard Written English and essay ratings, an analysis is demonstrated which separates true score instability and correlated errors. (Author/BW)
Descriptors: College Freshmen, Error of Measurement, Essay Tests, Higher Education
Peer reviewed: MacCann, Robert G. – Journal of Educational Statistics, 1990
For anchor test equating, 3 linear observed score methods are derived for populations differing in ability. Each version requires that the correlations of the tests with the selection variable be known. Five sets of assumptions are made for each model--yielding 15 methods--which are then related to existing methods. (SLD)
Descriptors: Ability, Ability Grouping, Equated Scores, Equations (Mathematics)
Wang, Jinhao; Brown, Michelle Stallone – Journal of Technology, Learning, and Assessment, 2007
The current research was conducted to investigate the validity of automated essay scoring (AES) by comparing group mean scores assigned by an AES tool, IntelliMetric [TM], and by human raters. Data collection included administering the Texas version of the WriterPlacer "Plus" test and obtaining scores assigned by IntelliMetric [TM] and by…
Descriptors: Test Scoring Machines, Scoring, Comparative Testing, Intermode Differences
Livingston, Samuel A. – 1984
Much previously published material for estimating the reliability of classification has been based on the assumption that a test consists of a known number of equally weighted items. The test score is the number of those items answered correctly. These methods cannot be used with classifications based on weighted composite scores, especially if…
Descriptors: Equated Scores, Essay Tests, Estimation (Mathematics), Mathematical Models
Rudner, Lawrence M. – 1977
Because it is a true score model employing item parameters which are independent of the examined sample, item characteristic curve theory (ICC) offers several advantages over classical measurement theory. In this paper an approach to biased item identification using ICC theory is described and applied. The ICC theory approach is attractive in that…
Descriptors: Bias, Criteria, Culture Fair Tests, Item Analysis
Statistical Comparisons Among Hierarchies Based on Latent Structure Models. Research Monograph 77-1.
Macready, George B.; Dayton, C. Mitchell – 1977
A probabilistic hypothesis testing procedure to assess the fit of hypothesized hierarchical structures for test item data is discussed. Statistical procedures are presented which are useful for evaluating the fit of data to a certain class of probabilistic models. These models apply to sets of dichotomous (0,1) responses for which there are…
Descriptors: Error of Measurement, Goodness of Fit, Hypothesis Testing, Mathematical Models
Huynh, Huynh; Saunders, Joseph C., III – 1979
The Bayesian approach to setting passing scores, as proposed by Swaminathan, Hambleton, and Algina, is compared with the empirical Bayes approach to the same problem that is derived from Huynh's decision-theoretic framework. Comparisons are based on simulated data which follow an approximate beta-binomial distribution and on real test results from…
Descriptors: Bayesian Statistics, Cutting Scores, Grade 3, Mastery Tests
Harris, Chester W. – 1971
Livingston's work is a careful analysis of what occurs when one pools two populations with different means, but similar variances and reliability coefficients. However, his work fails to advance reliability theory for the special case of criterion-referenced testing. See ED 042 802 for Livingston's paper. (MS)
Descriptors: Analysis of Variance, Criterion Referenced Tests, Error of Measurement, Reliability
Werts, C. E.; And Others – 1972
Intraclass correlation reliability estimates are based on the assumption that the various measures are equivalent. Joreskog's (1970) general model for the analysis of covariance structures can be used to test the validity of this assumption. (For related document, see TM 002 301.) (Author)
Descriptors: Analysis of Covariance, Correlation, Hypothesis Testing, Mathematical Models

