Publication Date
| In 2026 | 0 |
| Since 2025 | 2 |
| Since 2022 (last 5 years) | 12 |
| Since 2017 (last 10 years) | 26 |
| Since 2007 (last 20 years) | 90 |
Descriptor
| True Scores | 416 |
| Error of Measurement | 121 |
| Test Reliability | 110 |
| Statistical Analysis | 107 |
| Mathematical Models | 97 |
| Item Response Theory | 87 |
| Correlation | 76 |
| Equated Scores | 76 |
| Reliability | 64 |
| Test Theory | 52 |
| Test Items | 51 |
Audience
| Researchers | 12 |
| Practitioners | 2 |
| Administrators | 1 |
| Teachers | 1 |
Location
| Australia | 1 |
| Canada | 1 |
| China | 1 |
| Colorado | 1 |
| Illinois | 1 |
| Israel | 1 |
| New York | 1 |
| Oregon | 1 |
| Taiwan | 1 |
| Texas | 1 |
| United Kingdom (England) | 1 |
Laws, Policies, & Programs
| Elementary and Secondary… | 1 |
Peer reviewed: Cook, William L.; Goldstein, Michael J. – Child Development, 1993
Tested the assumption that familial self-reports are biased by social desirability and other factors, through the use of a latent variables modeling approach that evaluated rater reliability and bias in mother, father, and child ratings of parent-child negativity. Results based on 78 families demonstrated that family member ratings contained a…
Descriptors: Children, Family Relationship, Interrater Reliability, Parent Child Relationship
Peer reviewed: Woodruff, David – Journal of Educational Measurement, 1991
Improvements are made to previous estimates of the conditional standard error of measurement in prediction, the conditional standard error of estimation (CSEE), and the conditional standard error of prediction (CSEP). Better estimates of how test length affects CSEE and CSEP are derived. (SLD)
Descriptors: Equations (Mathematics), Error of Measurement, Estimation (Mathematics), Mathematical Models
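For orientation, the quantities named above have familiar unconditional counterparts in classical test theory; the article works with their conditional analogues, which vary with score level. A sketch, with reliability and observed-score standard deviation as the only inputs:

```latex
% Unconditional classical test theory counterparts, with \rho the test
% reliability and \sigma_X the observed-score standard deviation:
\sigma_{\mathrm{SEM}} = \sigma_X\sqrt{1-\rho}
  \qquad \text{(standard error of measurement)}
\sigma_{\mathrm{SEE}} = \sigma_X\sqrt{\rho\,(1-\rho)}
  \qquad \text{(standard error of estimation of the true score)}
\sigma_{\mathrm{SEP}} = \sigma_X\sqrt{1-\rho^{2}}
  \qquad \text{(standard error of prediction of a parallel-form score)}
```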
Peer reviewed: Millsap, Roger E.; Everson, Howard – Multivariate Behavioral Research, 1991
Use of confirmatory factor analysis (CFA) with nonzero latent means in testing six different measurement models from classical test theory is discussed. Implications of the six models for observed mean and covariance structures are described, and three examples of the use of CFA in testing the models are presented. (SLD)
Descriptors: Comparative Analysis, Equations (Mathematics), Goodness of Fit, Mathematical Models
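As background (generic notation, not the article's), the classical test theory measurement models compared in such a CFA are nested versions of a single-factor model; three of the best-known members of that family are sketched below.

```latex
% Single-factor measurement model for observed measures X_1, ..., X_p with
% common true score T, intercepts \nu_j, loadings \lambda_j, errors E_j:
X_j = \nu_j + \lambda_j T + E_j , \qquad j = 1, \dots, p
% congeneric:                   \lambda_j and Var(E_j) free across measures
% (essentially) tau-equivalent: \lambda_1 = \dots = \lambda_p, Var(E_j) free
% parallel:                     \lambda_1 = \dots = \lambda_p and
%                               Var(E_1) = \dots = Var(E_p)
```

Allowing a nonzero mean for T brings the intercepts and the latent mean into the model, which is what makes the observed means, and not only the covariances, testable.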
Peer reviewed: Cohen, Allan S.; And Others – Applied Psychological Measurement, 1993
Three measures of differential item functioning for the dichotomous response model are extended to include Samejima's graded response model. Two are based on area differences between item true score functions, and one is a chi-square statistic for comparing differences in item parameters. (SLD)
Descriptors: Chi Square, Comparative Analysis, Identification, Item Bias
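A rough sketch of how an area-type index of this kind can be computed under the graded response model, with hypothetical item parameters for a reference and a focal group; the article's own statistics (and its chi-square measure) differ in their details.

```python
import numpy as np

def grm_expected_score(theta, a, b):
    """Expected item score under Samejima's graded response model,
    with categories scored 0, 1, ..., m-1.

    theta : 1-D array of ability values
    a     : discrimination parameter
    b     : increasing sequence of m-1 category boundary parameters

    Uses the identity E[X | theta] = sum_k P(X >= k | theta); each
    cumulative boundary curve is a two-parameter logistic in theta.
    """
    theta = np.asarray(theta, dtype=float)[:, None]   # (n_theta, 1)
    b = np.asarray(b, dtype=float)[None, :]           # (1, m-1)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # cumulative curves
    return p_star.sum(axis=1)

def unsigned_area_dif(a_ref, b_ref, a_foc, b_foc, lo=-4.0, hi=4.0, n=401):
    """Unweighted unsigned area between the reference- and focal-group
    expected score functions, approximated on a theta grid (trapezoid
    rule).  A generic area-type DIF index, not the article's statistic."""
    theta = np.linspace(lo, hi, n)
    diff = np.abs(grm_expected_score(theta, a_ref, b_ref)
                  - grm_expected_score(theta, a_foc, b_foc))
    return float(np.sum((diff[:-1] + diff[1:]) * np.diff(theta)) / 2.0)

# Hypothetical parameters for one five-category item in two groups
print(unsigned_area_dif(1.2, [-1.0, -0.2, 0.6, 1.4],
                        1.0, [-0.8, 0.0, 0.8, 1.6]))
```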
Peer reviewed: Hoijtink, Herbert; Boomsma, Anne – Psychometrika, 1996
The quality of approximations to first- and second-order moments based on latent ability estimates is discussed. The ability estimates are based on the Rasch or the two-parameter logistic model, and true score theory is used to account for the fact that the basic quantities are estimates. (SLD)
Descriptors: Ability, Bayesian Statistics, Estimation (Mathematics), Item Response Theory
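For reference, the two response models named above, for an item i with difficulty b_i (and discrimination a_i in the two-parameter case):

```latex
% Rasch model:
P(X_i = 1 \mid \theta) = \frac{\exp(\theta - b_i)}{1 + \exp(\theta - b_i)}
% Two-parameter logistic (2PL) model:
P(X_i = 1 \mid \theta) = \frac{\exp\{a_i(\theta - b_i)\}}{1 + \exp\{a_i(\theta - b_i)\}}
```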
Penfield, Randall D.; Giacobbi, Peter R., Jr – Measurement in Physical Education and Exercise Science, 2004
Item content-relevance is an important consideration for researchers when developing scales used to measure psychological constructs. Aiken (1980) proposed a statistic, "V," that can be used to summarize item content-relevance ratings obtained from a panel of expert judges. This article proposes the application of the Score confidence interval to…
Descriptors: Intervals, True Scores, Content Validity, Sport Psychology
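The two pieces the abstract names can be sketched directly. Aiken's V for one item is the mean judged relevance rescaled to [0, 1]; a Score (Wilson) interval can then be applied by treating V as a proportion out of n(c - 1) pseudo-trials, where n is the number of judges and c the number of scale points. A minimal sketch under those assumptions; the exact interval the article proposes may differ in detail.

```python
import math

def aikens_v(ratings, lo, hi):
    """Aiken's V content-relevance index for one item.

    ratings : integer ratings from a panel of judges
    lo, hi  : lowest and highest possible scale points
    Returns a value in [0, 1]; higher means the judges rated the item
    as more relevant to the construct."""
    n, k = len(ratings), hi - lo
    return sum(r - lo for r in ratings) / (n * k)

def score_interval_for_v(ratings, lo, hi, z=1.96):
    """Wilson score confidence interval applied to Aiken's V, treating V
    as a proportion based on n * (hi - lo) pseudo-trials.  A sketch of
    the general approach, not necessarily the article's exact formula."""
    n, k = len(ratings), hi - lo
    v = aikens_v(ratings, lo, hi)
    m = n * k                                  # effective number of trials
    centre = (v + z * z / (2 * m)) / (1 + z * z / m)
    half = (z / (1 + z * z / m)) * math.sqrt(v * (1 - v) / m + z * z / (4 * m * m))
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical ratings from 8 judges on a 1-5 relevance scale
ratings = [4, 5, 4, 3, 5, 4, 4, 5]
print(aikens_v(ratings, 1, 5), score_interval_for_v(ratings, 1, 5))
```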
Tong, Ye; Kolen, Michael – Applied Psychological Measurement, 2005
The performance of three equating methods--the presmoothed equipercentile method, the item response theory (IRT) true score method, and the IRT observed score method--were examined based on three equating criteria: the same distributions property, the first-order equity property, and the second-order equity property. The magnitude of the…
Descriptors: True Scores, Criteria, Raw Scores, Item Response Theory
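A minimal sketch of the IRT true score method alone, under the two-parameter logistic model with hypothetical item parameters assumed to be on a common theta scale; the presmoothed equipercentile and IRT observed score methods compared in the article, and the equity-property criteria, are not shown.

```python
import numpy as np

def true_score(theta, a, b):
    """Form true score under the 2PL: sum of item response probabilities."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(1.0 / (1.0 + np.exp(-a * (theta - b)))))

def irt_true_score_equate(x, a_x, b_x, a_y, b_y, lo=-8.0, hi=8.0, tol=1e-8):
    """Find theta with tau_X(theta) = x by bisection, then return the
    equated score tau_Y(theta).  Assumes 0 < x < number of items on
    form X and that both forms' parameters share a common scale."""
    lo_t, hi_t = lo, hi
    while hi_t - lo_t > tol:
        mid = 0.5 * (lo_t + hi_t)
        if true_score(mid, a_x, b_x) < x:
            lo_t = mid                      # true score too low: move right
        else:
            hi_t = mid
    theta_x = 0.5 * (lo_t + hi_t)
    return true_score(theta_x, a_y, b_y)

# Hypothetical five-item forms X and Y on a common scale
a_x, b_x = [1.0, 1.2, 0.8, 1.1, 0.9], [-1.0, -0.5, 0.0, 0.5, 1.0]
a_y, b_y = [1.1, 0.9, 1.0, 1.3, 0.8], [-0.8, -0.3, 0.2, 0.6, 1.2]
print(irt_true_score_equate(3, a_x, b_x, a_y, b_y))
```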
Yang, Wen-Ling; Houang, Richard T. – 1996
The influence of anchor length on the accuracy of test equating was studied using Tucker's linear method and two Item-Response-Theory (IRT) based methods, focusing on whether equating accuracy improved with more anchor items, whether the anchor effect depended on the equating method used, and the adequacy of the inclusion of the guessing parameter…
Descriptors: Equated Scores, Estimation (Mathematics), Guessing (Tests), Item Response Theory
Hsu, Yaowen; Ackerman, Terry A. – 1994
This paper summarizes an investigation of the format used for equating the 1993 Illinois Goal Assessment Program (IGAP) sixth grade reading test. In 1992, each student took only one test, either a narrative test or an expository test. In 1993, there was only one test, which included both formats. Several possible approaches for linking the 1993…
Descriptors: Context Effect, Elementary School Students, Equated Scores, Grade 6
Bekhuis, Tanja C. H. M. – 1988
An Educational Testing Service (ETS) procedure, based on item response theory, for estimating true scores on tests not taken was evaluated. The reading, vocabulary, and mathematics tests of high school seniors from the National Longitudinal Study (NLS) of 1972 and the High School and Beyond (HSB) seniors of 1980 and 1982 were found to share…
Descriptors: Achievement Tests, Computer Simulation, Estimation (Mathematics), Latent Trait Theory
Livingston, Samuel A. – 1978
The traditional reliability coefficient and standard error of measurement are not adequate measures of reliability for tests used to make pass/fail decisions. Answering the important reliability questions requires estimation of the joint distribution of true and observed scores. Lord's "Method 20" estimates this distribution without the…
Descriptors: Cutting Scores, Decision Making, Efficiency, Error of Measurement
Cheshier, Stephen R. – Engineering Education, 1975
Describes a simplified method for converting raw scores to standard scores and transforming them to "T-scores" for easy comparison of performance. Obtaining letter grades from T-scores is discussed. A reading list is included. (GH)
Descriptors: Achievement Rating, Error of Measurement, Evaluation Methods, Grades (Scholastic)
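A minimal sketch of the kind of conversion described: raw scores are standardized to z-scores and then linearly rescaled to T-scores with mean 50 and standard deviation 10 (the article's own worked procedure and grading rules are not reproduced here).

```python
import statistics

def t_scores(raw_scores):
    """Convert raw scores to standard (z) scores and then to T-scores
    (mean 50, standard deviation 10) for comparing performance across
    tests scored on different raw-score scales."""
    mean = statistics.mean(raw_scores)
    sd = statistics.stdev(raw_scores)          # sample standard deviation
    return [50 + 10 * (x - mean) / sd for x in raw_scores]

# Hypothetical raw scores on one exam
print([round(t, 1) for t in t_scores([62, 71, 55, 80, 67, 74])])
```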
Bliss, Leonard B. – 1981
The aim of this study was to show that the superiority of corrected-for-guessing scores over number right scores as true score estimates depends on the ability of examinees to recognize situations where they can eliminate one or more alternatives as incorrect and to omit items where they would only be guessing randomly. Previous investigations…
Descriptors: Algorithms, Guessing (Tests), Intermediate Grades, Multiple Choice Tests
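For reference, the standard correction-for-guessing (formula) score compared with number right scores in studies of this kind, where R is the number of items answered correctly, W the number answered incorrectly (omitted items are not counted), and k the number of options per item:

```latex
FS = R - \frac{W}{k - 1}
```

Under purely random guessing on a set of items, the deduction W/(k - 1) on average cancels the lucky hits, which is why the study's question of whether examinees can do better than random guessing, or know when to omit, matters.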
Divgi, D. R. – 1978
One aim of criterion-referenced testing is to classify an examinee without reference to a norm group; therefore, any statements about the dependability of such classification ought to be group-independent also. A population-independent index is proposed in terms of the probability of incorrect classification near the cutoff true score. The…
Descriptors: Criterion Referenced Tests, Cutting Scores, Difficulty Level, Error of Measurement
Brennan, Robert L. – 1974
An attempt is made to explore the use of subjective probabilities in the analysis of item data, especially criterion-referenced item data. Two assumptions are implicit: (1) one wants to obtain a maximum amount of information with respect to an item using a minimum number of subjects; and (2) once the item is validated, it may well be administered…
Descriptors: Confidence Testing, Criterion Referenced Tests, Guessing (Tests), Item Analysis
