Showing 361 to 375 of 416 results
PDF pending restoration
Kane, Michael T.; Moloney, James M. – 1976
The Answer-Until-Correct (AUC) procedure has been proposed to increase the reliability of multiple-choice items. A model is presented for examinees' behavior when they must respond to each item until they answer it correctly. An expression for the reliability of AUC items, as a function of the characteristics of the item and the scoring…
Descriptors: Guessing (Tests), Item Analysis, Mathematical Models, Multiple Choice Tests
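To make the AUC scoring idea concrete, the sketch below simulates answer-until-correct responses under a simple knowledge-or-random-guessing model and estimates the reliability of the resulting scores. The response model, scoring rule, and all parameters are illustrative assumptions, not Kane and Moloney's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def auc_item_score(knows, n_alternatives=4):
    """AUC score: attempts until correct, rewarded as n_alternatives - attempts."""
    if knows:
        attempts = 1
    else:
        # Random guessing without replacement: attempt number is uniform on 1..A.
        attempts = rng.integers(1, n_alternatives + 1)
    return n_alternatives - attempts  # 3, 2, 1, or 0 points on a 4-choice item

def simulate(n_examinees=2000, n_items=30):
    theta = rng.normal(size=n_examinees)      # latent ability
    p_know = 1 / (1 + np.exp(-theta))         # P(knows the answer), all items alike
    scores = np.empty((n_examinees, n_items))
    for i in range(n_examinees):
        for j in range(n_items):
            scores[i, j] = auc_item_score(rng.random() < p_know[i])
    return scores

scores = simulate()
# Split-half reliability of AUC total scores, Spearman-Brown corrected.
odd, even = scores[:, ::2].sum(axis=1), scores[:, 1::2].sum(axis=1)
r = np.corrcoef(odd, even)[0, 1]
print("split-half reliability:", round(2 * r / (1 + r), 3))
```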
Peer reviewed
Lord, Frederic M. – Applied Psychological Measurement, 1977
Under given conditions, conventional testing and computer-generated repeatable testing (CGRT) are equally effective for estimating examinee ability; CGRT is more effective for estimating the mean ability level of a group and less effective for estimating ability differences among individuals. These conclusions are drawn from domain-referenced test…
Descriptors: Career Development, Computer Assisted Testing, Difficulty Level, Group Norms
Peer reviewed
Huck, Schuyler W.; And Others – Educational and Psychological Measurement, 1981
Believing that examinee-by-item interaction should be conceptualized as true score variability rather than as a result of errors of measurement, Lu proposed a modification of Hoyt's analysis of variance reliability procedure. Via a computer simulation study, it is shown that Lu's approach does not separate interaction from error. (Author/RL)
Descriptors: Analysis of Variance, Comparative Analysis, Computer Programs, Difficulty Level
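For reference, the sketch below computes Hoyt's ANOVA reliability from a persons-by-items score matrix; with a single observation per cell, the residual term confounds person-by-item interaction with random error, which is exactly the separation Lu's modification attempted. The data are simulated for illustration.

```python
import numpy as np

def hoyt_reliability(X):
    """X: persons-by-items score matrix. Returns Hoyt's ANOVA reliability."""
    n, k = X.shape
    grand = X.mean()
    ss_persons = k * ((X.mean(axis=1) - grand) ** 2).sum()
    ss_items   = n * ((X.mean(axis=0) - grand) ** 2).sum()
    ss_total   = ((X - grand) ** 2).sum()
    ss_resid   = ss_total - ss_persons - ss_items  # interaction + error, confounded
    ms_persons = ss_persons / (n - 1)
    ms_resid   = ss_resid / ((n - 1) * (k - 1))
    return (ms_persons - ms_resid) / ms_persons    # equals coefficient alpha

rng = np.random.default_rng(1)
# Person effect plus independent item-level noise, 100 examinees x 20 items.
X = rng.normal(size=(100, 1)) + rng.normal(size=(100, 20))
print(round(hoyt_reliability(X), 3))
```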
Peer reviewed
Harvill, Leo M. – Educational Measurement: Issues and Practice, 1991
This paper discusses standard error of measurement (SEM), the amount of variation or spread in the measurement errors for a test, and gives information needed to interpret test scores using SEMs. SEMs at various score levels should be used in calculating score bands rather than a single SEM value. (SLD)
Descriptors: Definitions, Equations (Mathematics), Error of Measurement, Estimation (Mathematics)
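The SEM computation and score bands the abstract describes reduce to a few lines, as in the sketch below. Note that it uses a single overall SEM, whereas Harvill recommends SEMs conditional on score level; all values are illustrative.

```python
import math

def sem(sd, reliability):
    """Classical standard error of measurement: SD * sqrt(1 - rxx)."""
    return sd * math.sqrt(1 - reliability)

def score_band(observed, sd, reliability, z=1.0):
    """Band of +/- z SEMs around an observed score (about 68% coverage at z = 1)."""
    e = z * sem(sd, reliability)
    return observed - e, observed + e

print(sem(15, 0.91))              # ~4.5 points on an IQ-like scale
print(score_band(100, 15, 0.91))  # roughly (95.5, 104.5)
```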
Peer reviewed
Schiel, Jeffrey L.; Shaw, Dale G. – Applied Measurement in Education, 1992
Changes in information retention resulting from changes in reliability and number of intervals in scale construction were studied to provide quantitative information to help in decisions about choosing intervals. Information retention reached a maximum when the number of intervals was about 8 or more and reliability was near 1.0. (SLD)
Descriptors: Decision Making, Knowledge Level, Mathematical Models, Monte Carlo Methods
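A rough Monte Carlo sketch of the trade-off follows, proxying "information retention" by the squared correlation between true scores and interval-coded observed scores; Schiel and Shaw's actual index may differ from this stand-in.

```python
import numpy as np

rng = np.random.default_rng(2)

def retention(reliability, n_intervals, n=20000):
    """Squared correlation of true scores with interval-coded observed scores."""
    true = rng.normal(size=n)
    obs = np.sqrt(reliability) * true + np.sqrt(1 - reliability) * rng.normal(size=n)
    edges = np.quantile(obs, np.linspace(0, 1, n_intervals + 1)[1:-1])
    coded = np.digitize(obs, edges)  # interval codes 0..n_intervals-1
    return np.corrcoef(true, coded)[0, 1] ** 2

for k in (2, 4, 8, 16):
    print(k, round(retention(0.9, k), 3))  # gains flatten near 8 intervals
```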
Peer reviewed
PDF on ERIC
Attali, Yigal – ETS Research Report Series, 2007
This study examined the construct validity of the "e-rater"® automated essay scoring engine as an alternative to human scoring in the context of TOEFL® essay writing. Analyses were based on a sample of students who repeated the TOEFL within a short time period. Two "e-rater" scores were investigated in this study, the first…
Descriptors: Construct Validity, Computer Assisted Testing, Scoring, English (Second Language)
Peer reviewed
Lord, Frederic M. – Psychometrika, 1974
Omitted items cannot properly be treated as wrong when estimating ability and item parameters. A convenient method for utilizing the information provided by omissions is presented. Theoretical and empirical justifications are presented for the estimates obtained by the new method. (Author)
Descriptors: Academic Ability, Guessing (Tests), Item Analysis, Latent Trait Theory
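One common reading of Lord's proposal is to let an omitted item contribute a fractional response at the chance rate 1/A in the likelihood, rather than scoring it wrong. The sketch below applies that idea to maximum-likelihood ability estimation under a 2PL model; the item parameters and response vector are invented.

```python
import numpy as np
from scipy.optimize import minimize_scalar

a = np.array([1.2, 0.8, 1.5, 1.0])      # 2PL discriminations
b = np.array([-0.5, 0.0, 0.5, 1.0])     # 2PL difficulties
u = np.array([1.0, 0.0, np.nan, 0.0])   # 1 = right, 0 = wrong, NaN = omitted

def neg_loglik(theta):
    p = 1 / (1 + np.exp(-a * (theta - b)))
    # An omit is scored as a fractional response of 1/A (here 1/4 for 4 choices).
    w = np.where(np.isnan(u), 0.25, u)
    return -(w * np.log(p) + (1 - w) * np.log(1 - p)).sum()

theta_hat = minimize_scalar(neg_loglik, bounds=(-4, 4), method="bounded").x
print("ability estimate:", round(theta_hat, 3))
```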
Peer reviewed
PDF on ERIC
von Davier, Alina A.; Wilson, Christine – ETS Research Report Series, 2005
This paper discusses the assumptions required by the item response theory (IRT) true-score equating method (with the Stocking & Lord, 1983, scaling approach), which is commonly used in the nonequivalent groups with anchor data-collection design. More precisely, this paper investigates the assumptions made at each step by the IRT approach to…
Descriptors: Item Response Theory, True Scores, Equated Scores, Test Items
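The true-score equating step itself is compact once the two forms share a theta scale: find the theta whose form-X true score equals the score in hand, then read off the form-Y true score at that theta. The sketch below assumes 2PL parameters already linked (e.g., by the Stocking-Lord transformation); the parameters are illustrative.

```python
import numpy as np
from scipy.optimize import brentq

# Illustrative 2PL parameters for forms X and Y, assumed on a common scale.
aX, bX = np.array([1.0, 1.2, 0.8]), np.array([-0.5, 0.0, 0.6])
aY, bY = np.array([0.9, 1.1, 1.3]), np.array([-0.2, 0.3, 0.8])

def tau(theta, a, b):
    """Test characteristic curve: expected (true) score at ability theta."""
    return (1 / (1 + np.exp(-a * (theta - b)))).sum()

def equate_x_to_y(x_true):
    """Map a form-X true score to the form-Y true score at the same theta."""
    theta = brentq(lambda t: tau(t, aX, bX) - x_true, -10, 10)
    return tau(theta, aY, bY)

for x in (0.5, 1.5, 2.5):
    print(x, "->", round(equate_x_to_y(x), 3))
```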
Yap, Kim Onn – 1978
A simulation study was designed to assess the severity of regression effects when a set of selection scores is also used as pretest scores, as occurs under RMC Model A of the Elementary and Secondary Education Act Title I evaluation and reporting system. Data sets were created with various characteristics (varying data reliability and…
Descriptors: Achievement Gains, Analysis of Variance, Elementary Secondary Education, Low Achievement
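The regression effect at issue is easy to reproduce: when the same fallible score is used both to select low scorers and as their pretest, a parallel retest shows apparent gain even with no true change. A minimal simulation, with invented values, follows.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reliability = 100_000, 0.8
# True scores and error scaled so observed scores have mean 50, SD 10.
true = rng.normal(50, 10 * np.sqrt(reliability), n)
err_sd = 10 * np.sqrt(1 - reliability)
pretest  = true + rng.normal(0, err_sd, n)  # also serves as the selection score
posttest = true + rng.normal(0, err_sd, n)  # parallel form, no true growth

selected = pretest < 40                     # Title-I-style low-score selection
print("pretest mean :", round(pretest[selected].mean(), 2))
print("posttest mean:", round(posttest[selected].mean(), 2))  # regresses upward
```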
Brennan, Robert L. – 1977
Rules, procedures, and algorithms intended to aid researchers and practitioners in the application of generalizability theory to a broad range of measurement problems are presented. Two examples of measurement research are G studies, which examine the dependability of some general measurement procedure, and D studies, which provide the data for…
Descriptors: Analysis of Variance, Error of Measurement, Mathematical Models, Measurement
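As a concrete illustration of the G-study/D-study distinction, the sketch below estimates variance components for a one-facet persons-by-raters design and projects a generalizability coefficient for a chosen number of raters; the design and data are illustrative, not Brennan's examples.

```python
import numpy as np

def g_study(X):
    """X: persons-by-raters scores. Returns (var_person, var_rater, var_residual)."""
    n_p, n_r = X.shape
    grand = X.mean()
    ms_p = n_r * ((X.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
    ms_r = n_p * ((X.mean(axis=0) - grand) ** 2).sum() / (n_r - 1)
    ss_res = ((X - X.mean(1, keepdims=True) - X.mean(0, keepdims=True) + grand) ** 2).sum()
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))
    return (ms_p - ms_res) / n_r, (ms_r - ms_res) / n_p, ms_res

def d_study_g_coefficient(var_p, var_res, n_raters):
    """Generalizability coefficient for relative decisions with n_raters raters."""
    return var_p / (var_p + var_res / n_raters)

rng = np.random.default_rng(4)
# Person effects, small rater effects, and residual noise: 50 persons x 4 raters.
X = rng.normal(size=(50, 1)) + 0.3 * rng.normal(size=(1, 4)) + 0.5 * rng.normal(size=(50, 4))
vp, vr, ve = g_study(X)
print(round(d_study_g_coefficient(vp, ve, n_raters=4), 3))
```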
Stroud, T. W. F. – 1973
In a multiple (or multivariate) regression model where the predictors are subject to errors of measurement with a known variance-covariance structure, two-sample hypotheses are formulated for (i) equality of regressions on true scores and (ii) equality of residual variance (or covariance matrices) after regression on true scores. The hypotheses…
Descriptors: Achievement Tests, Comparative Analysis, Error of Measurement, Hypothesis Testing
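A small sketch of the errors-in-variables setting Stroud works in: with known measurement-error variance in a predictor, the regression on true scores can be recovered by disattenuating the observed slope. The two-sample hypothesis tests themselves are beyond this sketch; the data are invented.

```python
import numpy as np

rng = np.random.default_rng(5)
n, err_var = 5000, 4.0
true_x = rng.normal(0, 3, n)
x = true_x + rng.normal(0, np.sqrt(err_var), n)  # observed, with known error variance
y = 2.0 * true_x + rng.normal(0, 1, n)           # true slope on true scores is 2.0

s_xx = x.var(ddof=1)
s_xy = np.cov(x, y, ddof=1)[0, 1]
print("naive slope    :", round(s_xy / s_xx, 3))             # attenuated toward 0
print("corrected slope:", round(s_xy / (s_xx - err_var), 3)) # approximately 2.0
```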
Peer reviewed
Hirsch, Thomas M. – Journal of Educational Measurement, 1989
Equatings were performed on both simulated and real data sets using a common-examinee design and two abilities for each examinee. Results indicate that effective equating, as measured by comparability of true scores, is possible with the techniques used in this study. However, the stability of the ability estimates proved unsatisfactory. (TJH)
Descriptors: Academic Ability, College Students, Comparative Analysis, Computer Assisted Testing
Peer reviewed
Kolen, Michael J.; And Others – Journal of Educational Measurement, 1992
A procedure is described for estimating the reliability and conditional standard errors of measurement of scale scores incorporating the discrete transformation of raw scores to scale scores. The method is illustrated using a strong true score model, and practical applications are described. (SLD)
Descriptors: College Entrance Examinations, Equations (Mathematics), Error of Measurement, Estimation (Mathematics)
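Under a binomial strong true-score model, the conditional SEM of a scale score follows directly from the discrete raw-to-scale conversion, as the sketch below shows; the conversion table is invented for illustration and the binomial error model is one simple instance of the class of models the paper uses.

```python
import numpy as np
from scipy.stats import binom

n_items = 10
# Invented conversion: raw scores 0..10 map to scale scores 100..200.
scale = np.round(np.linspace(100, 200, n_items + 1))

def conditional_scale_sem(true_p):
    """SD of the scale score given true proportion-correct, binomial errors."""
    probs = binom.pmf(np.arange(n_items + 1), n_items, true_p)
    mean = (probs * scale).sum()
    return np.sqrt((probs * (scale - mean) ** 2).sum())

for p in (0.3, 0.5, 0.7, 0.9):
    print(p, round(conditional_scale_sem(p), 2))  # SEM varies with score level
```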
Rachor, Robert E.; Cizek, Gregory J. – 1996
The gain, or difference, score is defined as the difference between the posttest score and the pretest score for an individual. Gain scores appear to be a natural measure of growth for education and the social sciences, but they contain two sources of measurement error, one from the pretest scores and one from the posttest scores, and cannot be considered…
Descriptors: Achievement Gains, Correlation, Educational Research, Elementary Secondary Education
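The classical result behind this caution can be stated in one formula: the reliability of a difference score, computed from the two tests' reliabilities and their correlation. A sketch with invented inputs follows.

```python
def gain_reliability(r_xx, r_yy, r_xy, sd_x, sd_y):
    """Reliability of the difference score D = Y - X under classical test theory."""
    num = sd_x**2 * r_xx + sd_y**2 * r_yy - 2 * r_xy * sd_x * sd_y
    den = sd_x**2 + sd_y**2 - 2 * r_xy * sd_x * sd_y
    return num / den

# Even with reliable tests (0.85 each), a strong pre/post correlation (0.70)
# leaves the gain score far less reliable than either test alone.
print(gain_reliability(0.85, 0.85, 0.70, 10, 10))  # 0.5
```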
Goldberg, Gail Lynn; Walker-Bartnick, Leslie – 1988
A scoring rubric transition study is described. It was designed to evaluate possible drift in scoring the Maryland Writing Test from year to year (when using a modified holistic scoring method), to evaluate strategies for revising swing rubrics from narrative and explanatory writing while maintaining original scoring standards, and to establish…
Descriptors: Educational Assessment, Elementary Secondary Education, Error of Measurement, Grading