ERIC - Search Results

Showing all 8 results

Estimating the Reliability of Classifications Based on Composite Scores.

Livingston, Samuel A. – 1984

Much previously published material for estimating the reliability of classification has been based on the assumption that a test consists of a known number of equally weighted items. The test score is the number of those items answered correctly. These methods cannot be used with classifications based on weighted composite scores, especially if…

Descriptors: Equated Scores, Essay Tests, Estimation (Mathematics), Mathematical Models

Estimating Major Sources of Measurement Error in Individual Intelligence Scales: Taking Our Heads out of the Sand.

Peer reviewed

Hanna, Gerald S.; And Others – Journal of School Psychology, 1981

Discusses four ubiquitous major sources of measurement error for individual intelligence scales. Argues that where these sources cannot be directly investigated, they should be estimated rather than ignored. Estimates the typical magnitude of error arising from each source: content sampling, time sampling, scoring, and administration. (Author)

Descriptors: Error of Measurement, Intelligence Tests, Measurement Techniques, Sampling

Construct Validity of "e-rater"® in Scoring TOEFL® Essays. Research Report. ETS RR-07-21

Peer reviewed

Attali, Yigal – ETS Research Report Series, 2007

This study examined the construct validity of the "e-rater"® automated essay scoring engine as an alternative to human scoring in the context of TOEFL® essay writing. Analyses were based on a sample of students who repeated the TOEFL within a short time period. Two "e-rater" scores were investigated in this study, the first…

Descriptors: Construct Validity, Computer Assisted Testing, Scoring, English (Second Language)

The Attenuation Paradox and Internal Consistency.

Gleser, Leon Jay – 1971

An attempt is made to indicate why the concept of "true score" naturally leads to the belief that test validity must increase with an increase in test and/or average item reliability, and why this is correct for the classical single-factor model first introduced by Spearman. The statistical model used by Loevinger is introduced to…

Descriptors: Factor Analysis, Item Analysis, Mathematical Models, Measurement Techniques

A Theory-Based Comparison of the Reliabilities of Fixed-Length and Trials-to-Criterion Scoring of Physical Education Skills Tests.

Peer reviewed

Feldt, Leonard S.; Spray, Judith A. – Research Quarterly for Exercise and Sport, 1983

The reliabilities of two types of measurement plans were compared across six hypothetical distributions of true scores or abilities. The measurement plans were: (1) fixed-length, where the number of trials for all examinees is set in advance; and (2) trials-to-criterion, where examinees must keep trying until they complete a given number of trials…

Descriptors: Criterion Referenced Tests, Evaluation Methods, Higher Education, Measurement Techniques

The KR-20 Reliability Coefficient as a Special Case of a More General Formula.

Smith, Donald M. – 1976

The Kuder-Richardson Formula 20 (KR-20) is shown to be a special case of a more general formula: KR-20 applies when each examinee is given sufficient time to answer each item, whereas the general formula covers cases where examinees may not be allowed the necessary time. The formula is extended to allow two scores, knowledge and speed, to be extracted from each examinee's test score. Using a sample of 82…

Descriptors: Career Development, Comparative Analysis, Grade Point Average, Predictive Measurement

The Evaluation of Mastery Test Items. Final Report.

Brennan, Robert L. – 1974

The first four chapters of this report primarily provide an extensive, critical review of the literature with regard to selected aspects of the criterion-referenced and mastery testing fields. Major topics treated include: (a) definitions, distinctions, and background, (b) the relevance of classical test theory, (c) validity and procedures for…

Descriptors: Computer Programs, Confidence Testing, Criterion Referenced Tests, Error of Measurement

Interpreting Standardized Test Scores.

Perry, Dallis – 1971

Principles of test administration, test validity, and accuracy of measurement underlying interpretation of standardized test scores in educational administration, instruction, and guidance are presented. Types of norm-referenced score transformations, including percentiles, standard scores, and grade equivalents, and of criterion referenced…

Descriptors: Criterion Referenced Tests, Error of Measurement, Evaluation, Expectancy Tables