LaVoie, Noelle; Parker, James; Legree, Peter J.; Ardison, Sharon; Kilcullen, Robert N. – Educational and Psychological Measurement, 2020
Automated scoring based on Latent Semantic Analysis (LSA) has been successfully used to score essays and constrained short answer responses. Scoring tests that capture open-ended, short answer responses poses some challenges for machine learning approaches. We used LSA techniques to score short answer responses to the Consequences Test, a measure…
Descriptors: Semantics, Evaluators, Essays, Scoring
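The vector-space core of LSA-style scoring can be sketched as follows. This is a toy illustration, not the authors' system: full LSA additionally applies truncated SVD to a large term-document matrix to obtain a latent semantic space, while this sketch shows only the similarity step, using hypothetical reference answers.

```python
# Minimal sketch: score a short answer by its cosine similarity to
# high-scoring reference answers (bag-of-words; real LSA would first
# project vectors into an SVD-reduced latent space).
import math
from collections import Counter

# Hypothetical reference answers already judged high-quality.
references = [
    "the flood would destroy crops and homes",
    "people would need to evacuate and rebuild",
]
response = "homes and crops would be destroyed by the flood"

def bow(text):
    """Bag-of-words term counts."""
    return Counter(text.split())

def cosine(a, b):
    shared = set(a) & set(b)
    num = sum(a[w] * b[w] for w in shared)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

# Score = best similarity to any reference answer.
score = max(cosine(bow(response), bow(ref)) for ref in references)
```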
France, Stephen L.; Batchelder, William H. – Educational and Psychological Measurement, 2015
Cultural consensus theory (CCT) is a data aggregation technique with many applications in the social and behavioral sciences. We describe the intuition and theory behind a set of CCT models for continuous type data using maximum likelihood inference methodology. We describe how bias parameters can be incorporated into these models. We introduce…
Descriptors: Maximum Likelihood Statistics, Test Items, Difficulty Level, Test Theory
Attali, Yigal – Educational and Psychological Measurement, 2011
Contrary to previous research on sequential ratings of student performance, this study found that professional essay raters of a large-scale standardized testing program produced ratings that were drawn toward previous ratings, creating an assimilation effect. Longer intervals between the two adjacent ratings and higher degree of agreement with…
Descriptors: Essay Tests, Standardized Tests, Sequential Approach, Test Bias
Paulhus, Delroy L.; Dubois, Patrick J. – Educational and Psychological Measurement, 2014
The overclaiming technique is a novel assessment procedure that uses signal detection analysis to generate indices of knowledge accuracy (OC-accuracy) and self-enhancement (OC-bias). The technique has previously shown robustness over varied knowledge domains as well as low reactivity across administration contexts. Here we compared the OC-accuracy…
Descriptors: Educational Assessment, Knowledge Level, Accuracy, Cognitive Ability
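The signal-detection logic behind the two overclaiming indices can be illustrated with standard formulas (a hedged sketch with hypothetical counts, not the authors' exact scoring): hit rate on real items and false-alarm rate on foil items yield d' as an accuracy index and the criterion c as a bias index.

```python
# Signal-detection indices for the overclaiming technique (illustrative):
# OC-accuracy ~ d' (discriminating real items from foils),
# OC-bias ~ criterion c (overall tendency to claim knowledge).
from statistics import NormalDist

def oc_indices(hits, n_real, false_alarms, n_foils):
    # Clamp rates away from 0/1 so z-scores stay finite (a common correction).
    h = min(max(hits / n_real, 0.5 / n_real), 1 - 0.5 / n_real)
    f = min(max(false_alarms / n_foils, 0.5 / n_foils), 1 - 0.5 / n_foils)
    z = NormalDist().inv_cdf
    d_prime = z(h) - z(f)        # accuracy: real vs. foil discrimination
    bias = -(z(h) + z(f)) / 2    # criterion: lower = more claiming
    return d_prime, bias

# Hypothetical respondent: claims 18/20 real items and 4/10 foils.
acc, bias = oc_indices(hits=18, n_real=20, false_alarms=4, n_foils=10)
```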
Attali, Yigal; Powers, Donald – Educational and Psychological Measurement, 2009
A developmental writing scale for timed essay-writing performance was created on the basis of automatically computed indicators of writing fluency, word choice, and conventions of standard written English. In a large-scale data collection effort that involved a national sample of more than 12,000 students from 4th, 6th, 8th, 10th, and 12th grade,…
Descriptors: Validity, Measures (Individuals), Scoring, Essays

Hughes, David C.; And Others – Educational and Psychological Measurement, 1983
A number of studies have found that essays are scored higher when preceded by poor quality scripts than when preceded by good quality scripts. This study investigated the effects of scoring procedures designed to reduce the influence of context. Context effects were found irrespective of the scoring procedure used. (Author/PN)
Descriptors: Context Effect, Essay Tests, Essays, High Schools

Riedel, James A.; Dodson, Janet D. – Educational and Psychological Measurement, 1977
GURU is a computer program developed to analyze data generated by open-ended question techniques such as ECHO or other semistructured data collection techniques in which data are categorized. The program provides extensive descriptive statistics and allows extensive flexibility in comparing data. (Author/JKS)
Descriptors: Computer Programs, Data Analysis, Essay Tests, Test Interpretation

Mitchell, Karen; Anderson, Judy – Educational and Psychological Measurement, 1986
This study examined the reliability of holistic scoring for a sample of essays written during the Spring 1985 MCAT administration. Analysis of variance techniques were used to estimate the reliability of scoring and to partition score variance into that due to level differences between papers and to context-specific factors. (Author/LMO)
Descriptors: Analysis of Variance, Essay Tests, Holistic Evaluation, Medical Education
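The variance-partitioning idea in a papers-by-raters design can be sketched with a small hypothetical rating matrix (a simplified two-way decomposition, not the study's model): sums of squares are split into paper, rater, and residual components, and a single-rater consistency coefficient follows from the mean squares.

```python
# Two-way decomposition of score variance for essays rated by several raters
# (hypothetical data; a textbook-style ICC for single-rater consistency).
ratings = [  # rows = essays (papers), cols = raters
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 2, 3],
]
n, k = len(ratings), len(ratings[0])
grand = sum(sum(r) for r in ratings) / (n * k)
row_means = [sum(r) / k for r in ratings]
col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between papers
ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between raters
ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
ss_err = ss_total - ss_rows - ss_cols                     # residual

ms_rows = ss_rows / (n - 1)
ms_err = ss_err / ((n - 1) * (k - 1))
# Consistency of a single rater's scores across papers.
icc = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
```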

Boodoo, Gwyneth M.; Garlinghouse, Patricia – Educational and Psychological Measurement, 1983
Three essay questions were administered to junior education major college students. Factor analysis of the ratings showed that content played a large role in the students' responses, yielding dominant first order factors. Generalizability theory, used to examine the reliability of students' ratings, showed the need for more raters and questions.…
Descriptors: Education Majors, Essay Tests, Factor Analysis, Generalizability Theory
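The "need for more raters and questions" conclusion rests on how aggregated reliability grows with the number of measurements. The Spearman-Brown formula (a standard projection, applied here to hypothetical numbers rather than the study's estimates) shows the effect of adding raters:

```python
# Spearman-Brown projection: reliability of an average over k raters,
# given a hypothetical single-rater reliability.
def spearman_brown(rel_one, k):
    return k * rel_one / (1 + (k - 1) * rel_one)

single = 0.45  # hypothetical single-rater reliability
projected = [round(spearman_brown(single, k), 2) for k in (1, 2, 4, 8)]
```

Diminishing returns set in quickly, which is why generalizability studies weigh raters against questions when allocating measurement effort.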

Linn, Robert L.; And Others – Educational and Psychological Measurement, 1972
An investigation of certain chance and systematic factors affecting the grades assigned by 17 law school professors to the answers of 79 law students to a typical essay question. (Authors/MB)
Descriptors: Content Analysis, Correlation, Essay Tests, Factor Analysis

Bajtelsmit, John W. – Educational and Psychological Measurement, 1979
A validational procedure was used, which involved a matrix of intercorrelations among tests representing four areas of Chartered Life Underwriter content knowledge, each measured by objective multiple-choice and essay methods. Results indicated that the two methods of measuring the same trait yielded fairly consistent estimates of content…
Descriptors: Essay Tests, Higher Education, Insurance Occupations, Multiple Choice Tests

Shermis, Mark D.; Koch, Chantal Mees; Page, Ellis B.; Keith, Timothy Z.; Harrington, Susanmarie – Educational and Psychological Measurement, 2002
Studied the use of an automated grader to score essays holistically and by rating traits through two experiments that evaluated 807 Web-based essays and then compared 386 essays to evaluations by 6 human raters. Results show the essay grading software to be efficient and able to grade about six documents a second. (SLD)
Descriptors: Automation, College Students, Computer Software, Essays

Michael, William B.; And Others – Educational and Psychological Measurement, 1980
Ratings of student performance for two essay questions rendered by professors of English and by professors in other disciplines were compared for reliability and concurrent validity. It was concluded that the reliability and validity of the ratings of the two groups were nearly comparable. (Author/BW)
Descriptors: College Faculty, English Instruction, Essay Tests, Higher Education

Werts, C. E.; And Others – Educational and Psychological Measurement, 1980
Test-retest correlations can lead to biased reliability estimates when there is instability of true scores and/or when measurement errors are correlated. Using three administrations of the Test of Standard Written English and essay ratings, an analysis is demonstrated which separates true score instability and correlated errors. (Author/BW)
Descriptors: College Freshmen, Error of Measurement, Essay Tests, Higher Education
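The bias mechanism is easy to show in expectation (hypothetical variance components, not the study's estimates): with stable true scores, the expected test-retest correlation is the true-score share of variance, and any error covariance shared across occasions adds directly to the numerator, inflating the apparent reliability.

```python
# Illustration: correlated errors across occasions inflate the
# test-retest correlation relative to the true reliability.
var_true, var_err = 80.0, 20.0
cov_err = 8.0  # error shared across occasions (e.g., same prompt familiarity)

r_uncorrelated = var_true / (var_true + var_err)              # true reliability
r_correlated = (var_true + cov_err) / (var_true + var_err)    # observed r
```

Separating true-score instability from correlated errors, as the article does across three administrations, requires more than two occasions precisely because a single retest correlation confounds the two.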

Rentz, R. Robert – Educational and Psychological Measurement, 1980
This paper elaborates on the work of Cardinet and others by clarifying some points regarding calculations, specifically with reference to existing computer programs, and by presenting illustrative examples of the calculation and interpretation of several generalizability coefficients from a complex six-facet (factor) design. (Author/RL)
Descriptors: Analysis of Variance, Computation, Computer Programs, Error of Measurement