Showing 1 to 15 of 22 results
Peer reviewed
LaVoie, Noelle; Parker, James; Legree, Peter J.; Ardison, Sharon; Kilcullen, Robert N. – Educational and Psychological Measurement, 2020
Automated scoring based on Latent Semantic Analysis (LSA) has been successfully used to score essays and constrained short answer responses. Scoring tests that capture open-ended, short answer responses poses some challenges for machine learning approaches. We used LSA techniques to score short answer responses to the Consequences Test, a measure…
Descriptors: Semantics, Evaluators, Essays, Scoring
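For illustration, a minimal sketch of LSA-based scoring of short answers, assuming scikit-learn is available; the toy corpus, the two-dimensional semantic space, and the score-by-nearest-reference rule are assumptions of the example, not the authors' procedure.

# Hypothetical sketch: score short answers by LSA similarity to scored reference answers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

reference_texts = ["traffic stops because drivers cannot see",
                   "crops fail without sunlight",
                   "people would need artificial light everywhere"]
reference_scores = [2, 3, 3]                        # human scores for the reference responses
new_responses = ["plants would die from lack of light"]

# Build a low-dimensional LSA space from references plus new responses.
tfidf = TfidfVectorizer().fit(reference_texts + new_responses)
X = tfidf.transform(reference_texts + new_responses)
lsa = TruncatedSVD(n_components=2, random_state=0)  # dimensionality kept tiny for the toy corpus
Z = lsa.fit_transform(X)

ref_vecs, new_vecs = Z[:len(reference_texts)], Z[len(reference_texts):]
sims = cosine_similarity(new_vecs, ref_vecs)

# Score each new response with the score of its most similar reference answer.
for response, sim_row in zip(new_responses, sims):
    print(response, "->", reference_scores[sim_row.argmax()])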
Peer reviewed
France, Stephen L.; Batchelder, William H. – Educational and Psychological Measurement, 2015
Cultural consensus theory (CCT) is a data aggregation technique with many applications in the social and behavioral sciences. We describe the intuition and theory behind a set of CCT models for continuous type data using maximum likelihood inference methodology. We describe how bias parameters can be incorporated into these models. We introduce…
Descriptors: Maximum Likelihood Statistics, Test Items, Difficulty Level, Test Theory
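As a rough illustration of consensus aggregation for continuous responses (a deliberate simplification, not the maximum likelihood CCT models described above), one can alternate between a weighted consensus estimate and correlation-based informant weights:

# Simplified fixed-point sketch of continuous consensus aggregation.
import numpy as np

rng = np.random.default_rng(0)
truth = rng.normal(size=20)                          # latent consensus answers
competence = np.array([0.9, 0.7, 0.5, 0.3])          # informant reliabilities (assumed)
data = truth + rng.normal(size=(4, 20)) * (1 - competence)[:, None]

weights = np.ones(4) / 4
for _ in range(20):
    consensus = weights @ data / weights.sum()       # weighted estimate of the answers
    # re-weight informants by how well they track the current consensus
    weights = np.array([max(np.corrcoef(row, consensus)[0, 1], 0.01) for row in data])

print(np.round(weights, 2))                          # recovered informant weights
print(round(np.corrcoef(consensus, truth)[0, 1], 2)) # agreement with the latent answers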
Peer reviewed
Attali, Yigal – Educational and Psychological Measurement, 2011
Contrary to previous research on sequential ratings of student performance, this study found that professional essay raters of a large-scale standardized testing program produced ratings that were drawn toward previous ratings, creating an assimilation effect. Longer intervals between the two adjacent ratings and higher degree of agreement with…
Descriptors: Essay Tests, Standardized Tests, Sequential Approach, Test Bias
Peer reviewed
Paulhus, Delroy L.; Dubois, Patrick J. – Educational and Psychological Measurement, 2014
The overclaiming technique is a novel assessment procedure that uses signal detection analysis to generate indices of knowledge accuracy (OC-accuracy) and self-enhancement (OC-bias). The technique has previously shown robustness over varied knowledge domains as well as low reactivity across administration contexts. Here we compared the OC-accuracy…
Descriptors: Educational Assessment, Knowledge Level, Accuracy, Cognitive Ability
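For illustration, overclaiming indices of this kind can be computed from claim rates on real items versus foils using standard signal detection formulas; the item counts and the simple correction for extreme proportions below are assumptions of the sketch, not the authors' exact scoring.

# Illustrative OC-accuracy (d') and OC-bias (criterion) from claim counts.
from statistics import NormalDist

def oc_indices(hits, n_real, false_alarms, n_foils):
    z = NormalDist().inv_cdf
    # nudge extreme proportions away from 0 and 1 to avoid infinite z-scores
    hr = (hits + 0.5) / (n_real + 1)
    far = (false_alarms + 0.5) / (n_foils + 1)
    oc_accuracy = z(hr) - z(far)            # knowledge accuracy
    oc_bias = -0.5 * (z(hr) + z(far))       # tendency to claim familiarity
    return oc_accuracy, oc_bias

print(oc_indices(hits=12, n_real=15, false_alarms=3, n_foils=5))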
Peer reviewed
Attali, Yigal; Powers, Donald – Educational and Psychological Measurement, 2009
A developmental writing scale for timed essay-writing performance was created on the basis of automatically computed indicators of writing fluency, word choice, and conventions of standard written English. In a large-scale data collection effort that involved a national sample of more than 12,000 students from 4th, 6th, 8th, 10th, and 12th grade,…
Descriptors: Validity, Measures (Individuals), Scoring, Essays
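A hypothetical sketch of the kind of automatically computed indicators named above (fluency, word choice, conventions); the specific features and the tiny lexicon stand in for the operational measures and are purely illustrative.

# Toy writing indicators: length, mean word length, and share of recognized words.
import re

def writing_indicators(essay, known_words):
    tokens = re.findall(r"[A-Za-z']+", essay.lower())
    fluency = len(tokens)                                      # essay length
    word_choice = sum(map(len, tokens)) / max(len(tokens), 1)  # mean word length
    misspelled = sum(t not in known_words for t in tokens)
    conventions = 1 - misspelled / max(len(tokens), 1)         # proxy for conventions
    return {"fluency": fluency, "word_choice": word_choice, "conventions": conventions}

lexicon = {"the", "dog", "ran", "quickly", "across", "a", "green", "field"}
print(writing_indicators("The dog ran quikly across a green field", lexicon))  # 'quikly' is misspelled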
Peer reviewed
Hughes, David C.; And Others – Educational and Psychological Measurement, 1983
A number of studies have found that essays are scored higher when preceded by poor quality scripts than when preceded by good quality scripts. This study investigated the effects of scoring procedures designed to reduce the influence of context. Context effects were found irrespective of the scoring procedure used. (Author/PN)
Descriptors: Context Effect, Essay Tests, Essays, High Schools
Peer reviewed
Riedel, James A.; Dodson, Janet D. – Educational and Psychological Measurement, 1977
GURU is a computer program developed to analyze data generated by open-ended question techniques, such as ECHO, or other semistructured data collection techniques in which data are categorized. The program provides extensive descriptive statistics and allows considerable flexibility in comparing data. (Author/JKS)
Descriptors: Computer Programs, Data Analysis, Essay Tests, Test Interpretation
Peer reviewed
Mitchell, Karen; Anderson, Judy – Educational and Psychological Measurement, 1986
This study examined the reliability of holistic scoring for a sample of essays written during the Spring 1985 MCAT administration. Analysis of variance techniques were used to estimate the reliability of scoring and to partition score variance into that due to level differences between papers and to context-specific factors. (Author/LMO)
Descriptors: Analysis of Variance, Essay Tests, Holistic Evaluation, Medical Education
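The variance-partitioning logic can be illustrated with a small, invented papers-by-readers table: mean squares from a two-way ANOVA without replication yield a reliability coefficient for the mean of the readings. The numbers below are made up, not MCAT data.

# Two-way ANOVA partition for an essays-by-readers score table.
import numpy as np

scores = np.array([[4, 5, 4],        # rows = essays, columns = readers
                   [2, 3, 2],
                   [5, 5, 6],
                   [3, 3, 4]], dtype=float)
n_p, n_r = scores.shape
grand = scores.mean()

ss_papers = n_r * ((scores.mean(axis=1) - grand) ** 2).sum()
ss_readers = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
ss_resid = ((scores - grand) ** 2).sum() - ss_papers - ss_readers

ms_papers = ss_papers / (n_p - 1)
ms_resid = ss_resid / ((n_p - 1) * (n_r - 1))

# Reliability of the mean of the n_r readings: between-paper variance
# relative to the total variance of the mean score.
print(round((ms_papers - ms_resid) / ms_papers, 3))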
Peer reviewed
Boodoo, Gwyneth M.; Garlinghouse, Patricia – Educational and Psychological Measurement, 1983
Three essay questions were administered to junior education major college students. Factor analysis of the ratings showed that content played a large role in the students' responses, yielding dominant first order factors. Generalizability theory, used to examine the reliability of students' ratings, showed the need for more raters and questions.…
Descriptors: Education Majors, Essay Tests, Factor Analysis, Generalizability Theory
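For illustration, the "more raters and questions" conclusion is the kind of D-study projection generalizability theory supports: given variance components from a crossed persons-by-raters-by-questions design (the values below are invented), the generalizability coefficient can be projected for alternative numbers of raters and questions.

# D-study projection from assumed variance components of a p x r x q G-study.
variance = {"p": 0.50, "r": 0.05, "q": 0.10,           # persons, raters, questions
            "pr": 0.08, "pq": 0.20, "rq": 0.02, "prq": 0.30}

def g_coefficient(n_raters, n_questions):
    # relative error variance for a norm-referenced (relative) decision
    rel_error = (variance["pr"] / n_raters + variance["pq"] / n_questions
                 + variance["prq"] / (n_raters * n_questions))
    return variance["p"] / (variance["p"] + rel_error)

for nr in (1, 2, 4):
    for nq in (1, 3, 6):
        print(nr, "raters,", nq, "questions ->", round(g_coefficient(nr, nq), 2))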
Peer reviewed
Linn, Robert L.; And Others – Educational and Psychological Measurement, 1972
This study investigated certain chance and systematic factors affecting the grades assigned by 17 law school professors to the answers of 79 law students to a typical essay question. (Authors/MB)
Descriptors: Content Analysis, Correlation, Essay Tests, Factor Analysis
Peer reviewed
Bajtelsmit, John W. – Educational and Psychological Measurement, 1979
A validational procedure was used, which involved a matrix of intercorrelations among tests representing four areas of Chartered Life Underwriter content knowledge, each measured by objective multiple-choice and essay methods. Results indicated that the two methods of measuring the same trait yielded fairly consistent estimates of content…
Descriptors: Essay Tests, Higher Education, Insurance Occupations, Multiple Choice Tests
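A small simulated sketch of the convergent-validity comparison implied here: scores on the same content area measured by the multiple-choice and essay methods should correlate across methods. The data generation and content-area labels are invented for illustration.

# Simulated multitrait-multimethod check: correlate the two methods per content area.
import numpy as np

rng = np.random.default_rng(2)
n = 200
knowledge = rng.normal(size=(n, 2))                       # two latent content areas
mc = knowledge + rng.normal(scale=0.5, size=(n, 2))       # multiple-choice measures
essay = knowledge + rng.normal(scale=0.8, size=(n, 2))    # essay measures

for i, area in enumerate(["content_area_1", "content_area_2"]):
    r = np.corrcoef(mc[:, i], essay[:, i])[0, 1]
    print("convergent validity,", area, "->", round(r, 2))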
Peer reviewed
Shermis, Mark D.; Koch, Chantal Mees; Page, Ellis B.; Keith, Timothy Z.; Harrington, Susanmarie – Educational and Psychological Measurement, 2002
Two experiments studied the use of an automated grader to score essays holistically and by rating traits, first evaluating 807 Web-based essays and then comparing scores for 386 essays with evaluations by 6 human raters. Results show the essay grading software to be efficient, grading about six documents a second. (SLD)
Descriptors: Automation, College Students, Computer Software, Essays
Peer reviewed
Michael, William B.; And Others – Educational and Psychological Measurement, 1980
Ratings of student performance for two essay questions rendered by professors of English and by professors in other disciplines were compared for reliability and concurrent validity. It was concluded that the reliability and validity of the ratings of the two groups were nearly comparable. (Author/BW)
Descriptors: College Faculty, English Instruction, Essay Tests, Higher Education
Peer reviewed
Werts, C. E.; And Others – Educational and Psychological Measurement, 1980
Test-retest correlations can lead to biased reliability estimates when there is instability of true scores and/or when measurement errors are correlated. Using three administrations of the Test of Standard Written English and essay ratings, an analysis is demonstrated which separates true score instability and correlated errors. (Author/BW)
Descriptors: College Freshmen, Error of Measurement, Essay Tests, Higher Education
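A toy simulation of the point being made above: when errors are correlated across administrations, the raw test-retest correlation overstates reliability. Parameter values below are arbitrary.

# Correlated occasion errors inflate the test-retest correlation.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
true_score = rng.normal(size=n)
shared_error = rng.normal(size=n)              # error source common to both occasions

def administration(lam=0.4):
    return true_score + lam * shared_error + rng.normal(scale=0.8, size=n)

x1, x2 = administration(), administration()
print("test-retest r:", round(np.corrcoef(x1, x2)[0, 1], 2))
print("true-score share of variance:", round(np.var(true_score) / np.var(x1), 2))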
Peer reviewed
Rentz, R. Robert – Educational and Psychological Measurement, 1980
This paper elaborates on the work of Cardinet and others by clarifying some points regarding calculations, specifically with reference to existing computer programs, and by presenting illustrative examples of the calculation and interpretation of several generalizability coefficients from a complex six-facet (factor) design. (Author/RL)
Descriptors: Analysis of Variance, Computation, Computer Programs, Error of Measurement