Showing 91 to 105 of 120 results
Peer reviewed
Laufer, Batia; Nation, Paul – Language Testing, 1999
Investigated the reliability, validity, and practicality of a controlled production measure of vocabulary, consisting of items from five frequency levels and using a completion-item format. Two equivalent test forms were compared. The test was found to be useful in distinguishing between different proficiency groups. (Author/MSE)
Descriptors: Difficulty Level, Language Tests, Second Languages, Test Construction
Peer reviewed
Raatz, Ulrich – Language Testing, 1985
Argues that classical test theory cannot be used at the item level on "authentic" language tests. However, if the total score is derived by adding the scores of a number of different and independent parts, test reliability can be estimated. Suggests using the Classical Latent Additives model to examine test-part homogeneity. (Author/SED)
Descriptors: Item Analysis, Latent Trait Theory, Models, Second Language Learning
Peer reviewed
Krzanowski, Wojtek J.; Woods, Anthony J. – Language Testing, 1984
Discusses the concept of reliability in language testing and considers several simple ANOVA (analysis of variance) models which can be used to define and estimate reliability coefficients. Summarizes the main statistical results associated with the commonly used measurements of reliability. Presents results likely to be of use to language testers.…
Descriptors: Analysis of Variance, Language Skills, Language Tests, Sampling
Peer reviewed
Brown, James Dean – Language Testing, 1990
Presents simplified methods for deriving estimates of the consistency of criterion-referenced, English-as-a-Second-Language tests, including (1) the threshold loss agreement approach using agreement or kappa coefficients, (2) the squared-error loss agreement approach using the phi(lambda) dependability approach, and (3) the domain score…
Descriptors: Criterion Referenced Tests, English (Second Language), Language Tests, Second Language Learning
Peer reviewed
Schoonen, Rob; And Others – Language Testing, 1997
Reports on three studies conducted in the Netherlands about the reading reliability of lay and expert readers in rating content and language usage of students' writing performances in three kinds of writing assignments. Findings reveal that expert readers are more reliable in rating usage, whereas both lay and expert readers are reliable raters of…
Descriptors: Foreign Countries, Interrater Reliability, Language Usage, Models
Peer reviewed
Brown, Annie – Language Testing, 2003
Examines the question of variation among interviewers of oral language proficiency interviews in the ways that they elicit demonstrations of communicative ability and the impact of this variation on candidate performance and raters' perceptions of candidate ability. A discourse analysis of two interviews involving the same candidate with two…
Descriptors: Discourse Analysis, Interrater Reliability, Interviews, Language Proficiency
Peer reviewed
Allan, Alistair – Language Testing, 1992
The design of a valid and reliable test of test-wiseness is reported: a 33-item multiple-choice instrument with 4 subscales trialed with several groups of English-as-a-Second-Language students. Findings indicate differential skills in test-taking; some learner scores are influenced by skills that are not the focus of the test. (13 references)…
Descriptors: English (Second Language), Language Research, Language Tests, Multiple Choice Tests
Peer reviewed
Henning, Grant – Language Testing, 1996
Analyzes simulated performance ratings on a six-point scale by two independent raters to account for nonsystematic error in performance ratings. Results suggest that rater agreement or covariance is not always a dependable estimate of score reliability and that the practice of seeking additional raters for adjudication of discrepant ratings is not…
Descriptors: Correlation, Error Patterns, Interrater Reliability, Language Tests
Peer reviewed
McCarthy, Philip M.; Jarvis, Scott – Language Testing, 2007
A reliable index of lexical diversity (LD) has remained stubbornly elusive for over 60 years. Meanwhile, researchers in fields as varied as "stylistics," "neuropathology," "language acquisition," and even "forensics" continue to use flawed LD indices--often ignorant that their results are questionable and in…
Descriptors: Second Language Learning, English (Second Language), Foreign Countries, Adolescents
Peer reviewed
Liu, Jianda – Language Testing, 2007
Pragmatic proficiency has been incorporated in the EFL teaching and testing syllabi in China, but the corresponding tests still focus on linguistic competence. The gap between the teaching and testing is mainly due to the lack of generally accepted measures of communicative abilities such as pragmatic competence. This study developed a…
Descriptors: Linguistic Competence, Speech Acts, Testing, Foreign Countries
Peer reviewed
Elder, Catherine; Barkhuizen, Gary; Knoch, Ute; von Randow, Janet – Language Testing, 2007
The use of online rater self-training is growing in popularity and has obvious practical benefits, facilitating access to training materials and rating samples and allowing raters to reorient themselves to the rating scale and self-monitor their behaviour at their own convenience. However, there has thus far been little research into rater…
Descriptors: Writing Evaluation, Writing Tests, Scoring Rubrics, Rating Scales
Peer reviewed
Lee, Yong-Won – Language Testing, 2006
A multitask speaking measure consisting of both integrated and independent tasks is expected to be an important component of a new version of the TOEFL test. This study considered two critical issues concerning score dependability of the new speaking measure: How much would the score dependability be impacted by (1) combining scores on different…
Descriptors: Language Tests, Second Language Learning, English (Second Language), Generalizability Theory
Peer reviewed
Perkins, Kyle; Miller, Leah D. – Language Testing, 1984
Describes a study which submitted data from a multiple-choice English as a second language reading comprehension test to classical test theory item analysis and latent trait measurement. The purpose was to identify weak items and to compare the number of weak items indicated by the two different approaches. (SED)
Descriptors: English (Second Language), Language Tests, Latent Trait Theory, Reading Comprehension
Peer reviewed
Kozaki, Y. – Language Testing, 2004
This article presents a standard-setting procedure for performance assessment in a foreign language, through which some of the major problems facing performance assessment in criterion-referenced testing can be addressed. The procedure, which was geared to revealing and accommodating inter-judge variability, employed the synergy of multiple…
Descriptors: Data Analysis, Testing, Performance Tests, Generalizability Theory
Peer reviewed
Shohamy, Elana – Language Testing, 1997
Argues that language tests employing methods not fair to all test takers are unethical. Ways of reducing sources of unfairness in language testing are discussed. (15 references) (Author/CK)
Descriptors: Academic Achievement, Change Strategies, Ethics, Language Proficiency