ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	3
Since 2016 (last 10 years)	5
Since 2006 (last 20 years)	12

Descriptor

Error of Measurement	12
Language Tests	8
Foreign Countries	5
English (Second Language)	4
Scores	4
Second Language Learning	4
Factor Analysis	3
Generalizability Theory	3
Goodness of Fit	3
Interrater Reliability	3
Language Proficiency	3
Language Skills	3
Reliability	3
Accuracy	2
Comparative Analysis	2
Computation	2
Correlation	2
Data Analysis	2
English Language Learners	2
Evaluators	2
Item Response Theory	2
Longitudinal Studies	2
Native Speakers	2
Rating Scales	2
Reading Tests	2

Source

Language Testing

Publication Type

Journal Articles	12
Reports - Research	10
Reports - Descriptive	1
Reports - Evaluative	1
Tests/Questionnaires	1

Education Level

Higher Education	3
Postsecondary Education	3
Elementary Education	1
Elementary Secondary Education	1
Grade 5	1
Intermediate Grades	1
Secondary Education	1

Location

China	1
Finland	1
Germany	1
Japan	1
Netherlands	1

Assessments and Surveys

Early Childhood Longitudinal…

Showing all 12 results

All Types of Experience Are Equal, but Some Are More Equal: The Effect of Different Types of Experience on Rater Severity and Rater Consistency

Peer reviewed

Neittaanmäki, Reeta; Lamprianou, Iasonas – Language Testing, 2024

This article focuses on rater severity and consistency and their relation to different types of rater experience over a long period of time. The article is based on longitudinal data collected from 2009 to 2019 from the second language Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. The study investigated…

Descriptors: Foreign Countries, Interrater Reliability, Error of Measurement, Experience

A Meta-Analysis of Self-Assessment and Language Performance in Language Testing and Assessment

Peer reviewed

Li, Minzi; Zhang, Xian – Language Testing, 2021

This meta-analysis explores the correlation between self-assessment (SA) and language performance. Sixty-seven studies with 97 independent samples involving more than 68,500 participants were included in our analysis. It was found that the overall correlation between SA and language performance was 0.466 (p < 0.01). Moderator analysis was…

Descriptors: Meta Analysis, Self Evaluation (Individuals), Likert Scales, Research Reports
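The abstract reports a single overall correlation pooled across 97 independent samples but does not say how the pooling was done. One common approach, assumed here and not necessarily the authors' method, converts each sample correlation r to Fisher's z, averages the z values with inverse-variance weights of n - 3, and back-transforms the result. A minimal fixed-effect sketch follows; the helper name `pooled_correlation` and the (r, n) values are hypothetical.

```python
import math


def pooled_correlation(samples):
    """Fixed-effect pooled correlation using Fisher's z transform.

    samples: list of (r, n) pairs, one per independent sample, each with n > 3.
    Returns the pooled r and an approximate 95% confidence interval.
    (Hypothetical helper for illustration; not taken from the article.)
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in samples:
        z = math.atanh(r)  # Fisher z = 0.5 * ln((1 + r) / (1 - r))
        w = n - 3          # sampling variance of z is about 1 / (n - 3)
        weighted_sum += w * z
        total_weight += w
    z_bar = weighted_sum / total_weight
    se = 1.0 / math.sqrt(total_weight)  # standard error of the pooled z
    # Back-transform the estimate and the interval bounds to the r metric
    lower = math.tanh(z_bar - 1.96 * se)
    upper = math.tanh(z_bar + 1.96 * se)
    return math.tanh(z_bar), (lower, upper)


# Hypothetical (r, n) pairs for illustration only; not data from the study.
r_bar, (lo, hi) = pooled_correlation([(0.52, 120), (0.38, 85), (0.45, 300)])
print(f"pooled r = {r_bar:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Meta-analyses of heterogeneous samples often use a random-effects model instead, which adds an estimated between-study variance to each sample's sampling variance before weighting; the fixed-effect version is kept short here for illustration.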

Measuring the Development of General Language Skills in English as a Foreign Language--Longitudinal Invariance of the C-Test

Peer reviewed

Schnoor, Birger; Hartig, Johannes; Klinger, Thorsten; Naumann, Alexander; Usanova, Irina – Language Testing, 2023

Research on assessing English as a foreign language (EFL) development has been growing recently. However, empirical evidence from longitudinal analyses based on substantial samples is still needed. In such settings, tests for measuring language development must meet high standards of test quality such as validity, reliability, and objectivity, as…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Longitudinal Studies

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

Equating in Small-Scale Language Testing Programs

Peer reviewed

LaFlair, Geoffrey T.; Isbell, Daniel; May, L. D. Nicolas; Gutierrez Arvizu, Maria Nelly; Jamieson, Joan – Language Testing, 2017

Language programs need multiple test forms for secure administrations and effective placement decisions, but can they have confidence that scores on alternate test forms have the same meaning? In large-scale testing programs, various equating methods are available to ensure the comparability of forms. The choice of equating method is informed by…

Descriptors: Language Tests, Equated Scores, Testing Programs, Comparative Analysis
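The abstract is truncated before it names the methods compared. As a hedged illustration only, two textbook methods under an assumed random-groups design (each form taken by a randomly equivalent group) are mean equating, which shifts Form X scores by the difference in form means, and linear equating, which gives a score the same standardized position on both forms: l_Y(x) = μ_Y + (σ_Y / σ_X)(x - μ_X). The function names and scores below are hypothetical.

```python
import statistics


def mean_equate(x, form_x_scores, form_y_scores):
    """Mean equating: shift a Form X score by the difference in form means."""
    return x - statistics.mean(form_x_scores) + statistics.mean(form_y_scores)


def linear_equate(x, form_x_scores, form_y_scores):
    """Linear equating: give x the same z-score on Form Y that it has on Form X.

    Assumes a random-groups design (hypothetical helper for illustration).
    """
    mu_x = statistics.mean(form_x_scores)
    mu_y = statistics.mean(form_y_scores)
    sd_x = statistics.pstdev(form_x_scores)
    sd_y = statistics.pstdev(form_y_scores)
    return mu_y + (sd_y / sd_x) * (x - mu_x)


# Hypothetical raw scores from two randomly equivalent groups.
form_x = [10, 12, 14, 16, 18]
form_y = [12, 15, 18, 21, 24]
print(mean_equate(16, form_x, form_y), round(linear_equate(16, form_x, form_y), 2))
```

With these made-up scores, a Form X score of 16 maps to 20 under mean equating and to about 21 under linear equating, because Form Y scores are 1.5 times as spread out as Form X scores.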

Investigating Correspondence between Language Proficiency Standards and Academic Content Standards: A Generalizability Theory Study

Peer reviewed

Lin, Chih-Kai; Zhang, Jinming – Language Testing, 2014

Research on the relationship between English language proficiency standards and academic content standards serves to provide information about the extent to which English language learners (ELLs) are expected to encounter academic language use that facilitates their content learning, such as in mathematics and science. Standards-to-standards…

Descriptors: Language Proficiency, Academic Standards, Generalizability Theory, English Language Learners

Determining the Scoring Validity of a Co-Constructed CEFR-Based Rating Scale

Peer reviewed

Deygers, Bart; Van Gorp, Koen – Language Testing, 2015

Considering scoring validity as encompassing both reliable rating scale use and valid descriptor interpretation, this study reports on the validation of a CEFR-based scale that was co-constructed and used by novice raters. The research questions this paper wishes to answer are (a) whether it is possible to construct a CEFR-based rating scale with…

Descriptors: Rating Scales, Scoring, Validity, Interrater Reliability

Applying Unidimensional and Multidimensional Item Response Theory Models in Testlet-Based Reading Assessment

Peer reviewed

Min, Shangchao; He, Lianzhen – Language Testing, 2014

This study examined the relative effectiveness of the multidimensional bi-factor model and multidimensional testlet response theory (TRT) model in accommodating local dependence in testlet-based reading assessment with both dichotomously and polytomously scored items. The data used were 14,089 test-takers' item-level responses to the testlet-based…

Descriptors: Foreign Countries, Item Response Theory, Reading Tests, Test Items

Self-Assessment of Japanese as a Second Language: The Role of Experiences in the Naturalistic Acquisition

Peer reviewed

Suzuki, Yuichi – Language Testing, 2015

Self-assessment has been used to assess second language proficiency; however, various sources of measurement error may threaten the validity and reliability of such tools. The present paper investigated the role of experiences in using Japanese as a second language in the naturalistic acquisition context on the accuracy of the…

Descriptors: Self Evaluation (Individuals), Error of Measurement, Japanese, Second Language Learning

Principles of Quantile Regression and an Application

Peer reviewed

Chen, Fang; Chalhoub-Deville, Micheline – Language Testing, 2014

Newer statistical procedures are typically introduced to help address the limitations of those already in practice or to deal with emerging research needs. Quantile regression (QR) is introduced in this paper as a relatively new methodology, which is intended to overcome some of the limitations of least squares mean regression (LMR). QR is more…

Descriptors: Regression (Statistics), Language Tests, Language Proficiency, Mathematics Achievement

Factor Structure of the Revised TOEIC® Test: A Multiple-Sample Analysis

Peer reviewed

In'nami, Yo; Koizumi, Rie – Language Testing, 2012

This study examined the factor structure of the listening and reading sections of the revised Test of English for International Communication (TOEIC®) test. The data from the TOEIC IP (institutional program) test taken by 569 English learners were randomly split into two samples (n = 285 vs. 284). Four models (higher-order, correlated,…

Descriptors: Communication (Thought Transfer), Second Language Learning, Factor Structure, Measurement

Construct Validation of Analytic Rating Scales in a Speaking Assessment: Reporting a Score Profile and a Composite

Peer reviewed

Sawaki, Yasuyo – Language Testing, 2007

This is a construct validation study of a second language speaking assessment that reported a language profile based on analytic rating scales and a composite score. The study addressed three key issues: score dependability, convergent/discriminant validity of analytic rating scales and the weighting of analytic ratings in the composite score.…

Descriptors: Generalizability Theory, Speech Communication, Student Placement, Construct Validity

Author

Lin, Chih-Kai	2
Chalhoub-Deville, Micheline	1
Chen, Fang	1
Deygers, Bart	1
Gutierrez Arvizu, Maria Nelly	1
Hartig, Johannes	1
He, Lianzhen	1
In'nami, Yo	1
Isbell, Daniel	1
Jamieson, Joan	1
Klinger, Thorsten	1
Koizumi, Rie	1
LaFlair, Geoffrey T.	1
Lamprianou, Iasonas	1
Li, Minzi	1
May, L. D. Nicolas	1
Min, Shangchao	1
Naumann, Alexander	1
Neittaanmäki, Reeta	1
Sawaki, Yasuyo	1
Schnoor, Birger	1
Suzuki, Yuichi	1
Usanova, Irina	1
Van Gorp, Koen	1
Zhang, Jinming	1