Joshua B. Gilbert; James S. Kim; Luke W. Miratrix – Annenberg Institute for School Reform at Brown University, 2024
Longitudinal models of individual growth typically emphasize between-person predictors of change but ignore how growth may vary "within" persons because each person contributes only one point at each time to the model. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift…
Descriptors: Vocabulary Development, Item Response Theory, Test Items, Student Development
Joshua B. Gilbert; James S. Kim; Luke W. Miratrix – Applied Measurement in Education, 2024
Longitudinal models typically emphasize between-person predictors of change but ignore how growth varies "within" persons because each person contributes only one data point at each time. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift over time. While traditionally…
Descriptors: Vocabulary Development, Item Response Theory, Test Items, Student Development
Eckes, Thomas; Jin, Kuan-Yu – International Journal of Testing, 2021
Severity and centrality are two main kinds of rater effects posing threats to the validity and fairness of performance assessments. Adopting Jin and Wang's (2018) extended facets modeling approach, we separately estimated the magnitude of rater severity and centrality effects in the web-based TestDaF (Test of German as a Foreign Language) writing…
Descriptors: Language Tests, German, Second Languages, Writing Tests
Azhar, Aqil Zainal; Segal, Avi; Gal, Kobi – International Educational Data Mining Society, 2022
This paper studies the use of Reinforcement Learning (RL) policies for optimizing the sequencing of online learning materials to students. Our approach provides an end to end pipeline for automatically deriving and evaluating robust representations of students' interactions and policies for content sequencing in online educational settings. We…
Descriptors: Reinforcement, Instructional Materials, Learning Analytics, Policy Analysis
Baghaei, Samira; Bagheri, Mohammad Sadegh; Yamini, Mortaza – Cogent Education, 2020
The main purpose of this quantitative-qualitative content analysis study was to compare IELTS and TOEFL listening and reading tests based on the representation of the learning objectives of Revised Bloom's taxonomy. To this end, 12 Academic IELTS listening and reading tests and 12 TOEFL iBT listening and reading tests were analyzed qualitatively…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Reading Tests
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Dashti, Laleh; Razmjoo, Seyyed Ayatollah – Cogent Education, 2020
The purpose of this mixed-methods study was to explore Iranian IELTS candidates' strengths and weaknesses in the IELTS Speaking Test in terms of IELTS's four speaking assessment criteria, namely Fluency and Coherence (FlC), Lexical Resource (LR), Grammar Range and Accuracy (GRA), and Pronunciation (Pro). It also aimed to examine the discourse features…
Descriptors: English (Second Language), Second Language Learning, Language Tests, Speech Communication
Morgan, Grant B.; Zhu, Min; Johnson, Robert L.; Hodge, Kari J. – Language Assessment Quarterly, 2014
Common estimators of interrater reliability include Pearson product-moment correlation coefficients, Spearman rank-order correlations, and the generalizability coefficient. The purpose of this study was to examine the accuracy of estimators of interrater reliability when varying the true reliability, number of scale categories, and number of…
Descriptors: Interrater Reliability, Correlation, Generalization, Scoring
Wu, Mike; Davis, Richard L.; Domingue, Benjamin W.; Piech, Chris; Goodman, Noah – International Educational Data Mining Society, 2020
Item Response Theory (IRT) is a ubiquitous model for understanding humans based on their responses to questions, used in fields as diverse as education, medicine and psychology. Large modern datasets offer opportunities to capture more nuances in human behavior, potentially improving test scoring and better informing public policy. Yet larger…
Descriptors: Item Response Theory, Accuracy, Data Analysis, Public Policy
Lowie, Wander; van Dijk, Marijn; Chan, Huiping; Verspoor, Marjolijn – Studies in Second Language Learning and Teaching, 2017
A large body of studies into individual differences in second language learning has shown that success in second language learning is strongly affected by a set of relevant learner characteristics ranging from the age of onset to motivation, aptitude, and personality. Most studies have concentrated on a limited number of learner characteristics and…
Descriptors: Second Language Learning, Individual Differences, Learning Motivation, Personality Traits
Enkin, Elizabeth – Canadian Journal of Applied Linguistics / Revue canadienne de linguistique appliquée, 2016
The maze task is a psycholinguistic experimental procedure that measures real-time incremental sentence processing. The task has recently been tested as a language learning tool with promising results. Therefore, the present study examines the merits of a contextualized version of this task: the story maze. The findings are consistent with…
Descriptors: Task Analysis, Psycholinguistics, English, Spanish
Kim, Ah-Young – Language Testing, 2015
Previous research in cognitive diagnostic assessment (CDA) of L2 reading ability has been frequently conducted using large-scale English proficiency exams (e.g., TOEFL, MELAB). Using CDA, it is possible to analyze individual learners' strengths and weaknesses in multiple attributes (i.e., knowledge, skill, strategy) measured at the item level.…
Descriptors: Language Tests, Diagnostic Tests, Cognitive Measurement, Reading Ability
Xie, Qin – Educational Psychology, 2017
The study utilised a fine-grained diagnostic checklist to assess first-year undergraduates in Hong Kong and evaluated its validity and usefulness for diagnosing academic writing in English. Ten English language instructors marked 472 academic essays with the checklist. They also agreed on a Q-matrix, which specified the relationships among the…
Descriptors: Academic Discourse, College Students, College English, Foreign Countries
In'nami, Yo; Koizumi, Rie – International Journal of Testing, 2013
The importance of sample size, although widely discussed in the literature on structural equation modeling (SEM), has not been widely recognized among applied SEM researchers. To narrow this gap, we focus on second language testing and learning studies and examine the following: (a) Is the sample size sufficient in terms of precision and power of…
Descriptors: Structural Equation Models, Sample Size, Second Language Instruction, Monte Carlo Methods
Terry, J. Michael; Jackson, Sandra C.; Evangelou, Evangelos; Smith, Richard L. – Topics in Language Disorders, 2010
This study tests the extent to which giving credit for African American English (AAE) responses on a General American English sentence imitation test mitigates dialect effects. Forty-eight AAE-speaking second graders completed the Recalling Sentences subtest of the Clinical Evaluation of Language Fundamentals-Third Edition (1995). A Bayesian…
Descriptors: Sentences, Black Dialects, Markov Processes, Syntax