NotesFAQContact Us
Collection
Advanced
Search Tips
Audience
Researchers1
Location
China1
Turkey1
Laws, Policies, & Programs
What Works Clearinghouse Rating
Showing 1 to 15 of 23 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Hakyung Sung; Sooyeon Cho; Kristopher Kyle – Language Assessment Quarterly, 2024
Lexical diversity (LD) is an important indicator of second language lexical development. Much research has investigated LD indices, with a focus on learners of English. However, further research is needed in languages that are typologically distinct from English, such as Korean. In this study, we evaluated the reliability and validity of LD…
Descriptors: Second Language Learning, Korean, Persuasive Discourse, Language Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Ying Xu; Xiaodong Li; Jin Chen – Language Testing, 2025
This article provides a detailed review of the Computer-based English Listening Speaking Test (CELST) used in Guangdong, China, as part of the National Matriculation English Test (NMET) to assess students' English proficiency. The CELST measures listening and speaking skills as outlined in the "English Curriculum for Senior Middle…
Descriptors: Computer Assisted Testing, English (Second Language), Language Tests, Listening Comprehension Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
Peer reviewed Peer reviewed
Direct linkDirect link
Kelly, William E.; Daughtry, Don – College Student Journal, 2018
This study developed an abbreviated form of Barron's (1953) Ego Strength Scale for use in research among college student samples. A version of Barron's scale was administered to 100 undergraduate college students. Using item-total score correlations and internal consistency, the scale was reduced to 18 items (Es18). The Es18 possessed adequate…
Descriptors: Undergraduate Students, Self Concept Measures, Test Length, Scores
Peer reviewed Peer reviewed
Direct linkDirect link
Norris, John; Drackert, Anastasia – Language Testing, 2018
The Test of German as a Foreign Language (TestDaF) plays a critical role as a standardized test of German language proficiency. Developed and administered by the Society for Academic Study Preparation and Test Development (g.a.s.t.), TestDaF was launched in 2001 and has experienced persistent annual growth, with more than 44,000 test takers in…
Descriptors: German, Second Language Learning, Language Tests, Language Proficiency
Peer reviewed Peer reviewed
Direct linkDirect link
Isbell, Dan; Winke, Paula – Language Testing, 2019
The American Council on the Teaching of Foreign Languages (ACTFL) oral proficiency interview -- computer (OPIc) testing system represents an ambitious effort in language assessment: Assessing oral proficiency in over a dozen languages, on the same scale, from virtually anywhere at any time. Especially for users in contexts where multiple foreign…
Descriptors: Oral Language, Language Tests, Language Proficiency, Second Language Learning
Peer reviewed Peer reviewed
Direct linkDirect link
Senel, Selma; Kutlu, Ömer – European Journal of Special Needs Education, 2018
This paper examines listening comprehension skills of visually impaired students (VIS) using computerised adaptive testing (CAT) and reader-assisted paper-pencil testing (raPPT) and student views about them. Explanatory mixed method design was used in this study. Sample is comprised of 51 VIS, in 7th and 8th grades. 9 of these students were…
Descriptors: Computer Assisted Testing, Adaptive Testing, Visual Impairments, Student Attitudes
Peer reviewed Peer reviewed
Direct linkDirect link
Lee, Yi-Hsuan; Zhang, Jinming – International Journal of Testing, 2017
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Descriptors: Test Bias, Test Reliability, Performance, Scores
Peer reviewed Peer reviewed
Direct linkDirect link
Anthony, Christopher James; DiPerna, James Clyde – School Psychology Quarterly, 2017
The Academic Competence Evaluation Scales-Teacher Form (ACES-TF; DiPerna & Elliott, 2000) was developed to measure student academic skills and enablers (interpersonal skills, engagement, motivation, and study skills). Although ACES-TF scores have demonstrated psychometric adequacy, the length of the measure may be prohibitive for certain…
Descriptors: Test Items, Efficiency, Item Response Theory, Test Length
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Sabatini, J.; O'Reilly, T.; Halderman, L.; Bruce, K. – Grantee Submission, 2014
Existing reading comprehension assessments have been criticized by researchers, educators, and policy makers, especially regarding their coverage, utility, and authenticity. The purpose of the current study was to evaluate a new assessment of reading comprehension that was designed to broaden the construct of reading. In light of these issues, we…
Descriptors: Reading Comprehension, Vignettes, Reading Tests, Elementary School Students
Peer reviewed Peer reviewed
Direct linkDirect link
Yao, Lihua – Applied Psychological Measurement, 2013
Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Lee, Yi-Hsuan; Zhang, Jinming – ETS Research Report Series, 2010
This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
Descriptors: Test Bias, Item Response Theory, Test Items, Scores
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007
Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…
Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models
Peer reviewed Peer reviewed
Livingston, Samuel A.; Lewis, Charles – Journal of Educational Measurement, 1995
A method is presented for estimating the accuracy and consistency of classifications based on test scores. The reliability of the score is used to estimate effective test length in terms of discrete items. The true-score distribution is estimated by fitting a four-parameter beta model. (SLD)
Descriptors: Classification, Estimation (Mathematics), Scores, Statistical Distributions
Peer reviewed Peer reviewed
Meijer, Rob R.; And Others – Applied Psychological Measurement, 1994
The power of the nonparametric person-fit statistic, U3, is investigated through simulations as a function of item characteristics, test characteristics, person characteristics, and the group to which examinees belong. Results suggest conditions under which relatively short tests can be used for person-fit analysis. (SLD)
Descriptors: Difficulty Level, Group Membership, Item Response Theory, Nonparametric Statistics
Previous Page | Next Page »
Pages: 1  |  2