ERIC - Search Results

Publication Date

In 2025	1
Since 2024	2
Since 2021 (last 5 years)	3
Since 2016 (last 10 years)	9
Since 2006 (last 20 years)	14

Descriptor

Scores	23
Test Length	23
Test Reliability	23
Test Items	12
Computer Assisted Testing	7
Item Response Theory	7
Test Validity	7
Error of Measurement	6
Language Tests	6
Test Format	6
Comparative Analysis	5
Language Proficiency	5
Second Language Learning	4
Simulation	4
Statistical Analysis	4
Test Construction	4
Elementary School Students	3
Psychometrics	3
Test Results	3
Testing Problems	3
Adaptive Testing	2
Difficulty Level	2
Elementary Education	2
English (Second Language)	2
Estimation (Mathematics)	2
More ▼

Source

Language Testing	3
Applied Psychological…	2
ETS Research Report Series	2
Applied Measurement in…	1
Assessment & Evaluation in…	1
College Student Journal	1
Education and Information…	1
European Journal of Special…	1
Grantee Submission	1
International Journal of…	1
Journal of Educational…	1
Language Assessment Quarterly	1
School Psychology Quarterly	1
School Psychology Review	1
More ▼

Publication Type

Journal Articles	17
Reports - Research	14
Reports - Evaluative	8
Speeches/Meeting Papers	3
Reports - Descriptive	1

Education Level

Elementary Education	2
Middle Schools	2
Grade 6	1
Grade 7	1
Higher Education	1
Intermediate Grades	1
Junior High Schools	1
Postsecondary Education	1
Secondary Education	1

Audience

Researchers

Location

China	1
Turkey	1

Laws, Policies, & Programs

Assessments and Surveys

ACTFL Oral Proficiency…	1
Armed Forces Qualification…	1
Iowa Tests of Basic Skills	1
Matching Familiar Figures Test	1
Peabody Picture Vocabulary…	1
Test of English as a Foreign…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 23 results Save | Export

An Empirical Evaluation of Lexical Diversity Indices in L2 Korean Writing Assessment

Peer reviewed

Direct link

Hakyung Sung; Sooyeon Cho; Kristopher Kyle – Language Assessment Quarterly, 2024

Lexical diversity (LD) is an important indicator of second language lexical development. Much research has investigated LD indices, with a focus on learners of English. However, further research is needed in languages that are typologically distinct from English, such as Korean. In this study, we evaluated the reliability and validity of LD…

Descriptors: Second Language Learning, Korean, Persuasive Discourse, Language Tests

Test Review: Computer-Based English Listening and Speaking Test (CELST) of National Matriculation English Test (NMET) Guangdong Version in China

Peer reviewed

Direct link

Ying Xu; Xiaodong Li; Jin Chen – Language Testing, 2025

This article provides a detailed review of the Computer-based English Listening Speaking Test (CELST) used in Guangdong, China, as part of the National Matriculation English Test (NMET) to assess students' English proficiency. The CELST measures listening and speaking skills as outlined in the "English Curriculum for Senior Middle…

Descriptors: Computer Assisted Testing, English (Second Language), Language Tests, Listening Comprehension Tests

Measuring Language Ability of Students with Compensatory Multidimensional CAT: A Post-Hoc Simulation Study

Peer reviewed

Direct link

Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022

The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…

Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency

A Shorter Short Version of Barron's Ego Strength Scale

Peer reviewed

Direct link

Kelly, William E.; Daughtry, Don – College Student Journal, 2018

This study developed an abbreviated form of Barron's (1953) Ego Strength Scale for use in research among college student samples. A version of Barron's scale was administered to 100 undergraduate college students. Using item-total score correlations and internal consistency, the scale was reduced to 18 items (Es18). The Es18 possessed adequate…

Descriptors: Undergraduate Students, Self Concept Measures, Test Length, Scores

Test Review: TestDaF

Peer reviewed

Direct link

Norris, John; Drackert, Anastasia – Language Testing, 2018

The Test of German as a Foreign Language (TestDaF) plays a critical role as a standardized test of German language proficiency. Developed and administered by the Society for Academic Study Preparation and Test Development (g.a.s.t.), TestDaF was launched in 2001 and has experienced persistent annual growth, with more than 44,000 test takers in…

Descriptors: German, Second Language Learning, Language Tests, Language Proficiency

ACTFL Oral Proficiency Interview -- Computer (OPIc)

Peer reviewed

Direct link

Isbell, Dan; Winke, Paula – Language Testing, 2019

The American Council on the Teaching of Foreign Languages (ACTFL) oral proficiency interview -- computer (OPIc) testing system represents an ambitious effort in language assessment: Assessing oral proficiency in over a dozen languages, on the same scale, from virtually anywhere at any time. Especially for users in contexts where multiple foreign…

Descriptors: Oral Language, Language Tests, Language Proficiency, Second Language Learning

Comparison of Two Test Methods for VIS: Paper-Pencil Test and CAT

Peer reviewed

Direct link

Senel, Selma; Kutlu, Ömer – European Journal of Special Needs Education, 2018

This paper examines listening comprehension skills of visually impaired students (VIS) using computerised adaptive testing (CAT) and reader-assisted paper-pencil testing (raPPT) and student views about them. Explanatory mixed method design was used in this study. Sample is comprised of 51 VIS, in 7th and 8th grades. 9 of these students were…

Descriptors: Computer Assisted Testing, Adaptive Testing, Visual Impairments, Student Attitudes

Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test

Peer reviewed

Direct link

Lee, Yi-Hsuan; Zhang, Jinming – International Journal of Testing, 2017

Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…

Descriptors: Test Bias, Test Reliability, Performance, Scores

Identifying Sets of Maximally Efficient Items from the Academic Competence Evaluation Scales-Teacher Form

Peer reviewed

Direct link

Anthony, Christopher James; DiPerna, James Clyde – School Psychology Quarterly, 2017

The Academic Competence Evaluation Scales-Teacher Form (ACES-TF; DiPerna & Elliott, 2000) was developed to measure student academic skills and enablers (interpersonal skills, engagement, motivation, and study skills). Although ACES-TF scores have demonstrated psychometric adequacy, the length of the measure may be prohibitive for certain…

Descriptors: Test Items, Efficiency, Item Response Theory, Test Length

Broadening the Scope of Reading Comprehension Using Scenario-Based Assessments: Preliminary Findings and Challenges

Peer reviewed
PDF on ERIC

Download full text

Sabatini, J.; O'Reilly, T.; Halderman, L.; Bruce, K. – Grantee Submission, 2014

Existing reading comprehension assessments have been criticized by researchers, educators, and policy makers, especially regarding their coverage, utility, and authenticity. The purpose of the current study was to evaluate a new assessment of reading comprehension that was designed to broaden the construct of reading. In light of these issues, we…

Descriptors: Reading Comprehension, Vignettes, Reading Tests, Elementary School Students

Comparing the Performance of Five Multidimensional CAT Selection Procedures with Different Stopping Rules

Peer reviewed

Direct link

Yao, Lihua – Applied Psychological Measurement, 2013

Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection

Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01

Peer reviewed
PDF on ERIC

Download full text

Lee, Yi-Hsuan; Zhang, Jinming – ETS Research Report Series, 2010

This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…

Descriptors: Test Bias, Item Response Theory, Test Items, Scores

Comparison of Multistage Tests with Computerized Adaptive and Paper-and-Pencil Tests. Research Report. ETS RR-07-04

Peer reviewed
PDF on ERIC

Download full text

Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007

Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…

Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models

Estimating the Consistency and Accuracy of Classifications Based on Test Scores.

Peer reviewed

Livingston, Samuel A.; Lewis, Charles – Journal of Educational Measurement, 1995

A method is presented for estimating the accuracy and consistency of classifications based on test scores. The reliability of the score is used to estimate effective test length in terms of discrete items. The true-score distribution is estimated by fitting a four-parameter beta model. (SLD)

Descriptors: Classification, Estimation (Mathematics), Scores, Statistical Distributions

Influence of Test and Person Characteristics on Nonparametric Appropriateness Measurement.

Peer reviewed

Meijer, Rob R.; And Others – Applied Psychological Measurement, 1994

The power of the nonparametric person-fit statistic, U3, is investigated through simulations as a function of item characteristics, test characteristics, person characteristics, and the group to which examinees belong. Results suggest conditions under which relatively short tests can be used for person-fit analysis. (SLD)

Descriptors: Difficulty Level, Group Membership, Item Response Theory, Nonparametric Statistics

Previous Page | Next Page »

Pages: 1 | 2

Lee, Yi-Hsuan	2
Zhang, Jinming	2
Anthony, Christopher James	1
Bruce, K.	1
Burton, Richard F.	1
Daughtry, Don	1
DiPerna, James Clyde	1
Drackert, Anastasia	1
Gelbal, Selahattin	1
Gilmer, Jerry S.	1
Hakyung Sung	1
Halderman, L.	1
Hambleton, Ronald K.	1
Hanson, Dave	1
Henning, Grant	1
Isbell, Dan	1
Jin Chen	1
Kelly, William E.	1
Kipps, Debi	1
Kristopher Kyle	1
Kutlu, Ömer	1
Lenel, Julia C.	1
Lewis, Charles	1
Livingston, Samuel A.	1
More ▼