ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	4
Since 2006 (last 20 years)	9

Descriptor

Comparative Analysis	15
Test Length	15
Test Reliability	15
Computer Assisted Testing	8
Test Format	6
Scores	5
Test Validity	5
Adaptive Testing	4
Foreign Countries	4
Item Response Theory	4
Simulation	4
Test Items	4
Correlation	3
Cutting Scores	3
Error of Measurement	3
Higher Education	3
Language Tests	3
Mathematical Models	3
Test Construction	3
Bayesian Statistics	2
Criterion Referenced Tests	2
Difficulty Level	2
Factor Analysis	2
Language Proficiency	2
Latent Trait Theory	2
More ▼

Source

ETS Research Report Series	2
Education and Information…	1
European Journal of Special…	1
Language Testing	1
Measurement in Physical…	1
Online Submission	1
Psychometrika	1
Research Matters	1
Toegepaste taalwetenschap in…	1

Publication Type

Reports - Research	14
Journal Articles	10
Speeches/Meeting Papers	3
Reports - Evaluative	1

Education Level

Elementary Education	1
Elementary Secondary Education	1
Grade 7	1
Secondary Education	1

Audience

Location

Turkey	2
China	1
Singapore	1
United Kingdom	1

Laws, Policies, & Programs

Assessments and Surveys

ACTFL Oral Proficiency…	1
School and College Ability…	1

What Works Clearinghouse Rating

Showing all 15 results Save | Export

Item Response Theory, Computer Adaptive Testing and the Risk of Self-Deception

Download full text

Benton, Tom – Research Matters, 2021

Computer adaptive testing is intended to make assessment more reliable by tailoring the difficulty of the questions a student has to answer to their level of ability. Most commonly, this benefit is used to justify the length of tests being shortened whilst retaining the reliability of a longer, non-adaptive test. Improvements due to adaptive…

Descriptors: Risk, Item Response Theory, Computer Assisted Testing, Difficulty Level

Measuring Language Ability of Students with Compensatory Multidimensional CAT: A Post-Hoc Simulation Study

Peer reviewed

Direct link

Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022

The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…

Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency

ACTFL Oral Proficiency Interview -- Computer (OPIc)

Peer reviewed

Direct link

Isbell, Dan; Winke, Paula – Language Testing, 2019

The American Council on the Teaching of Foreign Languages (ACTFL) oral proficiency interview -- computer (OPIc) testing system represents an ambitious effort in language assessment: Assessing oral proficiency in over a dozen languages, on the same scale, from virtually anywhere at any time. Especially for users in contexts where multiple foreign…

Descriptors: Oral Language, Language Tests, Language Proficiency, Second Language Learning

Comparison of Two Test Methods for VIS: Paper-Pencil Test and CAT

Peer reviewed

Direct link

Senel, Selma; Kutlu, Ömer – European Journal of Special Needs Education, 2018

This paper examines listening comprehension skills of visually impaired students (VIS) using computerised adaptive testing (CAT) and reader-assisted paper-pencil testing (raPPT) and student views about them. Explanatory mixed method design was used in this study. Sample is comprised of 51 VIS, in 7th and 8th grades. 9 of these students were…

Descriptors: Computer Assisted Testing, Adaptive Testing, Visual Impairments, Student Attitudes

Indexing Creativity Fostering Teacher Behaviour: Replication and Modification

Download full text

Dikici, Ayhan; Soh, Kaycheng – Online Submission, 2015

Many measurement tools on creativity are available in the literature. One of these scales is Creativity Fostering Teacher Behaviour Index (CFTIndex) developed for Singaporean teacher originally. It was then translated into Turkish and trialled on teachers in Nigde province with acceptable reliability and factorial validity. The main purpose of…

Descriptors: Creativity, Teacher Behavior, Comparative Analysis, Turkish

The Psychometric Properties of the Short and Long Versions of the Coach-Athlete Relationship Questionnaire

Peer reviewed

Direct link

Yang, Sophie Xin; Jowett, Sophia – Measurement in Physical Education and Exercise Science, 2013

The Coach-Athlete Relationship Questionnaire was developed to effectively measure affective, cognitive, and behavioral aspects, represented by the interpersonal constructs of closeness, commitment, and complementarity, of the quality of the relationship within the context of sport coaching. The current study sought to determine the internal…

Descriptors: Foreign Countries, Athletes, Athletic Coaches, Interpersonal Relationship

Multidimensional CAT Item Selection Methods for Domain Scores and Composite Scores: Theory and Applications

Peer reviewed

Direct link

Yao, Lihua – Psychometrika, 2012

Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability or reduce test length when compared with unidimensional CAT or with the paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation by varying the structure…

Descriptors: Item Banks, Test Length, Simulation, Adaptive Testing

Differential Item Functioning: Its Consequences. Research Report. ETS RR-10-01

Peer reviewed
PDF on ERIC

Download full text

Lee, Yi-Hsuan; Zhang, Jinming – ETS Research Report Series, 2010

This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…

Descriptors: Test Bias, Item Response Theory, Test Items, Scores

Comparison of Multistage Tests with Computerized Adaptive and Paper-and-Pencil Tests. Research Report. ETS RR-07-04

Peer reviewed
PDF on ERIC

Download full text

Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007

Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…

Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models

The Use of the Sequential Probability Ratio Test in Making Grade Classifications in Conjunction with Tailored Testing.

Download full text

Reckase, Mark D. – 1981

This report describes a study comparing the classification results obtained from a one-parameter and three-parameter logistic based tailored testing procedure used in conjunction with Wald's sequential probability ratio test (SPRT). Eighty-eight college students were classified into four grade categories using achievement test results obtained…

Descriptors: Adaptive Testing, Classification, Comparative Analysis, Computer Assisted Testing

Comparison of Difficulties and Reliabilities of Math-Completion and Multiple-Choice Item Formats.

Download full text

Oosterhof, Albert C.; Coats, Pamela K. – 1981

Instructors who develop classroom examinations that require students to provide a numerical response to a mathematical problem are often very concerned about the appropriateness of the multiple-choice format. The present study augments previous research relevant to this concern by comparing the difficulty and reliability of multiple-choice and…

Descriptors: Comparative Analysis, Difficulty Level, Grading, Higher Education

Effects of Test Length and Advancement Score on Several Criterion-Referenced Test Reliability and Validity Indices. Laboratory of Psychometric and Evaluation Research Report No. 86.

Download full text

Eignor, Daniel R.; Hambleton, Ronald K. – 1979

The purpose of the investigation was to obtain some relationships among (1) test lengths, (2) shape of domain-score distributions, (3) advancement scores, and (4) several criterion-referenced test score reliability and validity indices. The study was conducted using computer simulation methods. The values of variables under study were set to be…

Descriptors: Comparative Analysis, Computer Assisted Testing, Criterion Referenced Tests, Cutting Scores

A Comparison of Reliability Estimates from Single and Double Administrations of Criterion-Referenced Tests.

Schaefer, Mary M.; Gross, Susan K. – 1983

Viewing the reliability for criterion-referenced tests as that of mastery classification decisions, three models for determining reliability were examined using two test administrations so that two estimates could be compared to a standard. A major purpose of the research was to determine how several reliability coefficients (coefficient kappa, an…

Descriptors: Comparative Analysis, Correlation, Criterion Referenced Tests, Cutting Scores

A Comparison of a Bayesian and a Maximum Likelihood Tailored Testing Procedure.

Download full text

McKinley, Robert L.; Reckase, Mark D. – 1981

A study was conducted to compare tailored testing procedures based on a Bayesian ability estimation technique and on a maximum likelihood ability estimation technique. The Bayesian tailored testing procedure selected items so as to minimize the posterior variance of the ability estimate distribution, while the maximum likelihood tailored testing…

Descriptors: Academic Ability, Adaptive Testing, Bayesian Statistics, Comparative Analysis

Listening, a Single Trait in First and Second Language Learning.

Download full text

de Jong, John H. A. L. – Toegepaste taalwetenschap in artikelen 20, 1984

A study investigated the validity of an English listening skills test by comparing the results of native American and British English speakers with those of Dutch students of English as a second language. A hypothesis suggested that two-thirds of the items would test listening skills and the remaining third would test other knowledge. Test results…

Descriptors: Age Differences, Comparative Analysis, Correlation, Educational Background

Reckase, Mark D.	2
Benton, Tom	1
Coats, Pamela K.	1
Dikici, Ayhan	1
Eignor, Daniel R.	1
Gelbal, Selahattin	1
Gross, Susan K.	1
Hambleton, Ronald K.	1
Isbell, Dan	1
Jowett, Sophia	1
Kutlu, Ömer	1
Lee, Yi-Hsuan	1
McKinley, Robert L.	1
Oosterhof, Albert C.	1
Ozdemir, Burhanettin	1
Patsula, Liane	1
Rizavi, Saba	1
Rotou, Ourania	1
Schaefer, Mary M.	1
Senel, Selma	1
Soh, Kaycheng	1
Steffen, Manfred	1
Winke, Paula	1
Yang, Sophie Xin	1
Yao, Lihua	1
More ▼