Showing 1 to 15 of 29 results
Peer reviewed
Basman, Munevver – International Journal of Assessment Tools in Education, 2023
Ensuring the validity of a test requires checking that all items yield similar results across different groups of individuals. However, differential item functioning (DIF) occurs when individuals with equal ability levels from different groups perform differently on the same test item. Based on Item Response Theory and Classic Test…
Descriptors: Test Bias, Test Items, Test Validity, Item Response Theory
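DIF detection can take many forms. As a hedged illustration of the general idea only, not the specific IRT- and CTT-based procedures this article examines, the sketch below computes the Mantel-Haenszel common odds ratio for one item, comparing a reference and a focal group after matching on total score as an ability proxy; all names are illustrative.

```python
import numpy as np

def mantel_haenszel_dif(correct, group, total_score):
    """Mantel-Haenszel common odds ratio for one item.

    correct     : 0/1 responses to the studied item
    group       : 0 = reference group, 1 = focal group
    total_score : matching variable (e.g., total test score)
    """
    correct = np.asarray(correct)
    group = np.asarray(group)
    total_score = np.asarray(total_score)

    num, den = 0.0, 0.0
    for s in np.unique(total_score):          # stratify by the ability proxy
        m = total_score == s
        a = np.sum((group[m] == 0) & (correct[m] == 1))  # reference, correct
        b = np.sum((group[m] == 0) & (correct[m] == 0))  # reference, incorrect
        c = np.sum((group[m] == 1) & (correct[m] == 1))  # focal, correct
        d = np.sum((group[m] == 1) & (correct[m] == 0))  # focal, incorrect
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    return num / den if den > 0 else np.nan   # values near 1.0 suggest little DIF
```

Odds ratios far from 1.0 flag items whose difficulty differs between matched groups, which is the situation the abstract describes.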
Peer reviewed
Yasuda, Jun-ichiro; Hull, Michael M.; Mae, Naohiro – Physical Review Physics Education Research, 2022
This paper presents improvements made to a computerized adaptive testing (CAT)-based version of the FCI (FCI-CAT) with regard to test security and test efficiency. First, we will discuss measures to enhance test security by controlling for item overexposure, decreasing the risk that respondents may (i) memorize the content of a pretest for use on…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Items, Risk Management
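Item overexposure is commonly managed with probabilistic exposure-control schemes. The sketch below uses a Sympson-Hetter-style administration probability together with 2PL item information; these are assumptions for illustration, not the specific measures this paper implements.

```python
import numpy as np

rng = np.random.default_rng(7)

def item_information_2pl(a, b, theta):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def select_item(theta, a, b, administered, exposure_k):
    """Pick the next item: most informative candidates first, but each
    candidate is only administered with probability exposure_k[i]
    (Sympson-Hetter style), which caps how often popular items are seen."""
    order = np.argsort(-item_information_2pl(a, b, theta))
    for i in order:
        if i in administered:
            continue
        if rng.random() <= exposure_k[i]:
            return int(i)
    # Fallback: if every candidate was probabilistically skipped,
    # take the most informative unused item anyway.
    return int(next(i for i in order if i not in administered))
```

Lowering an item's exposure parameter makes it less likely to be administered even when it is the most informative choice, which is one way to reduce the memorization risk the abstract mentions.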
Peer reviewed
Kilic, Abdullah Faruk; Dogan, Nuri – International Journal of Assessment Tools in Education, 2021
Weighted least squares (WLS), weighted least squares mean-and-variance-adjusted (WLSMV), unweighted least squares mean-and-variance-adjusted (ULSMV), maximum likelihood (ML), robust maximum likelihood (MLR) and Bayesian estimation methods were compared in mixed item response type data via Monte Carlo simulation. The percentage of polytomous items,…
Descriptors: Factor Analysis, Computation, Least Squares Statistics, Maximum Likelihood Statistics
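The study compares estimators on simulated mixed-format data. As a sketch of how such data might be generated, and not the authors' exact design, the snippet below assumes a single factor, 2PL dichotomous items, and graded-response polytomous items.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_mixed_responses(n_persons=1000, n_dich=20, n_poly=10, n_cats=5):
    """Simulate mixed-format data: 2PL dichotomous items plus
    graded-response polytomous items, all loading on one factor."""
    theta = rng.normal(size=n_persons)

    # Dichotomous part: 2PL
    a_d = rng.uniform(0.8, 2.0, n_dich)
    b_d = rng.normal(0.0, 1.0, n_dich)
    p = 1 / (1 + np.exp(-a_d * (theta[:, None] - b_d)))
    dich = (rng.random(p.shape) < p).astype(int)

    # Polytomous part: graded response model with ordered thresholds
    a_p = rng.uniform(0.8, 2.0, n_poly)
    thresholds = np.sort(rng.normal(0.0, 1.0, (n_poly, n_cats - 1)), axis=1)
    z = a_p[None, :, None] * (theta[:, None, None] - thresholds[None, :, :])
    p_star = 1 / (1 + np.exp(-z))              # P(X >= k), persons x items x (cats-1)
    upper = np.concatenate([np.ones_like(p_star[..., :1]), p_star], axis=-1)
    lower = np.concatenate([p_star, np.zeros_like(p_star[..., :1])], axis=-1)
    probs = upper - lower                      # category probabilities
    cum = probs.cumsum(axis=-1)
    u = rng.random(cum.shape[:-1])[..., None]
    poly = (u > cum).sum(axis=-1)              # sampled category 0..n_cats-1

    return np.hstack([dich, poly])
```

Data like this could then be fit with each estimator under comparison; the percentages of polytomous items and other design factors in the actual study are truncated in this abstract.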
Steinkamp, Susan Christa – ProQuest LLC, 2017
For test scores that rely on the accurate estimation of ability via an IRT model, valid use and interpretation depend on the assumption that the IRT model fits the data. Examinees who do not put forth full effort in answering test questions, have prior knowledge of test content, or do not approach a test with the intent of answering…
Descriptors: Test Items, Item Response Theory, Scores, Test Wiseness
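The assumption violations this abstract describes (low effort, item preknowledge) are often screened with person-fit statistics. As a hedged illustration only, not the dissertation's method, the snippet below computes the common lz index under an assumed 2PL model.

```python
import numpy as np

def lz_person_fit(u, a, b, theta):
    """Standardized log-likelihood (lz) person-fit index under a 2PL model.
    Large negative values flag response patterns that are unlikely given
    the estimated ability (e.g., low effort or preknowledge)."""
    u = np.asarray(u, float)
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    l0 = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - expected) / np.sqrt(variance)
```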
Peer reviewed
Sabatini, J.; O'Reilly, T.; Halderman, L.; Bruce, K. – Grantee Submission, 2014
Existing reading comprehension assessments have been criticized by researchers, educators, and policy makers, especially regarding their coverage, utility, and authenticity. The purpose of the current study was to evaluate a new assessment of reading comprehension that was designed to broaden the construct of reading. In light of these issues, we…
Descriptors: Reading Comprehension, Vignettes, Reading Tests, Elementary School Students
Peer reviewed
Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models
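The idea of borrowing collateral information from the rest of the test can be illustrated, very roughly, by shrinking a short subscale score toward a prediction from the remaining items. This Kelley-style sketch is a simplified stand-in, not the model developed in the article; all function and parameter names are illustrative.

```python
import numpy as np

def augmented_subscore(sub, rest, rho_sub):
    """Blend a short subscale score with information from the rest of the
    test (a crude stand-in for model-based subscore augmentation).

    sub     : observed subscale scores (1-D array)
    rest    : scores on the remaining items (1-D array)
    rho_sub : estimated reliability of the subscale (0-1)
    """
    sub, rest = np.asarray(sub, float), np.asarray(rest, float)
    # Predict the subscore from the rest of the test by linear regression.
    slope, intercept = np.polyfit(rest, sub, deg=1)
    predicted = intercept + slope * rest
    # Kelley-style shrinkage: weight the observed subscore by its
    # reliability and fill the gap with the collateral prediction.
    return rho_sub * sub + (1.0 - rho_sub) * predicted
```

The less reliable the subscale, the more the augmented score leans on what the other items say about the examinee, which is the trade-off the abstract motivates.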
Peer reviewed
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format
Peer reviewed
Bulut, Okan; Kan, Adnan – Eurasian Journal of Educational Research, 2012
Problem Statement: Computerized adaptive testing (CAT) is a sophisticated and efficient way of delivering examinations. In CAT, items for each examinee are selected from an item bank based on the examinee's responses to the items. In this way, the difficulty level of the test is adjusted based on the examinee's ability level. Instead of…
Descriptors: Adaptive Testing, Computer Assisted Testing, College Entrance Examinations, Graduate Students
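The adaptive cycle the abstract describes, selecting items from a bank based on the examinee's responses so that difficulty tracks ability, can be sketched as a simple loop. The 2PL model and grid-based maximum-likelihood update below are assumptions for illustration, not details of the study.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def run_cat(a, b, answer, n_items=10):
    """Minimal CAT loop: pick the most informative unused item for the
    current theta estimate, record the response, then re-estimate theta
    by a grid-based maximum-likelihood search.

    a, b   : item parameters of the pool (arrays)
    answer : callable item_index -> 0/1 response from the examinee
    """
    grid = np.linspace(-4, 4, 161)
    theta, used, resp = 0.0, [], []
    for _ in range(n_items):
        info = a**2 * p_2pl(theta, a, b) * (1 - p_2pl(theta, a, b))
        info[used] = -np.inf                 # never reuse an item
        item = int(np.argmax(info))
        used.append(item)
        resp.append(answer(item))
        # Log-likelihood of the responses so far over the theta grid
        P = p_2pl(grid[:, None], a[used], b[used])
        ll = np.sum(np.where(np.array(resp), np.log(P), np.log(1 - P)), axis=1)
        theta = float(grid[np.argmax(ll)])
    return theta, used
```

Here `answer` can be any callable returning the examinee's scored response, for instance a simulator such as `lambda i: int(np.random.random() < p_2pl(true_theta, a[i], b[i]))` with a hypothetical `true_theta`.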
Peer reviewed
Owen, Steven V.; Froman, Robin D. – Educational and Psychological Measurement, 1987
To further test the efficacy of three-option achievement items, parallel three- and five-option item tests were distributed randomly to college students. Results showed no differences in mean item difficulty, mean discrimination, or total test score, but a substantial reduction in time spent on three-option items. (Author/BS)
Descriptors: Achievement Tests, Higher Education, Multiple Choice Tests, Test Format
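The difficulty and discrimination indices compared in this study are classical item statistics. A minimal sketch of how they are typically computed, assuming 0/1-scored responses (proportion correct and corrected item-total correlation):

```python
import numpy as np

def classical_item_stats(responses):
    """Classical item analysis for a 0/1 scored response matrix
    (persons x items): proportion-correct difficulty and corrected
    point-biserial discrimination."""
    X = np.asarray(responses, float)
    difficulty = X.mean(axis=0)                      # item p-values
    total = X.sum(axis=1)
    discrimination = np.array([
        np.corrcoef(X[:, j], total - X[:, j])[0, 1]  # item-rest correlation
        for j in range(X.shape[1])
    ])
    return difficulty, discrimination
```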
Peer reviewed
Green, Kathy – Journal of Experimental Education, 1979
Reliabilities and concurrent validities of teacher-made multiple-choice and true-false tests were compared. No significant differences were found even when multiple-choice reliability was adjusted to equate testing time. (Author/MH)
Descriptors: Comparative Testing, Higher Education, Multiple Choice Tests, Test Format
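Adjusting reliability to equate testing time is commonly done with the Spearman-Brown prophecy formula; the snippet below is a generic illustration, as the study's exact adjustment is not specified in this abstract.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when test length (or testing time) is scaled
    by `length_factor` (e.g., 0.5 halves the test, 2.0 doubles it)."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)

# Example: a test with reliability 0.80 cut to 60% of its length projects
# to spearman_brown(0.80, 0.6), roughly 0.71.
```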
Wainer, Howard; And Others – 1991
A series of computer simulations was run to measure the relationship between testlet validity and the factors of item pool size and testlet length for both adaptive and linearly constructed testlets. Results confirmed the generality of earlier empirical findings of H. Wainer and others (1991) that making a testlet adaptive yields only marginal…
Descriptors: Adaptive Testing, Computer Assisted Testing, Computer Simulation, Item Banks
Graham, Darol L. – 1974
The adequacy of a test developed for statewide assessment of basic mathematics skills was investigated. The test, composed of multiple-choice items reflecting a series of behavioral objectives, was compared with a more extensive criterion measure generated from the same objectives by the application of a strict item sampling model. In many…
Descriptors: Comparative Testing, Criterion Referenced Tests, Educational Assessment, Item Sampling
Myers, Charles T. – 1978
The viewpoint is expressed that adding to test reliability, whether by selecting a more homogeneous set of items, restricting the range of item difficulty as closely as possible to the most efficient level, or increasing the number of items, will not add to test validity, and that there is considerable danger that efforts to increase reliability may…
Descriptors: Achievement Tests, Item Analysis, Multiple Choice Tests, Test Construction
Peer reviewed
Wainer, Howard; And Others – Journal of Educational Measurement, 1992
Computer simulations were run to measure the relationship between testlet validity and factors of item pool size and testlet length for both adaptive and linearly constructed testlets. Making a testlet adaptive yields only modest increases in aggregate validity because of the peakedness of the typical proficiency distribution. (Author/SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Computer Simulation
Nandakumar, Ratna; Yu, Feng – 1994
DIMTEST is a statistical test procedure for assessing essential unidimensionality of binary test item responses. The test statistic T used for testing the null hypothesis of essential unidimensionality is a nonparametric statistic. That is, there is no particular parametric distribution assumed for the underlying ability distribution or for the…
Descriptors: Ability, Content Validity, Correlation, Nonparametric Statistics
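One way to see what a nonparametric check of essential unidimensionality looks at, without reproducing DIMTEST's bias-corrected T statistic, is the covariance among items of an assessment subtest after conditioning on the score of the remaining (partitioning) items. The sketch below is illustrative only and all names are assumptions.

```python
import numpy as np

def mean_conditional_covariance(at_items, pt_items):
    """Average within-stratum covariance among assessment-subtest (AT)
    item pairs, conditioning on the partitioning-subtest (PT) score.
    Values near zero across AT pairs are consistent with essential
    unidimensionality; this is only the conditional-covariance idea,
    not DIMTEST's T statistic."""
    AT = np.asarray(at_items, float)   # persons x AT items (0/1)
    PT = np.asarray(pt_items, float)   # persons x PT items (0/1)
    pt_score = PT.sum(axis=1)
    covs = []
    for s in np.unique(pt_score):
        grp = AT[pt_score == s]
        if len(grp) < 2:
            continue
        c = np.cov(grp, rowvar=False)
        iu = np.triu_indices(c.shape[1], k=1)
        covs.append(c[iu].mean())
    return float(np.mean(covs))
```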