Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 3
Since 2006 (last 20 years): 6
Descriptor
Difficulty Level: 9
Test Format: 9
Test Length: 9
Test Items: 7
Comparative Analysis: 5
Item Response Theory: 5
Sample Size: 5
Equated Scores: 3
Test Reliability: 3
Computer Assisted Testing: 2
Correlation: 2
Source
ProQuest LLC: 2
Applied Measurement in…: 1
Educational and Psychological…: 1
Quality Assurance in…: 1
Research Matters: 1
Author
Benton, Tom: 1
Coats, Pamela K.: 1
DeMars, Christine E.: 1
Henning, Grant: 1
Kárász, Judit T.: 1
Lee, Won-Chan: 1
Lim, Euijin: 1
Oosterhof, Albert C.: 1
Socha, Alan: 1
Sunnassee, Devdass: 1
Széll, Krisztián: 1
Publication Type
Reports - Research: 6
Journal Articles: 4
Dissertations/Theses -…: 2
Information Analyses: 1
Reports - Evaluative: 1
Speeches/Meeting Papers: 1
Location
United Kingdom: 1
Assessments and Surveys
Test of English as a Foreign…: 1
Kárász, Judit T.; Széll, Krisztián; Takács, Szabolcs – Quality Assurance in Education: An International Perspective, 2023
Purpose: Based on the general formula, which depends on the length and difficulty of the test, the number of respondents, and the number of ability levels, this study aims to provide a closed formula for adaptive tests of medium difficulty (probability of solution p = 1/2) to determine the accuracy of the parameters for each item and in…
Descriptors: Test Length, Probability, Comparative Analysis, Difficulty Level
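The restriction to medium difficulty (p = 1/2) in the entry above reflects a standard item response theory result: under the Rasch (1PL) model, the Fisher information an item contributes is p(1 − p), which peaks when the probability of solution is exactly one half. A minimal sketch of that background fact (not the authors' closed formula, which is truncated above):

```python
import math

def rasch_probability(theta, b):
    """Probability of a correct response under the Rasch (1PL) model,
    for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item: p * (1 - p)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)

# Information is maximal when ability matches difficulty, i.e. p = 1/2.
print(item_information(theta=0.0, b=0.0))   # 0.25, the maximum
print(item_information(theta=0.0, b=2.0))   # smaller: item too hard for theta
```

Items far from an examinee's ability level contribute little information, which is why parameter accuracy is best studied at p = 1/2.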
Lim, Euijin; Lee, Won-Chan – Applied Measurement in Education, 2020
The purpose of this study is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subtests. Assuming the random groups design and number-correct scoring, this paper analyzed real data and simulated data with four study factors including test dimensionality, subtest length, form difference in…
Descriptors: Equated Scores, Test Length, Test Format, Difficulty Level
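Under the random groups design with number-correct scoring that Lim and Lee assume, the simplest equating method is mean equating: a Form X score is shifted by the difference between the two groups' mean scores. A hedged sketch of that baseline method (one of the "various equating methods", not necessarily those the study evaluates):

```python
def mean_equate(x_scores, y_scores):
    """Mean equating under the random groups design: map a Form X
    number-correct score onto the Form Y scale by shifting it by the
    difference in group means."""
    mean_x = sum(x_scores) / len(x_scores)
    mean_y = sum(y_scores) / len(y_scores)
    shift = mean_y - mean_x
    return lambda x: x + shift

# Hypothetical number-correct scores from two randomly equivalent groups.
form_x = [10, 12, 14, 16, 18]
form_y = [12, 14, 16, 18, 20]
equate = mean_equate(form_x, form_y)
print(equate(14))   # 14 on Form X corresponds to 16 on Form Y
```

Subscore equating, the paper's focus, applies transformations like this (or stronger ones such as linear or equipercentile equating) to each subtest separately, which is harder because subtests are short.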
Benton, Tom – Research Matters, 2021
Computer adaptive testing is intended to make assessment more reliable by tailoring the difficulty of the questions a student has to answer to their level of ability. Most commonly, this benefit is used to justify the length of tests being shortened whilst retaining the reliability of a longer, non-adaptive test. Improvements due to adaptive…
Descriptors: Risk, Item Response Theory, Computer Assisted Testing, Difficulty Level
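The tailoring Benton describes is most commonly implemented by repeatedly administering the unused item that is most informative at the examinee's current ability estimate. A minimal, hypothetical sketch of that selection rule under the Rasch model (an illustration of the general idea, not Benton's procedure):

```python
import math

def info(theta, b):
    """Rasch item information p(1 - p) at ability theta for difficulty b."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def next_item(theta, difficulties, used):
    """Select the unadministered item with maximal information at theta,
    i.e. the item whose difficulty is closest to the current estimate."""
    return max((i for i in range(len(difficulties)) if i not in used),
               key=lambda i: info(theta, difficulties[i]))

pool = [-2.0, -1.0, 0.0, 1.0, 2.0]      # hypothetical item difficulties
print(next_item(0.5, pool, used={2}))   # picks index 3 (b = 1.0)
```

Because every administered item is near-maximally informative, a shorter adaptive test can match the reliability of a longer fixed-form test, which is the trade-off Benton examines.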
Wu, Yi-Fang – ProQuest LLC, 2015
Item response theory (IRT) uses a family of statistical models for estimating stable characteristics of items and examinees and defining how these characteristics interact in describing item and test performance. With a focus on the three-parameter logistic IRT (Birnbaum, 1968; Lord, 1980) model, the current study examines the accuracy and…
Descriptors: Item Response Theory, Test Items, Accuracy, Computation
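The three-parameter logistic model that Wu's study focuses on gives the probability of a correct response as P(θ) = c + (1 − c) / (1 + exp(−a(θ − b))), with discrimination a, difficulty b, and a pseudo-guessing lower asymptote c. A small sketch of the item response function itself (the model, not the study's estimation procedure):

```python
import math

def p_3pl(theta, a, b, c):
    """Three-parameter logistic (3PL) item response function.
    a: discrimination, b: difficulty, c: pseudo-guessing asymptote."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Even a very low-ability examinee answers correctly with probability near c.
print(p_3pl(theta=-6.0, a=1.2, b=0.0, c=0.2))  # close to the 0.2 guessing floor
print(p_3pl(theta=0.0,  a=1.2, b=0.0, c=0.2))  # 0.6 at theta = b
```

The lower asymptote is what separates the 3PL from the 1PL and 2PL models: at θ = b the probability is (1 + c)/2 rather than 1/2, which complicates parameter estimation, the accuracy of which is the study's subject.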
Socha, Alan; DeMars, Christine E. – Educational and Psychological Measurement, 2013
Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…
Descriptors: Sample Size, Test Length, Correlation, Test Format
Sunnassee, Devdass – ProQuest LLC, 2011
Small sample equating remains a largely unexplored area of research. This study attempts to fill in some of the research gaps via a large-scale, IRT-based simulation study that evaluates the performance of seven small-sample equating methods under various test characteristic and sampling conditions. The equating methods considered are typically…
Descriptors: Test Length, Test Format, Sample Size, Simulation
Wainer, Howard; Thissen, David – 1994
When an examination consists in whole or part of constructed response test items, it is common practice to allow the examinee to choose a subset of the constructed response questions from a larger pool. It is sometimes argued that, if choice were not allowed, the limitations on domain coverage forced by the small number of items might unfairly…
Descriptors: Constructed Response, Difficulty Level, Educational Testing, Equated Scores
Oosterhof, Albert C.; Coats, Pamela K. – 1981
Instructors who develop classroom examinations that require students to provide a numerical response to a mathematical problem are often very concerned about the appropriateness of the multiple-choice format. The present study augments previous research relevant to this concern by comparing the difficulty and reliability of multiple-choice and…
Descriptors: Comparative Analysis, Difficulty Level, Grading, Higher Education
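Reliability comparisons of the kind Oosterhof and Coats describe commonly use KR-20, the dichotomous-item special case of Cronbach's alpha: r = k/(k − 1) · (1 − Σ pⱼqⱼ / σ²). A hedged sketch of that statistic (a standard classroom-test reliability index, not necessarily the study's exact method):

```python
def kr20(item_matrix):
    """Kuder-Richardson 20 reliability for dichotomous (0/1) items.
    item_matrix: rows = examinees, columns = items."""
    n = len(item_matrix)
    k = len(item_matrix[0])
    # Item difficulties p_j; each item contributes variance p_j * (1 - p_j).
    p = [sum(row[j] for row in item_matrix) / n for j in range(k)]
    pq = sum(pj * (1.0 - pj) for pj in p)
    # Population variance of total (number-correct) scores.
    totals = [sum(row) for row in item_matrix]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1.0 - pq / var_t)

# Hypothetical responses: 4 examinees x 4 items, scored 0/1.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(kr20(responses))   # 2/3 for this toy matrix
```

Computing the same index for a multiple-choice form and a numerical-response form of the same items is one way to make the difficulty-and-reliability comparison the abstract describes.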
Test-Retest Analyses of the Test of English as a Foreign Language. TOEFL Research Reports Report 45.
Henning, Grant – 1993
This study provides information about the total and component scores of the Test of English as a Foreign Language (TOEFL). First, the study provides comparative global and component estimates of test-retest, alternate-form, and internal-consistency reliability, controlling for sources of measurement error inherent in the examinees and the testing…
Descriptors: Difficulty Level, English (Second Language), Error of Measurement, Estimation (Mathematics)
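Test-retest reliability, the first of the reliability types Henning estimates, is operationally the Pearson correlation between scores from two administrations of the same test to the same examinees. A minimal sketch of that computation (hypothetical scores, not TOEFL data):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; test-retest reliability is the
    correlation between scores from two administrations of one test."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

first  = [520, 560, 600, 640, 680]   # hypothetical scores, administration 1
second = [530, 550, 610, 630, 690]   # same examinees, administration 2
print(pearson(first, second))        # high retest correlation
```

Alternate-form and internal-consistency reliability, the other two estimates in the study, bound measurement error from different sources, which is why the report compares all three.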