ERIC - Search Results

Showing all 14 results

Effect of Sample Length on MLU in Mandarin-Speaking Hard-of-Hearing Children

Peer reviewed

Chia-Ying Chu; Pei-Hua Chen; Yi-Shin Tsai; Chieh-An Chen; Yi-Chih Chan; Yan-Jhe Ciou – Journal of Deaf Studies and Deaf Education, 2024

This study investigated the impact of language sample length on mean length of utterance (MLU) and aimed to determine the minimum number of utterances required for a reliable MLU. Conversations were collected from Mandarin-speaking, hard-of-hearing and typical-hearing children aged 16-81 months. The MLUs were calculated using sample sizes ranging…

Descriptors: Foreign Countries, Mandarin Chinese, Young Children, Language Acquisition

Item Response Theory, Computer Adaptive Testing and the Risk of Self-Deception

Benton, Tom – Research Matters, 2021

Computer adaptive testing is intended to make assessment more reliable by tailoring the difficulty of the questions a student has to answer to their level of ability. Most commonly, this benefit is used to justify the length of tests being shortened whilst retaining the reliability of a longer, non-adaptive test. Improvements due to adaptive…

Descriptors: Risk, Item Response Theory, Computer Assisted Testing, Difficulty Level

Difference in Generality between Item Pairs Mediates Effect of Difference in Items' Lengths on Inter-Item Correlation Size

Peer reviewed

Hamby, Tyler – Journal of Psychoeducational Assessment, 2018

In this study, the author examined potential mediators of the negative relationship between the absolute difference in items' lengths and their inter-item correlation size. Fifty-two randomly ordered items from five personality scales were administered to 622 university students, and 46 respondents from a survey website rated the items'…

Descriptors: Correlation, Personality Traits, Undergraduate Students, Difficulty Level

Sampling Knowledge and Understanding: How Long Should a Test Be?

Peer reviewed

Burton, Richard F. – Assessment & Evaluation in Higher Education, 2006

Many academic tests (e.g. short-answer and multiple-choice) sample required knowledge with questions scoring 0 or 1 (dichotomous scoring). Few textbooks give useful guidance on the length of test needed to do this reliably. Posey's binomial error model of 1932 provides the best starting point, but allows neither for heterogeneity of question…

Descriptors: Item Sampling, Tests, Test Length, Test Reliability

Influence of Test and Person Characteristics on Nonparametric Appropriateness Measurement.

Peer reviewed

Meijer, Rob R.; And Others – Applied Psychological Measurement, 1994

The power of the nonparametric person-fit statistic, U3, is investigated through simulations as a function of item characteristics, test characteristics, person characteristics, and the group to which examinees belong. Results suggest conditions under which relatively short tests can be used for person-fit analysis. (SLD)

Descriptors: Difficulty Level, Group Membership, Item Response Theory, Nonparametric Statistics

Consideration for Sample Size in Reliability Studies for Mastery Tests. Publication Series in Mastery Testing.

Saunders, Joseph C.; Huynh, Huynh – 1980

In most reliability studies, the precision of a reliability estimate varies inversely with the number of examinees (sample size). Thus, to achieve a given level of accuracy, some minimum sample size is required. An approximation for this minimum size may be made if some reasonable assumptions regarding the mean and standard deviation of the test…

Descriptors: Cutting Scores, Difficulty Level, Error of Measurement, Mastery Tests

Practical Procedures for Constructing Mastery Tests to Minimize Errors of Classification and to Maximize or Optimize Decision Reliability.

Byars, Alvin Gregg – 1980

The objectives of this investigation are to develop, describe, assess, and demonstrate procedures for constructing mastery tests to minimize errors of classification and to maximize decision reliability. The guidelines are based on conditions where item exchangeability is a reasonable assumption and the test constructor can control the number of…

Descriptors: Cutting Scores, Difficulty Level, Grade 4, Intermediate Grades

Q. How Many Options Should a Multiple-Choice Question Have? (a) 2. (b) 3. (c) 4. At-a-glance Research Report.

Catts, Ralph – 1978

The reliability of multiple choice tests--containing different numbers of response options--was investigated for 260 students enrolled in technical college economics courses. Four test forms, constructed from previously used four-option items, were administered, consisting of (1) 60 two-option items--two distractors randomly discarded; (2) 40…

Descriptors: Answer Sheets, Difficulty Level, Foreign Countries, Higher Education

Tailoring Tests to Educational Levels.

de Jong, John H. A. L. – 1984

The Netherlands' secondary education system is highly differentiated, with four different school types for four scholastic ability levels. Final examinations must accommodate these four levels, and require a test-independent definition of the intended final ability levels as well as a sample-free evaluation of the range of ability levels at which…

Descriptors: Difficulty Level, Efficiency, Equated Scores, Foreign Countries

Comparison of Difficulties and Reliabilities of Math-Completion and Multiple-Choice Item Formats.

Oosterhof, Albert C.; Coats, Pamela K. – 1981

Instructors who develop classroom examinations that require students to provide a numerical response to a mathematical problem are often very concerned about the appropriateness of the multiple-choice format. The present study augments previous research relevant to this concern by comparing the difficulty and reliability of multiple-choice and…

Descriptors: Comparative Analysis, Difficulty Level, Grading, Higher Education

Comparative Racial Analysis of Enlisted Advancement Exams: Item Differentiation. Final Report.

Robertson, David W.; And Others – 1977

A comparative study of item analysis was conducted on the basis of race to determine whether alternative test construction or processing might increase the proportion of black enlisted personnel among those passing various military technical knowledge examinations. The study used data from six specialists at four grade levels and investigated item…

Descriptors: Difficulty Level, Enlisted Personnel, Item Analysis, Occupational Tests

Test-Retest Analyses of the Test of English as a Foreign Language. TOEFL Research Reports Report 45.

Henning, Grant – 1993

This study provides information about the total and component scores of the Test of English as a Foreign Language (TOEFL). First, the study provides comparative global and component estimates of test-retest, alternate-form, and internal-consistency reliability, controlling for sources of measurement error inherent in the examinees and the testing…

Descriptors: Difficulty Level, English (Second Language), Error of Measurement, Estimation (Mathematics)

Manual for the USES Basic Occupational Literacy Test. Section 2: Development.

Manpower Administration (DOL), Washington, DC. – 1972

The Basic Occupational Literacy Test (BOLT) was developed as an achievement test of basic skills in reading and arithmetic, for educationally disadvantaged adults. The objective was to develop a test appropriate for this population with regard to content, format, instructions, timing, norms, and difficulty level. A major issue, the use of grade…

Descriptors: Achievement Tests, Adult Basic Education, Adults, Basic Skills

Evaluations of Implied Orders as a Basis for Tailored Testing Using Simulations. Technical Report No. 4.

Cliff, Norman; And Others – 1977

TAILOR is a computer program that uses the implied orders concept as the basis for computerized adaptive testing. The basic characteristics of TAILOR, which does not involve pretesting, are reviewed here and two studies of it are reported. One is a Monte Carlo simulation based on the four-parameter Birnbaum model and the other uses a matrix of…

Descriptors: Adaptive Testing, Computer Assisted Testing, Computer Programs, Difficulty Level
