Showing all 14 results
Peer reviewed
Chia-Ying Chu; Pei-Hua Chen; Yi-Shin Tsai; Chieh-An Chen; Yi-Chih Chan; Yan-Jhe Ciou – Journal of Deaf Studies and Deaf Education, 2024
This study investigated the impact of language sample length on mean length of utterance (MLU) and aimed to determine the minimum number of utterances required for a reliable MLU. Conversations were collected from Mandarin-speaking, hard-of-hearing and typical-hearing children aged 16-81 months. The MLUs were calculated using sample sizes ranging…
Descriptors: Foreign Countries, Mandarin Chinese, Young Children, Language Acquisition
Benton, Tom – Research Matters, 2021
Computer adaptive testing is intended to make assessment more reliable by tailoring the difficulty of the questions a student has to answer to their level of ability. Most commonly, this benefit is used to justify the length of tests being shortened whilst retaining the reliability of a longer, non-adaptive test. Improvements due to adaptive…
Descriptors: Risk, Item Response Theory, Computer Assisted Testing, Difficulty Level
Peer reviewed
Hamby, Tyler – Journal of Psychoeducational Assessment, 2018
In this study, the author examined potential mediators of the negative relationship between the absolute difference in items' lengths and their inter-item correlation size. Fifty-two randomly ordered items from five personality scales were administered to 622 university students, and 46 respondents from a survey website rated the items'…
Descriptors: Correlation, Personality Traits, Undergraduate Students, Difficulty Level
Peer reviewed
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2006
Many academic tests (e.g. short-answer and multiple-choice) sample required knowledge with questions scoring 0 or 1 (dichotomous scoring). Few textbooks give useful guidance on the length of test needed to do this reliably. Posey's binomial error model of 1932 provides the best starting point, but allows neither for heterogeneity of question…
Descriptors: Item Sampling, Tests, Test Length, Test Reliability
Peer reviewed
Meijer, Rob R.; And Others – Applied Psychological Measurement, 1994
The power of the nonparametric person-fit statistic, U3, is investigated through simulations as a function of item characteristics, test characteristics, person characteristics, and the group to which examinees belong. Results suggest conditions under which relatively short tests can be used for person-fit analysis. (SLD)
Descriptors: Difficulty Level, Group Membership, Item Response Theory, Nonparametric Statistics
Saunders, Joseph C.; Huynh, Huynh – 1980
In most reliability studies, the precision of a reliability estimate varies inversely with the number of examinees (sample size). Thus, to achieve a given level of accuracy, some minimum sample size is required. An approximation for this minimum size may be made if some reasonable assumptions regarding the mean and standard deviation of the test…
Descriptors: Cutting Scores, Difficulty Level, Error of Measurement, Mastery Tests
Byars, Alvin Gregg – 1980
The objectives of this investigation are to develop, describe, assess, and demonstrate procedures for constructing mastery tests to minimize errors of classification and to maximize decision reliability. The guidelines are based on conditions where item exchangeability is a reasonable assumption and the test constructor can control the number of…
Descriptors: Cutting Scores, Difficulty Level, Grade 4, Intermediate Grades
Catts, Ralph – 1978
The reliability of multiple choice tests--containing different numbers of response options--was investigated for 260 students enrolled in technical college economics courses. Four test forms, constructed from previously used four-option items, were administered, consisting of (1) 60 two-option items--two distractors randomly discarded; (2) 40…
Descriptors: Answer Sheets, Difficulty Level, Foreign Countries, Higher Education
de Jong, John H. A. L. – 1984
The Netherlands' secondary education system is highly differentiated, with four different school types for four scholastic ability levels. Final examinations must accommodate these four levels, and require a test-independent definition of the intended final ability levels as well as a sample-free evaluation of the range of ability levels at which…
Descriptors: Difficulty Level, Efficiency, Equated Scores, Foreign Countries
Oosterhof, Albert C.; Coats, Pamela K. – 1981
Instructors who develop classroom examinations that require students to provide a numerical response to a mathematical problem are often very concerned about the appropriateness of the multiple-choice format. The present study augments previous research relevant to this concern by comparing the difficulty and reliability of multiple-choice and…
Descriptors: Comparative Analysis, Difficulty Level, Grading, Higher Education
Robertson, David W.; And Others – 1977
A comparative study of item analysis was conducted on the basis of race to determine whether alternative test construction or processing might increase the proportion of black enlisted personnel among those passing various military technical knowledge examinations. The study used data from six specialists at four grade levels and investigated item…
Descriptors: Difficulty Level, Enlisted Personnel, Item Analysis, Occupational Tests
Henning, Grant – 1993
This study provides information about the total and component scores of the Test of English as a Foreign Language (TOEFL). First, the study provides comparative global and component estimates of test-retest, alternate-form, and internal-consistency reliability, controlling for sources of measurement error inherent in the examinees and the testing…
Descriptors: Difficulty Level, English (Second Language), Error of Measurement, Estimation (Mathematics)
PDF pending restoration
Manpower Administration (DOL), Washington, DC. – 1972
The Basic Occupational Literacy Test (BOLT) was developed as an achievement test of basic skills in reading and arithmetic, for educationally disadvantaged adults. The objective was to develop a test appropriate for this population with regard to content, format, instructions, timing, norms, and difficulty level. A major issue, the use of grade…
Descriptors: Achievement Tests, Adult Basic Education, Adults, Basic Skills
Cliff, Norman; And Others – 1977
TAILOR is a computer program that uses the implied orders concept as the basis for computerized adaptive testing. The basic characteristics of TAILOR, which does not involve pretesting, are reviewed here and two studies of it are reported. One is a Monte Carlo simulation based on the four-parameter Birnbaum model and the other uses a matrix of…
Descriptors: Adaptive Testing, Computer Assisted Testing, Computer Programs, Difficulty Level