Publication Date
  In 2025: 0
  Since 2024: 0
  Since 2021 (last 5 years): 1
  Since 2016 (last 10 years): 2
  Since 2006 (last 20 years): 18
Descriptor
  Difficulty Level: 42
  Item Analysis: 42
  Test Items: 29
  Test Construction: 19
  Item Response Theory: 14
  Test Reliability: 10
  Test Validity: 9
  Multiple Choice Tests: 8
  Higher Education: 7
  Test Format: 7
  Foreign Countries: 6
Publication Type
  Reports - Evaluative: 42
  Journal Articles: 25
  Speeches/Meeting Papers: 6
  Numerical/Quantitative Data: 3
  Information Analyses: 1
  Reports - Research: 1
Education Level
  Higher Education: 5
  Elementary Education: 3
  Elementary Secondary Education: 2
  Postsecondary Education: 2
  Secondary Education: 2
  Early Childhood Education: 1
  Grade 1: 1
  Grade 5: 1
  Grade 6: 1
  Preschool Education: 1
Audience
  Practitioners: 1
  Researchers: 1
  Teachers: 1
Assessments and Surveys
  SAT (College Admission Test): 3
  ACT Assessment: 1
  Bender Visual Motor Gestalt…: 1
  Goodenough Harris Drawing Test: 1
  National Assessment of…: 1
  Program for International…: 1
  Test of English as a Foreign…: 1
Camenares, Devin – International Journal for the Scholarship of Teaching and Learning, 2022
Balancing assessment of learning outcomes with the expectations of students is a perennial challenge in education. Difficult exams, in which many students perform poorly, exacerbate this problem and can inspire a wide variety of interventions, such as a grading curve. However, addressing poor performance can sometimes distort or inflate grades and…
Descriptors: College Students, Student Evaluation, Tests, Test Items
Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020
An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…
Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting
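For readers unfamiliar with how an Angoff cut score and its judge-related error are typically computed, here is a minimal Python sketch using simulated ratings; the judge and item counts, and the ratings themselves, are invented for illustration and are not taken from the study.

```python
import numpy as np

# Simulated Angoff ratings: rows = judges, columns = items.
# Each entry is a judge's estimate of the probability that a minimally
# competent examinee answers the item correctly (values are invented).
rng = np.random.default_rng(0)
ratings = rng.uniform(0.3, 0.9, size=(8, 40))   # 8 judges, 40 items

# Each judge's implied cut score is the sum of that judge's item ratings;
# the panel cut score is the mean of those judge-level cut scores.
judge_cuts = ratings.sum(axis=1)
cut_score = judge_cuts.mean()

# Error attributable to judge sampling: standard error of the mean across
# judges (items are treated as fixed here; as the abstract notes, item
# variability plays a more complicated role).
se_judges = judge_cuts.std(ddof=1) / np.sqrt(ratings.shape[0])

print(f"cut score = {cut_score:.1f} raw-score points, SE over judges = {se_judges:.2f}")
```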
Raker, Jeffrey R.; Trate, Jaclyn M.; Holme, Thomas A.; Murphy, Kristen – Journal of Chemical Education, 2013
Experts use their domain expertise and knowledge of examinees' ability levels as they write test items. The expert test writer can then estimate the difficulty of the test items subjectively. However, an objective method for assigning difficulty to a test item would capture the cognitive demands imposed on the examinee as well as be…
Descriptors: Organic Chemistry, Test Items, Item Analysis, Difficulty Level
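One common empirical benchmark against which such difficulty estimates are checked is the classical item difficulty index, the proportion of examinees answering the item correctly. A minimal sketch with made-up 0/1 response data (not data from the article):

```python
import numpy as np

# Hypothetical scored responses: rows = examinees, columns = items (1 = correct).
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
])

# Classical difficulty index (p-value): proportion correct per item.
# Higher p means an easier item; values near 0 or 1 tend to discriminate poorly.
p_values = responses.mean(axis=0)
print(p_values)   # [0.8 0.8 0.2 0.6]
```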
Cunningham, James W.; Mesmer, Heidi Anne – Elementary School Journal, 2014
Common Core Reading Standard 10 not only prescribes the difficulty of texts students should become able to read, but also the difficulty diet of texts schools should ask their students to read across the school year. The use of quantitative text-assessment tools in the implementation of this standard warrants an examination into the validity of…
Descriptors: Difficulty Level, Academic Standards, State Standards, Statistical Analysis
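As a concrete example of the kind of quantitative text-assessment tool at issue (not necessarily one of the specific tools examined in the article), the Flesch-Kincaid grade-level formula estimates text difficulty from sentence length and syllables per word. A rough Python sketch with a crude syllable heuristic:

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count vowel groups; real tools use pronunciation dictionaries.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Flesch-Kincaid grade level: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

print(round(flesch_kincaid_grade("The cat sat on the mat. It was warm outside."), 1))
```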
Hamzah, Mohd Sahandri Gani; Abdullah, Saifuddin Kumar – Online Submission, 2011
The evaluation of learning is a systematic process involving testing, measurement and evaluation. In the testing step, a teacher needs to choose the instrument best able to probe students' thinking. Testing produces scores or marks that vary widely, in either homogeneous or heterogeneous form, and that are used to categorize the scores…
Descriptors: Test Items, Item Analysis, Difficulty Level, Testing
Malau-Aduli, Bunmi S.; Zimitat, Craig – Assessment & Evaluation in Higher Education, 2012
The aim of this study was to assess the effect of the introduction of peer review processes on the quality of multiple-choice examinations in the first three years of an Australian medical course. The impact of the peer review process and overall quality assurance (QA) processes were evaluated by comparing the examination data generated in earlier…
Descriptors: Foreign Countries, Peer Evaluation, Multiple Choice Tests, Test Construction
Laprise, Shari L. – College Teaching, 2012
Successful exam composition can be a difficult task. Exams should not only assess student comprehension, but be learning tools in and of themselves. In a biotechnology course delivered to nonmajors at a business college, objective multiple-choice test questions often require students to choose the exception or "not true" choice. Anecdotal student…
Descriptors: Feedback (Response), Test Items, Multiple Choice Tests, Biotechnology
Chen, Hanwei; Cui, Zhongmin; Zhu, Rongchun; Gao, Xiaohong – ACT, Inc., 2010
The most critical feature of a common-item nonequivalent groups equating design is that the average score difference between the new and old groups can be accurately decomposed into a group ability difference and a form difficulty difference. Two widely used observed-score linear equating methods, the Tucker and the Levine observed-score methods,…
Descriptors: Equated Scores, Groups, Ability Grouping, Difficulty Level
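The Tucker method mentioned here is, in its standard textbook form, a linear equating of Form X to Form Y through a synthetic population defined by the anchor test. The sketch below follows that standard presentation with simulated data; the group sizes, score scales, and weight w1 are illustrative assumptions, not values from the ACT report.

```python
import numpy as np

def tucker_linear_equating(x1, v1, y2, v2, w1=0.5):
    """Tucker observed-score linear equating for the common-item
    nonequivalent groups design (standard textbook formulas)."""
    w2 = 1.0 - w1
    g1 = np.cov(x1, v1)[0, 1] / np.var(v1, ddof=1)   # slope of X on anchor V, group 1
    g2 = np.cov(y2, v2)[0, 1] / np.var(v2, ddof=1)   # slope of Y on anchor V, group 2
    dmu = np.mean(v1) - np.mean(v2)                  # anchor mean difference (group ability)
    dvar = np.var(v1, ddof=1) - np.var(v2, ddof=1)

    # Synthetic-population moments for X and Y.
    mu_x = np.mean(x1) - w2 * g1 * dmu
    mu_y = np.mean(y2) + w1 * g2 * dmu
    var_x = np.var(x1, ddof=1) - w2 * g1**2 * dvar + w1 * w2 * g1**2 * dmu**2
    var_y = np.var(y2, ddof=1) + w1 * g2**2 * dvar + w1 * w2 * g2**2 * dmu**2

    # Linear equating function: place Form X scores on the Form Y scale.
    return lambda x: mu_y + np.sqrt(var_y / var_x) * (np.asarray(x) - mu_x)

# Simulated example: group 2 is slightly more able, Form Y slightly harder.
rng = np.random.default_rng(1)
theta1 = rng.normal(0.0, 1.0, 2000)
theta2 = rng.normal(0.2, 1.0, 2000)
x1 = 25 + 5 * theta1 + rng.normal(0, 2, 2000)
v1 = 12 + 3 * theta1 + rng.normal(0, 1.5, 2000)
y2 = 23 + 5 * theta2 + rng.normal(0, 2, 2000)
v2 = 12 + 3 * theta2 + rng.normal(0, 1.5, 2000)

equate = tucker_linear_equating(x1, v1, y2, v2)
print(equate([20, 25, 30]))   # Form X raw scores expressed on the Form Y scale
```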
Adebule, S. O. – Educational Research and Reviews, 2009
This study examined the reliability and difficulty indices of multiple-choice (MC) and true-or-false (TF) objective test items in a Mathematics Achievement Test (MAT). The instruments used were two variants of a 50-item mathematics achievement test, one in the multiple-choice format and one in the true-or-false format. A total of five hundred (500)…
Descriptors: Objective Tests, Mathematics Achievement, Achievement Tests, Test Reliability
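For context on the two statistics compared in this entry: the difficulty index is the per-item proportion correct, and internal-consistency reliability for dichotomously scored items is commonly estimated with Kuder-Richardson Formula 20 (KR-20). A minimal sketch with simulated 0/1 responses; the test length, sample size, and data are invented, not those of the MAT study.

```python
import numpy as np

def kr20(scores):
    """Kuder-Richardson Formula 20 reliability for 0/1 item scores
    (rows = examinees, columns = items)."""
    scores = np.asarray(scores)
    k = scores.shape[1]
    p = scores.mean(axis=0)                       # item difficulty indices
    q = 1.0 - p
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1.0 - p.dot(q) / total_var)

# Simulated responses: probability of success depends on ability minus item difficulty.
rng = np.random.default_rng(2)
ability = rng.normal(size=200)
difficulty = rng.normal(size=20)
prob = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
responses = (rng.random((200, 20)) < prob).astype(int)

print("difficulty indices:", responses.mean(axis=0).round(2))
print("KR-20 reliability:", round(kr20(responses), 2))
```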
Lei, Pui-Wa; Wu, Qiong; DiPerna, James C.; Morgan, Paul L. – Educational and Psychological Measurement, 2009
Currently, few measures are available to monitor young children's progress in acquiring key early academic skills. In response to this need, the authors have begun developing measures (i.e., the Early Arithmetic, Reading and Learning Indicators, or EARLI) of preschoolers' numeracy skills. To accurately and efficiently monitor acquisition of early…
Descriptors: Preschool Children, Measures (Individuals), Numeracy, Emergent Literacy
Withagen, Ans; Vervloed, Mathijs P. J.; Janssen, Neeltje M.; Knoors, Harry; Verhoeven, Ludo – British Journal of Visual Impairment, 2009
The Tactual Profile assesses tactual functioning of children with severe visual impairments between 0 and 16 years of age. The Tactual Profile consists of 430 items, measuring tactile skills required for performing everyday tasks at home and in school. Items are graded according to age level and divided into three domains: tactual sensory, tactual…
Descriptors: Intelligence, Visual Impairments, Verbal Tests, Construct Validity
Costagliola, Gennaro; Fuccella, Vittorio – International Journal of Distance Education Technologies, 2009
To correctly evaluate learners' knowledge, it is important to administer tests composed of good-quality question items. By "quality" we mean an item's potential to discriminate effectively between skilled and untrained students and to match the tutor's desired difficulty level. This article presents a rule-based e-testing system…
Descriptors: Difficulty Level, Test Items, Computer Assisted Testing, Item Response Theory
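The discrimination and difficulty this system targets are often summarized, in item response theory terms, by the slope and location of a two-parameter logistic item characteristic curve. A brief sketch; the parameter values are arbitrary, chosen only to contrast a discriminating item with a weak one.

```python
import numpy as np

def icc_2pl(theta, a, b):
    """Two-parameter logistic IRT model: probability of a correct response
    given ability theta, item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (np.asarray(theta) - b)))

theta = np.linspace(-3, 3, 7)
print(icc_2pl(theta, a=1.5, b=0.0).round(2))   # steep curve: discriminates well near b
print(icc_2pl(theta, a=0.3, b=0.0).round(2))   # flat curve: weak discrimination
```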
Thomas, Conn; Carpenter, Clint – Teacher Education and Practice, 2008
The development of the Texas Assessment of Knowledge and Skills test involves input from educators across the state. The development process attempts to create an assessment that reflects the skills and content understanding of students at the tested grade level. This study attempts to determine other factors that can affect student performance on…
Descriptors: Readability, Science Tests, Item Analysis, Reading Processes
Bowling, Nathan A. – Assessment & Evaluation in Higher Education, 2008
Student ratings of teaching effectiveness are widely used to make judgments of faculty teaching performance. Research, however, has found that such ratings may not be accurate indicators of teaching performance because they are contaminated by course easiness. Using student ratings of 9855 professors employed at 79 different colleges and…
Descriptors: Student Evaluation of Teacher Performance, Correlation, Robustness (Statistics), Item Analysis

Frisbie, David A. – Educational and Psychological Measurement, 1981
The Relative Difficulty Ratio (RDR) was developed as an index of test or item difficulty for use when raw score means or item p-values are not directly comparable because of chance score differences. Computation of the RDR is described, and applications of the RDR at both the test and item level are illustrated. (Author/BW)
Descriptors: Difficulty Level, Item Analysis, Mathematical Formulas, Test Items
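The abstract does not spell out the RDR formula, so the sketch below shows only the general idea of a chance-corrected difficulty ratio (expressing a mean as the proportion of the score range above the chance level); it is an illustrative stand-in, not Frisbie's exact computation.

```python
def chance_corrected_difficulty(mean_score, max_score, chance_score):
    """Express a test (or item) mean as the proportion of the score range
    lying above the chance level, so that forms whose items have different
    numbers of response options become more comparable."""
    return (mean_score - chance_score) / (max_score - chance_score)

# The same raw mean (30 out of 50) looks quite different once the chance level
# is taken into account: a 4-option multiple-choice form versus a true/false form.
print(round(chance_corrected_difficulty(30, 50, chance_score=50 / 4), 2))  # 0.47
print(round(chance_corrected_difficulty(30, 50, chance_score=50 / 2), 2))  # 0.2
```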