Publication Date
| Publication Date | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 4 |
| Since 2017 (last 10 years) | 9 |
| Since 2007 (last 20 years) | 16 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Difficulty Level | 87 |
| Testing Problems | 87 |
| Test Items | 53 |
| Test Construction | 23 |
| Item Analysis | 20 |
| Higher Education | 19 |
| Scores | 17 |
| Latent Trait Theory | 15 |
| Test Format | 15 |
| Test Validity | 15 |
| Foreign Countries | 14 |
Author
| Author | Count |
| --- | --- |
| Wainer, Howard | 3 |
| Jaeger, Richard M. | 2 |
| Legg, Sue M. | 2 |
| Reckase, Mark D. | 2 |
| Abdul-Kareem, Muneera M. | 1 |
| Al-Nouh, Nowreyah A. | 1 |
| Algina, James | 1 |
| Andrés Christiansen | 1 |
| Arefsadr, Sajjad | 1 |
| Babaii, Esmat | 1 |
| Babcock, Ben | 1 |
Education Level
| Education Level | Count |
| --- | --- |
| Higher Education | 4 |
| Postsecondary Education | 3 |
| Secondary Education | 3 |
| Elementary Education | 2 |
| Elementary Secondary Education | 1 |
| Grade 4 | 1 |
| High Schools | 1 |
| Intermediate Grades | 1 |
Audience
| Audience | Count |
| --- | --- |
| Researchers | 11 |
| Practitioners | 4 |
| Teachers | 3 |
| Parents | 1 |
Location
| Location | Count |
| --- | --- |
| Netherlands | 2 |
| United Kingdom (England) | 2 |
| California | 1 |
| China | 1 |
| Germany | 1 |
| Illinois | 1 |
| Iran | 1 |
| Kentucky | 1 |
| Kuwait | 1 |
| New Zealand | 1 |
| Sweden | 1 |
Laws, Policies, & Programs
| Law, Policy, or Program | Count |
| --- | --- |
| No Child Left Behind Act 2001 | 1 |
Kim, Sooyeon; Walker, Michael – ETS Research Report Series, 2021
In this investigation, we used real data to assess potential differential effects associated with taking a test in a test center (TC) versus testing at home using remote proctoring (RP). We used a pseudo-equivalent groups (PEG) approach to examine group equivalence at the item level and the total score level. If our assumption holds that the PEG…
Descriptors: Testing, Distance Education, Comparative Analysis, Test Items
Bramley, Tom; Crisp, Victoria – Assessment in Education: Principles, Policy & Practice, 2019
For many years, question choice has been used in some UK public examinations, with students free to choose which questions they answer from a selection (within certain parameters). There has been little published research on choice of exam questions in recent years in the UK. In this article we distinguish different scenarios in which choice…
Descriptors: Test Items, Test Construction, Difficulty Level, Foreign Countries
Andrés Christiansen; Rianne Janssen – Educational Assessment, Evaluation and Accountability, 2024
In international large-scale assessments, students may not be compelled to answer every test item: a student can decide to skip a seemingly difficult item or may drop out before the end of the test is reached. The way these missing responses are treated will affect the estimation of the item difficulty and student ability, and ultimately affect…
Descriptors: Test Items, Item Response Theory, Grade 4, International Assessment
Camenares, Devin – International Journal for the Scholarship of Teaching and Learning, 2022
Balancing assessment of learning outcomes with the expectations of students is a perennial challenge in education. Difficult exams, in which many students perform poorly, exacerbate this problem and can inspire a wide variety of interventions, such as a grading curve. However, addressing poor performance can sometimes distort or inflate grades and…
Descriptors: College Students, Student Evaluation, Tests, Test Items
Arefsadr, Sajjad; Babaii, Esmat – TESL-EJ, 2023
According to the IELTS official website, IELTS candidates usually score lower in the IELTS Writing test than in the other language skills. This is disappointing for the many IELTS candidates who fail to get the overall band score they need. Surprisingly enough, few studies have addressed this issue. The present study, then, is aimed at shedding…
Descriptors: Second Language Learning, Language Tests, English (Second Language), Foreign Countries
Wyse, Adam E.; Babcock, Ben – Educational Measurement: Issues and Practice, 2020
A common belief is that the Bookmark method is a cognitively simpler standard-setting method than the modified Angoff method. However, little research has investigated panelists' ability to perform the Bookmark method well, and whether some of the challenges panelists face with the Angoff method may also be present in the Bookmark…
Descriptors: Standard Setting (Scoring), Evaluation Methods, Testing Problems, Test Items
Peiyu Wang; Liying Cheng – Critical Inquiry in Language Studies, 2025
This study employed a multi-methods design to investigate the impact of preparation on Chinese test-takers' perceptions of the integrated TOEFL iBT speaking and writing design. Combining results from over 1700 surveys and 10 interviews, it was found that these Chinese test-takers, who are the most vulnerable group in the multimillion testing…
Descriptors: Foreign Countries, Second Language Learning, English (Second Language), Language Tests
FIPC Linking across Multidimensional Test Forms: Effects of Confounding Difficulty within Dimensions
Kim, Sohee; Cole, Ki Lynn; Mwavita, Mwarumba – International Journal of Testing, 2018
This study investigated the effects of linking potentially multidimensional test forms using the fixed item parameter calibration. Forms had equal or unequal total test difficulty with and without confounding difficulty. The mean square errors and bias of estimated item and ability parameters were compared across the various confounding tests. The…
Descriptors: Test Items, Item Response Theory, Test Format, Difficulty Level
Roelle, Julian; Roelle, Detlev; Berthold, Kirsten – Journal of Experimental Education, 2019
Providing test questions after an initial study phase is a common instructional technique. In theory, questions that require higher-level (deep) processing should be more beneficial than those that require lower-level (shallow) processing. However, empirical evidence on the matter is inconsistent. To shed light on two potential reasons for these…
Descriptors: Testing Problems, Test Items, Cognitive Processes, Problem Based Learning
Guo, Hongwen; Rios, Joseph A.; Haberman, Shelby; Liu, Ou Lydia; Wang, Jing; Paek, Insu – Applied Measurement in Education, 2016
Unmotivated test takers who guess rapidly on items can negatively affect validity studies and evaluations of teacher and institution performance, making it critical to identify these test takers. The authors propose a new nonparametric method for finding response-time thresholds for flagging item responses that result from rapid-guessing…
Descriptors: Guessing (Tests), Reaction Time, Nonparametric Statistics, Models
Henning, Grant – English Teaching Forum, 2012
To some extent, good testing procedure, like good language use, can be achieved through avoidance of errors. Almost any language-instruction program requires the preparation and administration of tests, and it is only to the extent that certain common testing mistakes have been avoided that such tests can be said to be worthwhile selection,…
Descriptors: Testing, English (Second Language), Testing Problems, Student Evaluation
Knell, Janie L.; Wilhoite, Andrea P.; Fugate, Joshua Z.; González-Espada, Wilson J. – Electronic Journal of Science Education, 2015
Current science education reform efforts emphasize teaching K-12 science using hands-on, inquiry activities. For maximum learning and probability of implementation among inservice teachers, these strategies must be modeled in college science courses for preservice teachers. About a decade ago, Morehead State University revised their science…
Descriptors: Item Response Theory, Multiple Choice Tests, Test Construction, Psychometrics
Thurlow, Martha L.; Lazarus, Sheryl S.; Hodgson, Jennifer R. – Journal of Special Education Leadership, 2012
The read-aloud accommodation is one of the most frequently used accommodations. Many educators need training to more confidently select, implement, and evaluate the use of the read-aloud accommodation. Planning by special education leaders can help ensure that test day goes smoothly for students who need the read-aloud accommodation.
Descriptors: Educational Assessment, Difficulty Level, Reading Difficulties, Testing Problems
El Masri, Yasmine H.; Baird, Jo-Anne; Graesser, Art – Assessment in Education: Principles, Policy & Practice, 2016
We investigate the extent to which language versions (English, French and Arabic) of the same science test are comparable in terms of item difficulty and demands. We argue that language is an inextricable part of the scientific literacy construct, be it intended or not by the examiner. This argument has considerable implications on methodologies…
Descriptors: International Assessment, Difficulty Level, Test Items, Language Variation
Zavitkovsky, Paul; Roarty, Denis; Swanson, Jason – Online Submission, 2016
This study clarifies achievement trends that occurred under NCLB and explains why NCLB reporting practices made those trends so hard to see. It concludes by describing important contributions that new PARCC exams can make and warns of new reporting problems that threaten to squander those contributions before they see the light of day.
Descriptors: Educational Legislation, Federal Legislation, Standardized Tests, Academic Achievement