Publication Date
In 2025: 0
Since 2024: 2
Since 2021 (last 5 years): 6
Since 2016 (last 10 years): 19
Since 2006 (last 20 years): 31

Descriptor
Item Analysis: 37
Test Items: 37
Validity: 37
Test Construction: 16
Foreign Countries: 14
Reliability: 14
Statistical Analysis: 10
Difficulty Level: 8
Evaluation Methods: 7
Factor Analysis: 7
Psychometrics: 7
Author
Acar, Tülin: 1
Ali Türkdogan: 1
Angeli V. Collano: 1
Anna Clarissa D. Aves: 1
Asli Görgülü Ari: 1
Bax, Stephen: 1
Beauchamp, David: 1
Birol, Gülnur: 1
Breakstone, Joel: 1
Burts, Diane C.: 1
Chen, Chen-Su: 1
Publication Type
Journal Articles: 34
Reports - Research: 30
Reports - Descriptive: 4
Tests/Questionnaires: 2
Numerical/Quantitative Data: 1
Opinion Papers: 1
Reports - Evaluative: 1
Reports - General: 1
Speeches/Meeting Papers: 1
Education Level
Secondary Education: 10
Elementary Education: 9
Higher Education: 8
Postsecondary Education: 8
High Schools: 4
Middle Schools: 4
Grade 5: 3
Intermediate Grades: 3
Junior High Schools: 3
Grade 3: 2
Grade 6: 2
Location
Turkey: 3
Canada: 2
Australia: 1
Germany: 1
Hong Kong: 1
Idaho: 1
Malaysia: 1
Massachusetts: 1
Missouri: 1
New York: 1
North Carolina (Charlotte): 1
Laws, Policies, & Programs
No Child Left Behind Act 2001: 1
Assessments and Surveys
Trends in International…: 2
Eysenck Personality Inventory: 1
Flesch Kincaid Grade Level…: 1
Graduate Record Examinations: 1
International English…: 1
National Longitudinal Study…: 1
SAT (College Admission Test): 1
Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022
Setting cut scores on multistage tests (MSTs) is difficult, particularly when the test spans several grade levels and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…
Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis
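The Angoff ratings the abstract refers to start from a simple idea: each panelist estimates, per item, the probability that a minimally competent examinee answers correctly, and the summed ratings yield a raw cut score. A minimal sketch of that classic aggregation, with invented ratings and variable names (not data from the study):

```python
# Classic Angoff aggregation: per-item probability estimates from each
# panelist are summed, then averaged across panelists to give a raw cut
# score. All numbers below are made up for illustration.
ratings = {
    "panelist_1": [0.6, 0.7, 0.5, 0.8],
    "panelist_2": [0.5, 0.8, 0.4, 0.9],
}

per_panelist_cuts = [sum(r) for r in ratings.values()]
cut_score = sum(per_panelist_cuts) / len(per_panelist_cuts)
print(cut_score)  # raw-score cut on this 4-item set
```

The article's contribution is mapping such ratings onto the scale underlying an MST, which this sketch does not attempt.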
Kevser Arslan; Asli Görgülü Ari – Shanlax International Journal of Education, 2024
This study aimed to develop a valid and reliable multiple-choice achievement test for the subject area of ecology. The study was conducted within the framework of exploratory sequential design based on mixed research methods, and the study group consisted of a total of 250 middle school students studying at the sixth and seventh grade level. In…
Descriptors: Ecology, Science Tests, Test Construction, Multiple Choice Tests
Koziol, Natalie A.; Goodrich, J. Marc; Yoon, HyeonJin – Educational and Psychological Measurement, 2022
Differential item functioning (DIF) is often used to examine validity evidence of alternate form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A…
Descriptors: Regression (Statistics), Item Analysis, Validity, Testing Accommodations
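For context on the "traditional approaches" the abstract contrasts with: a common DIF screen such as Mantel-Haenszel pools, across total-score strata, the odds of a correct response for the reference versus the focal group. A sketch of that baseline with invented counts (this is the traditional method, not the regression-discontinuity framework the article proposes):

```python
# Mantel-Haenszel common odds ratio across score strata.
# Each tuple: (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
# Counts are invented for illustration.
strata = [
    (30, 10, 20, 20),
    (40, 5, 30, 15),
]

num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
mh_odds_ratio = num / den  # ~1.0 suggests no DIF; far from 1 flags the item
print(round(mh_odds_ratio, 2))  # -> 3.4
```

The selection bias the article targets arises because group membership (e.g., accommodation status) is not randomly assigned, so matching on total score alone can mislead this kind of screen.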
Ali Türkdogan – Online Submission, 2023
This study was carried out to determine how third-year students in the Department of Elementary Mathematics Education structured "if and only if" propositions. The data were obtained by examining the students' answers to the midterm exam questions and discussing the solutions with the students in the classroom.…
Descriptors: Mathematics Instruction, Teaching Methods, Difficulty Level, Questioning Techniques
Deribo, Tobias; Goldhammer, Frank; Kroehne, Ulf – Educational and Psychological Measurement, 2023
As researchers in the social sciences, we are often interested in studying constructs that are not directly observable through assessments and questionnaires. But even in a well-designed and well-implemented study, rapid-guessing behavior may occur. Under rapid-guessing behavior, a task is skimmed briefly but not read and engaged with in depth. Hence, a…
Descriptors: Reaction Time, Guessing (Tests), Behavior Patterns, Bias
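A common first step in handling rapid-guessing behavior is a response-time threshold: responses faster than some cutoff are flagged as guesses and treated separately. A minimal sketch (the 3-second threshold and the times are illustrative assumptions; operational thresholds are usually derived from the response-time distribution itself, and the article's treatment is more elaborate):

```python
THRESHOLD_SECONDS = 3.0  # illustrative cutoff, not an empirical value
response_times = [1.2, 14.8, 2.5, 30.1, 0.9]  # seconds per item (invented)

# Flag sub-threshold responses as rapid guesses; keep the rest as effortful.
rapid_guess_flags = [t < THRESHOLD_SECONDS for t in response_times]
effortful = [t for t, flag in zip(response_times, rapid_guess_flags) if not flag]
print(rapid_guess_flags)  # -> [True, False, True, False, True]
```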
Gio Jay B. Aligway; Jo C. Delos Angeles; Angeli V. Collano; Eljoy P. Barroca; Anna Clarissa D. Aves; Juneflor F. Catubay; Jennifer T. Edjec; Ma. Diana A. Butaya; Sylvester T. Cortes – Journal of Biological Education Indonesia (Jurnal Pendidikan Biologi Indonesia), 2024
Biology education plays a vital role in nurturing learners' understanding of the intricacy of life. Various efforts have emerged to strengthen the learning of biological concepts, but studies still show that learners have low mastery of some aspects. To determine how well students understood various biological topics, including…
Descriptors: Validity, Reliability, Taxonomy, Concept Formation
Smith, Mark; Breakstone, Joel; Wineburg, Sam – Cognition and Instruction, 2019
This article reports a validity study of History Assessments of Thinking (HATs), which are short, constructed-response assessments of historical thinking. In particular, this study focuses on aspects of cognitive validity, which is an examination of whether assessments tap the intended constructs. Think-aloud interviews with 26 high school…
Descriptors: History, History Instruction, Thinking Skills, Multiple Choice Tests
Davenport, Ernest C.; Davison, Mark L.; Liou, Pey-Yan; Love, Quintin U. – Educational Measurement: Issues and Practice, 2016
The main points of Sijtsma and of Green and Yang in Educational Measurement: Issues and Practice (34, 4) are that reliability, internal consistency, and unidimensionality are distinct concepts and that Cronbach's alpha may be problematic. Neither of these assertions is at odds with Davenport, Davison, Liou, and Love in the same issue. However, many authors…
Descriptors: Educational Assessment, Reliability, Validity, Test Construction
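For readers tracking the exchange, Cronbach's alpha itself is straightforward to compute: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A sketch with invented dichotomous scores (population variances, as is conventional for alpha):

```python
from statistics import pvariance

# Rows = examinees, columns = items; data invented for illustration.
items = [
    [1, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 1],
]

k = len(items[0])
item_vars = [pvariance(col) for col in zip(*items)]
total_var = pvariance([sum(row) for row in items])
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 3))  # -> 0.875
```

As the exchange emphasizes, a high alpha by itself establishes neither unidimensionality nor validity.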
Chiavaroli, Neville – Practical Assessment, Research & Evaluation, 2017
Although most MCQ writing guides discourage the use of negatively worded multiple-choice questions (NWQs), they continue to be regularly used in both locally produced examinations and commercially available questions. There are several reasons why the use of NWQs may prove resistant to sound pedagogical advice. Nevertheless, systematic…
Descriptors: Multiple Choice Tests, Test Construction, Test Items, Validity
Beauchamp, David; Constantinou, Filio – Research Matters, 2020
Assessment is a useful process as it provides various stakeholders (e.g., teachers, parents, government, employers) with information about students' competence in a particular subject area. However, for the information generated by assessment to be useful, it needs to support valid inferences. One factor that can undermine the validity of…
Descriptors: Computational Linguistics, Inferences, Validity, Language Usage
Jølle, Lennart; Skar, Gustaf B. – Scandinavian Journal of Educational Research, 2020
This paper reports findings from a project called "The National Panel of Raters" (NPR) that took place within a writing test programme in Norway (2010-2016). A recent research project found individual differences between the raters in the NPR. This paper reports results from an explorative follow-up study in which 63 NPR members were…
Descriptors: Foreign Countries, Validity, Scoring, Program Descriptions
Jaikaran-Doe, Seeta; Doe, Peter Edward – Australian Educational Computing, 2015
A number of validated survey instruments for assessing technological pedagogical content knowledge (TPACK) do not accurately discriminate between the seven elements of the TPACK framework, particularly technological content knowledge (TCK) and technological pedagogical knowledge (TPK). By posing simple questions that assess technological,…
Descriptors: Technological Literacy, Pedagogical Content Knowledge, Surveys, Evaluation Methods
Margolis, Melissa J.; Mee, Janet; Clauser, Brian E.; Winward, Marcia; Clauser, Jerome C. – Educational Measurement: Issues and Practice, 2016
Evidence to support the credibility of standard setting procedures is a critical part of the validity argument for decisions made based on tests that are used for classification. One area in which there has been limited empirical study is the impact of standard setting judge selection on the resulting cut score. One important issue related to…
Descriptors: Academic Standards, Standard Setting (Scoring), Cutting Scores, Credibility
Sener, Nilay; Tas, Erol – Journal of Education and Learning, 2017
The purpose of this study is to prepare a multiple-choice achievement test with high reliability and validity for the "Let's Solve the Puzzle of Our Body" unit. For this purpose, a multiple-choice achievement test consisting of 46 items was administered to a total of 178 fifth-grade students. As a result of the test and material analysis…
Descriptors: Achievement Tests, Grade 5, Science Instruction, Biology
Hitchcock, John H.; Johanson, George A. – Research in the Schools, 2015
Understanding the reason(s) for Differential Item Functioning (DIF) in the context of measurement is difficult. Although identifying potential DIF items is typically a statistical endeavor, understanding the reasons for DIF (and item repair or replacement) might require investigations that can be informed by qualitative work. Such work is…
Descriptors: Mixed Methods Research, Test Items, Item Analysis, Measurement