Publication Date
In 2025 | 25 |
Since 2024 | 49 |
Since 2021 (last 5 years) | 169 |
Since 2016 (last 10 years) | 341 |
Since 2006 (last 20 years) | 477 |
Descriptor
Foreign Countries | 534 |
Test Items | 534 |
Test Validity | 388 |
Test Reliability | 253 |
Test Construction | 243 |
Difficulty Level | 117 |
Item Analysis | 116 |
Factor Analysis | 110 |
Psychometrics | 100 |
Item Response Theory | 99 |
English (Second Language) | 96 |
Author
Baghaei, Purya | 4 |
Brown, James Dean | 3 |
Goldhammer, Frank | 3 |
Prasetyo, Zuhdan Kun | 3 |
Altun, Halis | 2 |
Beglar, David | 2 |
Berberoglu, Giray | 2 |
Brown, Ted | 2 |
Bulut, Okan | 2 |
Che Lah, Noor Hidayah | 2 |
Chen, Yi-Hsin | 2 |
Audience
Teachers | 2 |
Practitioners | 1 |
Researchers | 1 |
Location
Turkey | 82 |
Canada | 38 |
Indonesia | 38 |
Germany | 30 |
Iran | 26 |
Australia | 24 |
China | 22 |
United Kingdom | 18 |
Netherlands | 15 |
Taiwan | 15 |
Japan | 14 |
Laws, Policies, & Programs
Individuals with Disabilities… | 1 |
No Child Left Behind Act 2001 | 1 |
United Nations Convention on… | 1 |
Shadi Noroozi; Hossein Karami – Language Testing in Asia, 2024
Recently, psychometricians and researchers have voiced their concern over the exploration of language test items in light of Messick's validation framework. Validity has been central to test development and use; however, it has not received due attention in language tests having grave consequences for test takers. The present study sought to…
Descriptors: Foreign Countries, Doctoral Students, Graduate Students, Language Proficiency
Fadime Hatice Inci; Ferhat Çelik – Psychology in the Schools, 2025
The aim of this study is to examine the validity, reliability, and responsiveness of the Turkish version of the Adolescent Health Promotion-Short Form (AHP-SF). This cross-sectional study was completed with 1483 students. Confirmatory factor analysis (CFA) supported the construct validity of the scale, demonstrating a good model fit with…
Descriptors: Foreign Countries, Measures (Individuals), Adolescents, Health Promotion
Hartono, Wahyu; Hadi, Samsul; Rosnawati, Raden; Retnawati, Heri – Pegem Journal of Education and Instruction, 2023
Researchers design diagnostic assessments to measure students' knowledge structures and processing skills and so provide information about their cognitive attributes. The purpose of this study is to determine the instrument's validity and score reliability, as well as to investigate the use of classical test theory to identify item characteristics. The…
Descriptors: Diagnostic Tests, Test Validity, Item Response Theory, Content Validity
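As a rough illustration of what "using classical test theory to identify item characteristics" usually involves, the sketch below computes two standard indices per item: difficulty (proportion correct) and discrimination (correlation between the item and the rest score). It is not the authors' procedure; the function name `item_statistics`, the 0/1 scoring, and the sample data are assumptions made for the example.

```python
from statistics import mean, pstdev


def item_statistics(scores):
    """Return (difficulty, discrimination) for each item.

    scores: list of rows, one per examinee, each a list of 0/1 item scores
    (dichotomous scoring is an assumption of this sketch).
    """
    n_items = len(scores[0])
    results = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        # Rest score: total score with the item itself removed.
        rest = [sum(row) - row[j] for row in scores]
        difficulty = mean(item)  # proportion of examinees answering correctly
        sd_item, sd_rest = pstdev(item), pstdev(rest)
        if sd_item == 0 or sd_rest == 0:
            discrimination = float("nan")  # undefined without variance
        else:
            cov = mean(i * r for i, r in zip(item, rest)) - mean(item) * mean(rest)
            discrimination = cov / (sd_item * sd_rest)  # Pearson correlation
        results.append((difficulty, discrimination))
    return results


# Hypothetical responses: 4 examinees, 3 items.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
stats = item_statistics(scores)
```

Higher difficulty values mean easier items, and a low or negative discrimination value is a common signal that an item deserves review.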
Endang Susantini; Yurizka Melia Sari; Prima Vidya Asteria; Muhammad Ilyas Marzuqi – Journal of Education and Learning (EduLearn), 2025
Assessing preservice teachers' higher-order thinking skills (HOTS) in science and mathematics is essential. Teachers' HOTS ability is closely related to their ability to create HOTS-type science and mathematics problems. Among various types of HOTS, one is Bloomian HOTS. To help preservice teachers create problems in those subjects, an Android…
Descriptors: Content Validity, Mathematics Instruction, Decision Making, Thinking Skills
Yi Zou; Ying Zheng; Jingwen Wang – International Journal of Language Testing, 2025
The Pearson Test of English Academic (PTE-A), a widely used high-stakes language proficiency test for university admissions and migration purposes, underwent a notable change from a three-hour to a two-hour version in November 2021. The implementation of the new version has prompted inquiries into the washback effects on various stakeholders.…
Descriptors: Testing Problems, Test Preparation, High Stakes Tests, English (Second Language)
Sherwin E. Balbuena – Online Submission, 2024
This study introduces a new chi-square test statistic for testing the equality of response frequencies among distracters in multiple-choice tests. The formula uses the information from the number of correct answers and wrong answers, which becomes the basis of calculating the expected values of response frequencies per distracter. The method was…
Descriptors: Multiple Choice Tests, Statistics, Test Validity, Testing
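The abstract is truncated, so the author's new statistic is not reproduced here. As a point of comparison, the sketch below applies the ordinary Pearson chi-square goodness-of-fit test to the same question, assuming that wrong answers are expected to spread evenly across distracters. The function name `distractor_chi_square` and the sample responses are hypothetical.

```python
from collections import Counter


def distractor_chi_square(responses, key, options):
    """Pearson chi-square for equal selection of distracters on one item.

    responses: chosen option per examinee; key: the correct option;
    options: all answer options. Returns (statistic, degrees of freedom).
    This is the standard goodness-of-fit test, not the statistic
    proposed in the study listed above.
    """
    distractors = [o for o in options if o != key]
    counts = Counter(r for r in responses if r in distractors)
    wrong_total = sum(counts[d] for d in distractors)
    if wrong_total == 0:
        raise ValueError("no wrong answers; the statistic is undefined")
    # Under the null hypothesis each distracter attracts an equal share.
    expected = wrong_total / len(distractors)
    statistic = sum((counts[d] - expected) ** 2 / expected for d in distractors)
    return statistic, len(distractors) - 1


# Hypothetical item: A is the key; B, C, D are distracters.
responses = ["A"] * 40 + ["B"] * 30 + ["C"] * 20 + ["D"] * 10
stat, df = distractor_chi_square(responses, key="A", options=["A", "B", "C", "D"])
```

With these made-up counts the statistic is 10.0 on 2 degrees of freedom, which exceeds the 0.05 critical value of 5.991, so the distracters would not be judged equally attractive.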
Kaja Haugen; Cecilie Hamnes Carlsen; Christine Möller-Omrani – Language Awareness, 2025
This article presents the process of constructing and validating a test of metalinguistic awareness (MLA) for young school children (age 8-10). The test was developed between 2021 and 2023 as part of the MetaLearn research project, financed by The Research Council of Norway. The research team defines MLA as using metalinguistic knowledge at a…
Descriptors: Language Tests, Test Construction, Elementary School Students, Metalinguistics
David Bell; Vikki O'Neill; Vivienne Crawford – Practitioner Research in Higher Education, 2023
We compared the influence of open-book extended-duration versus closed-book time-limited format on reliability and validity of written assessments of pharmacology learning outcomes within our medical and dental courses. Our dental cohort undertake a mid-year test (30 x free-response short answer to a question, SAQ) and end-of-year paper (4 x SAQ,…
Descriptors: Undergraduate Students, Pharmacology, Pharmaceutical Education, Test Format
Maryani, Ika; Prasetyo, Zuhdan Kun; Wilujeng, Insih; Purwanti, Siwi – Pegem Journal of Education and Instruction, 2022
The purpose of this study was to construct a higher-order thinking test of science for pre-service elementary school teachers. The test was created using the ADDIE model. The analysis stage was carried out by identifying the needs and baseline higher-order thinking skills of students from the Department of Primary School Teacher Education in…
Descriptors: Thinking Skills, Science Tests, Cognitive Tests, Preservice Teachers
Atakan Yalcin; Cennet Sanli; Adnan Pinar – Journal of Theoretical Educational Science, 2025
This study aimed to develop a test to measure university students' spatial thinking skills. The research was conducted using a survey design, with a sample of 260 undergraduate students from geography teaching and geography departments. GIS software was used to incorporate maps and satellite images, enhancing the spatial representation in the…
Descriptors: Spatial Ability, Thinking Skills, Geography, Undergraduate Students
Suciati; Munadi, Sudji; Sugiman; Febriyanti, Wiwin Dwi Ratna – European Journal of Educational Research, 2020
This study aims to design mathematical literacy instruments that have evidence of content and construct validity and are reliable for use as an assessment for learning. The research involved eight experts as instrument validators and 273 eighth-grade students of junior high school in Yogyakarta Province. The results showed that the ten…
Descriptors: Numeracy, Mathematics Tests, Test Construction, Test Validity
Ismail, Fouzul Kareema Mohamed; Zubairi, Ainol Madziah Bt. – English Language Teaching, 2022
This paper presents the findings of a study that sought content validity (CV) evidence for an instrument to measure the reading ability of university students in Sri Lanka. The reading passages and items were adapted from CEFR-aligned Learning Resource Network (LRN) materials. The items were designed based on the cognitive processing…
Descriptors: Foreign Countries, Test Items, Content Validity, Reading Tests
Katrin Schuessler; Vanessa Fischer; Maik Walpuski – Instructional Science: An International Journal of the Learning Sciences, 2025
Cognitive load studies are mostly centered on information on perceived cognitive load. Single-item subjective rating scales are the dominant measurement practice to investigate overall cognitive load. Usually, either invested mental effort or perceived task difficulty is used as an overall cognitive load measure. However, the extent to which the…
Descriptors: Cognitive Processes, Difficulty Level, Rating Scales, Construct Validity
Paula Elosua – Language Assessment Quarterly, 2024
In sociolinguistic contexts where standardized languages coexist with regional dialects, the study of differential item functioning is a valuable tool for examining certain linguistic uses or varieties as threats to score validity. From an ecological perspective, this paper describes three stages in the study of differential item functioning…
Descriptors: Reading Tests, Reading Comprehension, Scores, Test Validity
Durak, Ismail; Karagoz, Yalcin – International Journal of Assessment Tools in Education, 2021
The aim of this study is to adapt the Statistics Anxiety Scale (SAS) developed by Vigil-Colet et al. (2008) to Turkish. This study is expected to fill an important gap, since the literature offers no valid and reliable statistics anxiety scale developed in or adapted to Turkish for undergraduate students. The sample…
Descriptors: Foreign Countries, Affective Measures, Statistics, Mathematics Anxiety