Publication Date
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 6 |
| Since 2017 (last 10 years) | 24 |
| Since 2007 (last 20 years) | 48 |
Descriptor
| Correlation | 64 |
| Difficulty Level | 64 |
| Test Reliability | 40 |
| Test Items | 37 |
| Foreign Countries | 27 |
| Test Validity | 19 |
| Reliability | 17 |
| Item Analysis | 16 |
| Scores | 16 |
| Comparative Analysis | 15 |
| Statistical Analysis | 15 |
Author
| Hamby, Tyler | 2 |
| Abbasnejad, Hannaheh | 1 |
| Aktas, Elif | 1 |
| Alsma, Jelmer | 1 |
| Anatri Desstya | 1 |
| Anderson, Paul S. | 1 |
| Anwyll, Steve | 1 |
| Arth, Thomas O. | 1 |
| Attali, Yigal | 1 |
| Benson, Jeri | 1 |
| Bethscheider, Janine K. | 1 |
Audience
| Practitioners | 1 |
| Researchers | 1 |
| Teachers | 1 |
Location
| Germany | 6 |
| Japan | 3 |
| Turkey | 3 |
| California | 2 |
| Canada | 2 |
| Indonesia | 2 |
| Netherlands | 2 |
| Portugal | 2 |
| United Kingdom (England) | 2 |
| Asia | 1 |
| Australia | 1 |
Metsämuuronen, Jari – International Journal of Educational Methodology, 2020
Pearson product-moment correlation coefficient between item g and test score X, known as item-test or item-total correlation ("Rit"), and item-rest correlation ("Rir") are two of the most used classical estimators for item discrimination power (IDP). Both "Rit" and "Rir" underestimate IDP caused by the…
Descriptors: Correlation, Test Items, Scores, Difficulty Level
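For readers unfamiliar with the two estimators, here is a minimal sketch of the standard coefficients. "Rit" correlates item score g with total score X. "Rir" correlates g with the rest score X − g, which removes the item's own contribution to the total. The sketch assumes a persons-by-items matrix of 0/1 scores. The function names and the toy data are hypothetical, and the code does not implement any correction for the underestimation the article discusses.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson product-moment correlation using population moments."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    # Raises ZeroDivisionError if either variable is constant.
    return cov / (pstdev(x) * pstdev(y))


def rit_rir(responses, item):
    """Return (Rit, Rir) for one item of a persons-by-items score matrix."""
    g = [row[item] for row in responses]        # item scores g
    x = [sum(row) for row in responses]         # total scores X
    rest = [xi - gi for xi, gi in zip(x, g)]    # rest scores X - g
    return pearson(g, x), pearson(g, rest)


# Hypothetical 0/1 responses: 6 test-takers (rows) by 4 items (columns).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
rit, rir = rit_rir(responses, 0)
print(f"Rit = {rit:.3f}, Rir = {rir:.3f}")
```

On this toy matrix, Rit for the first item comes out higher than Rir, because the total score X still contains the item itself.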
Anatri Desstya; Ika Candra Sayekti; Muhammad Abduh; Sukartono – Journal of Turkish Science Education, 2025
This study aimed to develop a standardised instrument for diagnosing science misconceptions in primary school children. Following a developmental research approach using the 4-D model (Define, Design, Develop, Disseminate), 100 four-tier multiple choice items were constructed. Content validity was established through expert evaluation by six…
Descriptors: Test Construction, Science Tests, Science Instruction, Diagnostic Tests
Ferrari-Bridgers, Franca – International Journal of Listening, 2023
While many tools exist to assess student content knowledge, there are few that assess whether students display the critical listening skills necessary to interpret the quality of a speaker's message at the college level. The following research provides preliminary evidence for the internal consistency and factor structure of a tool, the…
Descriptors: Factor Structure, Test Validity, Community College Students, Test Reliability
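Internal consistency is often summarized with Cronbach's alpha, α = k/(k − 1) · (1 − Σσ²ᵢ / σ²_X). The abstract does not say which coefficient the study reports, so treat the following as a generic sketch. It assumes a persons-by-items score matrix. The function name and the data are hypothetical. Population variances are used throughout, and because they appear in a ratio, the choice between population and sample variance cancels out as long as it is applied consistently.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a persons-by-items score matrix (k >= 2 items)."""
    k = len(responses[0])
    # Variance of each item across persons.
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of the total score across persons.
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 responses: 6 test-takers by 4 items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
```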
Slepkov, A. D.; Van Bussel, M. L.; Fitze, K. M.; Burr, W. S. – SAGE Open, 2021
There is a broad literature in multiple-choice test development, both in terms of item-writing guidelines, and psychometric functionality as a measurement tool. However, most of the published literature concerns multiple-choice testing in the context of expert-designed high-stakes standardized assessments, with little attention being paid to the…
Descriptors: Foreign Countries, Undergraduate Students, Student Evaluation, Multiple Choice Tests
Hartono, Wahyu; Hadi, Samsul; Rosnawati, Raden; Retnawati, Heri – Pegem Journal of Education and Instruction, 2023
Researchers design diagnostic assessments to measure students' knowledge structures and processing skills to provide information about their cognitive attributes. The purpose of this study is to determine the instrument's validity and score reliability, as well as to investigate the use of classical test theory to identify item characteristics. The…
Descriptors: Diagnostic Tests, Test Validity, Item Response Theory, Content Validity
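In classical test theory, two item characteristics are commonly reported. The difficulty index p is the proportion of test-takers who answer the item correctly. The upper-lower discrimination index D is the difference in p between high and low scorers. The abstract does not specify the indices the study used. The 27% upper and lower group split below is a common convention adopted here as an assumption, and the function names and data are hypothetical.

```python
def item_difficulty(responses, item):
    """Proportion of test-takers scoring 1 on the item (CTT p-value)."""
    return sum(row[item] for row in responses) / len(responses)


def discrimination_index(responses, item, fraction=0.27):
    """Upper-lower discrimination index D = p_upper - p_lower.

    Groups are the top and bottom `fraction` of test-takers by total score;
    ties keep their original order because Python's sort is stable.
    """
    ranked = sorted(responses, key=sum)                  # ascending total score
    n = max(1, round(fraction * len(ranked)))            # group size
    lower, upper = ranked[:n], ranked[-n:]
    p_upper = sum(row[item] for row in upper) / n
    p_lower = sum(row[item] for row in lower) / n
    return p_upper - p_lower


# Hypothetical 0/1 responses: 6 test-takers by 4 items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
for j in range(4):
    print(j, round(item_difficulty(responses, j), 2), discrimination_index(responses, j))
```

With only six test-takers each group has two members, so D is very coarse here. Real item analyses use much larger samples.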
Mohammed Ambusaidi – ProQuest LLC, 2022
There is an increased demand on nursing faculty to provide quality teaching and assessment. Nursing faculty are required to ensure accurate assessment of learning through testing and outcome measurement, which are critical elements of the evaluation process. Likewise, nursing faculty should implement a logical evaluation system. However, the…
Descriptors: Nursing Education, College Faculty, Test Construction, Test Validity
Montes, L. H.; Ferreira, R. A.; Rodríguez, C. – Chemistry Education Research and Practice, 2022
Attitudes towards learning chemistry have been little studied in secondary school students, especially regarding dimensions related to problem solving, the molecular atomic perspective of chemistry, and real-world connection of chemistry. In the present study, we first aimed to design and assess the psychometric properties of the attitude to…
Descriptors: Student Attitudes, Secondary School Students, Science Instruction, Chemistry
Güngör, Burcu; Önder, Alev – Early Education and Development, 2023
Research Findings: The aim of this study was to construct and validate "English Picture Vocabulary Test (EPVT)" that aimed to assess the very young learners' (VYLs) receptive and expressive vocabulary knowledge for specific content areas in English as a foreign language (EFL). In this context, EPVT was created in several stages. One of…
Descriptors: Language Tests, Test Construction, English (Second Language), Second Language Learning
Hamby, Tyler – Journal of Psychoeducational Assessment, 2018
In this study, the author examined potential mediators of the negative relationship between the absolute difference in items' lengths and their inter-item correlation size. Fifty-two randomly ordered items from five personality scales were administered to 622 university students, and 46 respondents from a survey website rated the items'…
Descriptors: Correlation, Personality Traits, Undergraduate Students, Difficulty Level
Wang, Xiaolin; Svetina, Dubravka; Dai, Shenghai – Journal of Experimental Education, 2019
Recently, interest in test subscore reporting for diagnosis purposes has been growing rapidly. The two simulation studies here examined factors (sample size, number of subscales, correlation between subscales, and three factors affecting subscore reliability: number of items per subscale, item parameter distribution, and data generating model)…
Descriptors: Value Added Models, Scores, Sample Size, Correlation
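One of the factors listed, the number of items per subscale, affects subscore reliability in a way the classical Spearman-Brown prophecy formula illustrates. It predicts the reliability ρ* of a test lengthened by a factor n as ρ* = nρ / (1 + (n − 1)ρ). The formula assumes the added or removed items are parallel to the existing ones. It is a textbook illustration only, not the simulation design used in the study, and the function name is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by `length_factor`.

    Assumes the items added or removed are parallel to the existing ones.
    length_factor > 1 lengthens the test; 0 < length_factor < 1 shortens it.
    """
    return (length_factor * reliability
            / (1 + (length_factor - 1) * reliability))


# Doubling a subscale with reliability 0.60 ...
longer = spearman_brown(0.60, 2)
# ... and halving it back recovers the original value.
back = spearman_brown(longer, 0.5)
print(round(longer, 3), round(back, 3))
```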
Hidri, Sahbi – Language Testing in Asia, 2021
The study investigated the alignment process of the International English Language Competency Assessment (IELCA) suite examinations' four levels, B1, B2, C1 and C2, onto the Common European Framework of Reference (CEFR) by explaining and discussing the five linking stages (Council of Europe [CoE], 2009). Unlike previous studies, this study used the…
Descriptors: Literacy, Second Language Learning, Second Language Instruction, English (Second Language)
Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017
In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…
Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability
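Distractor quality is often screened by checking how many test-takers actually pick each wrong option. The abstract does not state how the authors operationalized quality. The 5% threshold below is a commonly cited convention for calling a distractor "functional" and is an assumption here, as are the function name and the toy responses.

```python
from collections import Counter


def distractor_report(choices, key, options, threshold=0.05):
    """Map each distractor to (proportion chosen, is_functional).

    choices   -- the option each test-taker selected for one item
    key       -- the correct option (excluded from the report)
    options   -- all options offered, so never-chosen distractors still appear
    threshold -- minimum proportion for a distractor to count as functional
    """
    counts = Counter(choices)
    n = len(choices)
    report = {}
    for opt in options:
        if opt == key:
            continue
        p = counts[opt] / n
        report[opt] = (p, p >= threshold)
    return report


# Hypothetical item: 20 test-takers, key "A", option "D" never chosen.
choices = ["A"] * 12 + ["B"] * 6 + ["C"] * 2
report = distractor_report(choices, key="A", options="ABCD")
print(report)
```

Passing the full option list matters. A distractor nobody chose never appears in `choices`, yet it is exactly the kind of non-functional option such an analysis is meant to catch.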
Xie, Qin – Language Assessment Quarterly, 2020
This article describes the steps we went through in designing and validating an item bank to diagnose linguistic problems in the English academic writing of university students in Hong Kong. Test items adopt traditional item formats (e.g., MCQ, grammatical judgment tasks, and error correction) but are based on authentic language materials…
Descriptors: English for Academic Purposes, Second Language Learning, Second Language Instruction, Item Analysis
Mehren, Rainer; Rempfler, Armin; Buchholz, Janine; Hartig, Johannes; Ulrich-Riedhammer, Eva M. – Journal of Research in Science Teaching, 2018
Constituting a metacognitive strategy, system competence or systems thinking can only assume its assigned key function as a basic concept for the school subject of geography in Germany after a theoretical and empirical foundation has been established. A measurement instrument is required which is suitable both for supporting students and for the…
Descriptors: Models, Metacognition, Competence, Geography
Samimi, Parnia; Ravana, Sri Devi; Webber, William; Koh, Yun Sing – Information Research: An International Electronic Journal, 2017
Introduction: Despite the popularity of crowdsourcing, the reliability of crowdsourced output has been questioned since crowdsourced workers display varied degrees of attention, ability and accuracy. It is important, therefore, to understand the factors that affect the reliability of crowdsourcing. In the context of producing relevance judgments,…
Descriptors: Reliability, Value Judgment, Competence, Individual Characteristics

