Publication Date
  In 2025: 0
  Since 2024: 1
  Since 2021 (last 5 years): 10
  Since 2016 (last 10 years): 27
  Since 2006 (last 20 years): 37
Descriptor
  Difficulty Level: 53
  Scores: 53
  Test Reliability: 53
  Test Items: 42
  Test Validity: 20
  Foreign Countries: 18
  Test Construction: 16
  Multiple Choice Tests: 14
  Correlation: 12
  Item Analysis: 11
  Achievement Tests: 10
Author
  Gallas, Edwin J.: 2
  Metsämuuronen, Jari: 2
  Pollock, Steven J.: 2
  Ali, Syed Haris: 1
  Alpayar, Cagla: 1
  Asikainen, Mervi A.: 1
  Barak, Moshe: 1
  Bers, Marina Umaschi: 1
  Bob delMas: 1
  Brad Hartlaub: 1
  Bristow, M.: 1
Audience
  Researchers: 2
Location
  Turkey: 5
  Australia: 2
  Canada: 2
  Colorado: 2
  Israel: 2
  South Korea: 2
  United States: 2
  Asia: 1
  Brazil: 1
  California: 1
  China: 1
Laws, Policies, & Programs
  Elementary and Secondary…: 1
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2023
Traditional estimators of reliability such as coefficients alpha, theta, omega, and rho (maximal reliability) are prone to give radical underestimates of reliability for the kinds of tests common in educational achievement testing. These tests are often structured with widely deviating item difficulties. This is a typical pattern where the traditional…
Descriptors: Test Reliability, Achievement Tests, Computation, Test Items
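For context on the estimators named in this abstract, here is a minimal sketch of coefficient alpha computed from a persons-by-items score matrix. It shows the classical estimator under discussion, not the corrected estimators the article proposes, and the toy data are invented.

```python
# Minimal sketch: coefficient alpha from a persons-by-items score matrix.
# This is the classical estimator discussed above, not the corrected
# estimators proposed in the article.
import numpy as np

def cronbach_alpha(X):
    """X: 2-D array, rows = examinees, columns = items (binary or polytomous)."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]                         # number of items
    item_vars = X.var(axis=0, ddof=1)      # per-item variances
    total_var = X.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 6 examinees, 4 binary items of widely varying difficulty.
scores = np.array([[1, 1, 1, 0],
                   [1, 1, 0, 0],
                   [1, 0, 0, 0],
                   [1, 1, 1, 1],
                   [1, 0, 0, 0],
                   [0, 0, 0, 0]])
print(round(cronbach_alpha(scores), 3))
```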
Hojung Kim; Changkyung Song; Jiyoung Kim; Hyeyun Jeong; Jisoo Park – Language Testing in Asia, 2024
This study presents a modified version of the Korean Elicited Imitation (EI) test, designed to resemble natural spoken language, and validates its reliability as a measure of proficiency. The study assesses the correlation between average test scores and Test of Proficiency in Korean (TOPIK) levels, examining score distributions among beginner,…
Descriptors: Korean, Test Validity, Test Reliability, Imitation
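As a rough illustration of the kind of score-to-level correlation this abstract describes, the sketch below uses a Spearman rank correlation on invented values; the study's actual data, scoring, and choice of correlation coefficient are not given in this snippet.

```python
# Illustration only: rank correlation between EI test averages and TOPIK levels.
# All values below are hypothetical, not data from the study.
from scipy.stats import spearmanr

topik_level = [1, 1, 2, 2, 3, 4, 5, 6]          # hypothetical TOPIK levels
ei_score    = [42, 48, 55, 53, 61, 70, 78, 85]  # hypothetical EI test averages
rho, p = spearmanr(topik_level, ei_score)
print(f"Spearman rho = {rho:.3f}, p = {p:.4f}")
```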
Chakrabartty, Satyendra Nath – International Journal of Psychology and Educational Studies, 2021
The paper proposes new measures of the difficulty and discrimination values of binary items and of tests consisting of such items, and derives their relationships, including estimation of the test error variance and thereby the test reliability, with definitions based on cosine similarities. The measures use the entire data. The difficulty value of a test and of an item is defined…
Descriptors: Test Items, Difficulty Level, Scores, Test Reliability
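The paper defines its measures through cosine similarities; the sketch below only illustrates the general idea with a classical proportion-correct difficulty value and a cosine similarity between each item's response vector and the total-score vector. The paper's exact definitions are not reproduced here.

```python
# Rough illustration only: proportion-correct difficulty per item and a
# cosine similarity between each item vector and the total-score vector.
import numpy as np

X = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [1, 1, 1, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0]], dtype=float)   # rows = examinees, cols = items

difficulty = X.mean(axis=0)                  # proportion correct per item
total = X.sum(axis=1)                        # total score per examinee

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

item_total_cosine = [cosine(X[:, j], total) for j in range(X.shape[1])]
print(difficulty, [round(c, 3) for c in item_total_cosine])
```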
Thompson, Kathryn N. – ProQuest LLC, 2023
It is imperative to collect validity evidence prior to interpreting and using test scores. During the process of collecting validity evidence, test developers should consider whether test scores are contaminated by sources of extraneous information. This is referred to as construct irrelevant variance, or the "degree to which test scores are…
Descriptors: Test Wiseness, Test Items, Item Response Theory, Scores
Metsämuuronen, Jari – International Journal of Educational Methodology, 2020
The Pearson product-moment correlation coefficient between item g and test score X, known as the item-test or item-total correlation ("Rit"), and the item-rest correlation ("Rir") are two of the most widely used classical estimators of item discrimination power (IDP). Both "Rit" and "Rir" underestimate IDP, caused by the…
Descriptors: Correlation, Test Items, Scores, Difficulty Level
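A minimal sketch of the two estimators named here, "Rit" and "Rir", computed from a small invented score matrix; the abstract's point is that both of these classical correlations tend to underestimate true item discrimination power.

```python
# Minimal sketch: item-test (Rit) and item-rest (Rir) correlations,
# the classical discrimination estimators discussed in the abstract.
import numpy as np

def rit_rir(X):
    X = np.asarray(X, dtype=float)
    total = X.sum(axis=1)
    out = []
    for j in range(X.shape[1]):
        item = X[:, j]
        rest = total - item                   # total score without item j
        rit = np.corrcoef(item, total)[0, 1]  # item-test correlation
        rir = np.corrcoef(item, rest)[0, 1]   # item-rest correlation
        out.append((rit, rir))
    return out

X = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0]])
for j, (rit, rir) in enumerate(rit_rir(X), start=1):
    print(f"item {j}: Rit = {rit:.3f}, Rir = {rir:.3f}")
```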
Rodriguez, Rebekah M.; Silvia, Paul J.; Kaufman, James C.; Reiter-Palmon, Roni; Puryear, Jeb S. – Creativity Research Journal, 2023
The original 90-item Creative Behavior Inventory (CBI) was a landmark self-report scale in creativity research, and the 28-item brief form developed nearly 20 years ago continues to be a popular measure of everyday creativity. Relatively little is known, however, about the psychometric properties of this widely used scale. In the current research,…
Descriptors: Creativity Tests, Creativity, Creative Thinking, Psychometrics
Tim Jacobbe; Bob delMas; Brad Hartlaub; Jeff Haberstroh; Catherine Case; Steven Foti; Douglas Whitaker – Numeracy, 2023
The development of assessments as part of the funded LOCUS project is described. The assessments measure students' conceptual understanding of statistics as outlined in the GAISE PreK-12 Framework. Results are reported from a large-scale administration to 3,430 students in grades 6 through 12 in the United States. Items were designed to assess…
Descriptors: Statistics Education, Common Core State Standards, Student Evaluation, Elementary School Students
Slepkov, A. D.; Van Bussel, M. L.; Fitze, K. M.; Burr, W. S. – SAGE Open, 2021
There is a broad literature in multiple-choice test development, both in terms of item-writing guidelines, and psychometric functionality as a measurement tool. However, most of the published literature concerns multiple-choice testing in the context of expert-designed high-stakes standardized assessments, with little attention being paid to the…
Descriptors: Foreign Countries, Undergraduate Students, Student Evaluation, Multiple Choice Tests
Saepuzaman, Duden; Istiyono, Edi; Haryanto – Pegem Journal of Education and Instruction, 2022
HOTS (higher-order thinking skills) are among the skills that need to be developed in the 21st century. This study aims to determine the characteristics of the Fundamental Physics Higher-order Thinking Skill (FundPhysHOTS) test for prospective physics teachers using Item Response Theory (IRT) analysis. The study uses a quantitative approach. 254 prospective physics…
Descriptors: Thinking Skills, Physics, Science Process Skills, Cognitive Tests
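The snippet does not say which IRT model or software the study applied, so the following is only a generic sketch of a two-parameter logistic (2PL) item response function, the kind of model commonly used in such analyses.

```python
# Illustrative 2PL item response function; the model and parameter values
# here are generic assumptions, not results from the study.
import numpy as np

def p_correct_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model.
    theta: examinee ability, a: item discrimination, b: item difficulty."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

abilities = np.linspace(-3, 3, 7)
print([round(p_correct_2pl(t, a=1.2, b=0.5), 3) for t in abilities])
```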
Mohammed Ambusaidi – ProQuest LLC, 2022
There is an increased demand on nursing faculty to provide quality teaching and assessment. Nursing faculty are required to ensure accurate assessment of learning through testing and outcome measurement that are critical elements of the evaluation process. Likewise, nursing faculty should implement a logical evaluation system. However, the…
Descriptors: Nursing Education, College Faculty, Test Construction, Test Validity
Guven Demir, Elif; Öksuz, Yücel – Participatory Educational Research, 2022
This research aimed to investigate animation-based achievement tests according to the item format, psychometric features, students' performance, and gender. The study sample consisted of 52 fifth-grade students in Samsun/Turkey in 2017-2018. Measures of the research were open-ended (OE), animation-based open-ended (AOE), multiple-choice (MC), and…
Descriptors: Animation, Achievement Tests, Test Items, Psychometrics
Guo, Hongwen; Zu, Jiyun; Kyllonen, Patrick – ETS Research Report Series, 2018
For a multiple-choice test under development or redesign, it is important to choose the optimal number of options per item so that the test possesses the desired psychometric properties. On the basis of available data for a multiple-choice assessment with 8 options, we evaluated the effects of changing the number of options on test properties…
Descriptors: Multiple Choice Tests, Test Items, Simulation, Test Construction
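One simple way to see why the number of options matters is the blind-guessing success rate of 1/k for a k-option item; the sketch below illustrates only that baseline effect and is not the simulation design used in the report.

```python
# Baseline effect of the option count: the blind-guessing success rate 1/k.
# Illustration only; the report's simulations examine much more than this.
for k in (2, 3, 4, 5, 8):
    print(f"{k} options: chance-level probability of a correct guess = {1 / k:.3f}")
```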
Wang, Xiaolin; Svetina, Dubravka; Dai, Shenghai – Journal of Experimental Education, 2019
Recently, interest in test subscore reporting for diagnosis purposes has been growing rapidly. The two simulation studies here examined factors (sample size, number of subscales, correlation between subscales, and three factors affecting subscore reliability: number of items per subscale, item parameter distribution, and data generating model)…
Descriptors: Value Added Models, Scores, Sample Size, Correlation
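One of the listed factors, the number of items per subscale, affects subscore reliability in a way the Spearman-Brown prophecy formula captures; the sketch below uses an assumed base reliability and is not the simulation model from the study.

```python
# Spearman-Brown prophecy formula: projected reliability when a subscale is
# lengthened by a factor m. A sketch of why "items per subscale" matters;
# the base reliability is an assumed value.
def spearman_brown(reliability, m):
    """Projected reliability of a test m times as long."""
    return (m * reliability) / (1 + (m - 1) * reliability)

base = 0.60                      # assumed reliability of a 5-item subscale
for n_items in (5, 10, 20, 40):
    m = n_items / 5
    print(f"{n_items:>2} items: projected reliability = {spearman_brown(base, m):.3f}")
```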
Yuksel, Ibrahim; Savas, Muhammed Ali – Asian Journal of Education and Training, 2019
This research aims to develop a valid and reliable test to determine prospective teachers' levels of drawing a shape-schema and making a table in the Mathematics and Science Education, Turkish and Social Sciences Education, and Basic Education Departments. In this process, a comprehensive item pool was prepared with the table of…
Descriptors: Preservice Teachers, Item Banks, Test Validity, Foreign Countries
Yao, Yuan – ProQuest LLC, 2019
Under the framework of item response theory (IRT) and generalizability (G-) theory, this study examined the effects of item difficulty on rating reliability and construct validity for both constructed-response (CR) items and essay items on English examinations. The data collected for this study were students' scores and responses on the two…
Descriptors: Foreign Countries, College Students, Second Language Learning, English (Second Language)
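For the G-theory side of the framework mentioned here, a generalizability coefficient for a simple persons-by-items design can be sketched as follows; the variance components are assumed values, not estimates from this dissertation.

```python
# Sketch of a G-theory generalizability coefficient for a persons-by-items
# design. The variance components below are assumed, not estimated from data.
def g_coefficient(var_person, var_residual, n_items):
    """Relative G (generalizability) coefficient for a p x i design."""
    return var_person / (var_person + var_residual / n_items)

for n_items in (5, 10, 20):
    print(f"{n_items:>2} items: G = {g_coefficient(0.5, 1.0, n_items):.3f}")
```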