Publication Date
In 2025 | 1 |
Since 2024 | 4 |
Since 2021 (last 5 years) | 16 |
Since 2016 (last 10 years) | 39 |
Since 2006 (last 20 years) | 94 |
Descriptor
Test Reliability | 252 |
Test Theory | 252 |
Test Validity | 109 |
Test Construction | 64 |
Test Items | 51 |
Item Response Theory | 44 |
Scores | 44 |
Error of Measurement | 43 |
Psychometrics | 41 |
Item Analysis | 38 |
Statistical Analysis | 38 |
More ▼ |
Source
Author
Zimmerman, Donald W. | 9 |
Haladyna, Tom | 4 |
Huynh, Huynh | 4 |
Williams, Richard H. | 4 |
Mislevy, Robert J. | 3 |
Wilcox, Rand R. | 3 |
Bormuth, John R. | 2 |
Brennan, Robert L. | 2 |
Cliff, Norman | 2 |
Crocker, Linda | 2 |
Crowley, Susan L. | 2 |
More ▼ |
Publication Type
Education Level
Audience
Practitioners | 10 |
Researchers | 7 |
Teachers | 7 |
Administrators | 2 |
Students | 2 |
Location
United Kingdom (England) | 5 |
Canada | 4 |
Spain | 3 |
Australia | 2 |
Singapore | 2 |
Texas | 2 |
Turkey | 2 |
Turkey (Ankara) | 2 |
United Kingdom (Great Britain) | 2 |
United States | 2 |
Chile | 1 |
More ▼ |
Laws, Policies, & Programs
Elementary and Secondary… | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Osman Tat; Abdullah Faruk Kilic – Turkish Online Journal of Distance Education, 2024
The widespread availability of internet access in daily life has resulted in a greater acceptance of online assessment methods. E-assessment platforms offer various features such as randomizing questions and answers, utilizing extensive question banks, setting time limits, and managing access during online exams. Electronic assessment enables…
Descriptors: Test Construction, Test Validity, Test Reliability, Anxiety
Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients
Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022
The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…
Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
Chakrabartty, Satyendra Nath – International Journal of Psychology and Educational Studies, 2021
The paper proposes new measures of difficulty and discriminating values of binary items and test consisting of such items and find their relationships including estimation of test error variance and thereby the test reliability, as per definition using cosine similarities. The measures use entire data. Difficulty value of test and item is defined…
Descriptors: Test Items, Difficulty Level, Scores, Test Reliability
Xiao, Leifeng; Hau, Kit-Tai – Applied Measurement in Education, 2023
We compared coefficient alpha with five alternatives (omega total, omega RT, omega h, GLB, and coefficient H) in two simulation studies. Results showed for unidimensional scales, (a) all indices except omega h performed similarly well for most conditions; (b) alpha is still good; (c) GLB and coefficient H overestimated reliability with small…
Descriptors: Test Theory, Test Reliability, Factor Analysis, Test Length
Gayle Geschwind; Michael Vignal; Marcos D. Caballero; H.? J. Lewandowski – Physical Review Physics Education Research, 2024
The Survey of Physics Reasoning on Uncertainty Concepts in Experiments (SPRUCE) was designed to measure students' proficiency with measurement uncertainty concepts and practices across ten different assessment objectives to help facilitate the improvement of laboratory instruction focused on this important topic. To ensure the reliability and…
Descriptors: Measurement, Ambiguity (Context), Scientific Concepts, Physics
Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items
Walker, Cindy M.; Göçer Sahin, Sakine – Educational and Psychological Measurement, 2020
The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared…
Descriptors: Test Bias, Interrater Reliability, Responses, Correlation
Nicolas Rochat; Laurent Lima; Pascal Bressoux – Journal of Psychoeducational Assessment, 2025
Inference is considered an important factor in comprehension models and has been described as a causal factor in predicting comprehension. To date, specific tests for inference are rare and often rely on specific thematic texts. This reliance on thematic inference may raise some concerns as inference is related to prior text-specific knowledge.…
Descriptors: Inferences, Reading Comprehension, Reading Tests, Test Reliability
Ser Ming Mark Lee; Wei Cheng Liu – Asia Pacific Journal of Education, 2024
Programme evaluation has developed tremendously over the past 50 years, with a proliferation of evaluation research, an increase in the institutionalization of evaluation, and growth in the professionalization of evaluation. However, existing research and developments are still largely in North America, Europe, Australia, and New Zealand, with…
Descriptors: Foreign Countries, Evaluation Research, Evaluation Methods, Evaluation Criteria
Chalmers, R. Philip – Educational and Psychological Measurement, 2018
This article discusses the theoretical and practical contributions of Zumbo, Gadermann, and Zeisser's family of ordinal reliability statistics. Implications, interpretation, recommendations, and practical applications regarding their ordinal measures, particularly ordinal alpha, are discussed. General misconceptions relating to this family of…
Descriptors: Misconceptions, Test Theory, Test Reliability, Statistics
Ellis, Jules L. – Educational and Psychological Measurement, 2021
This study develops a theoretical model for the costs of an exam as a function of its duration. Two kind of costs are distinguished: (1) the costs of measurement errors and (2) the costs of the measurement. Both costs are expressed in time of the student. Based on a classical test theory model, enriched with assumptions on the context, the costs…
Descriptors: Test Length, Models, Error of Measurement, Measurement
Tyrone B. Pretorius; P. Paul Heppner; Anita Padmanabhanunni; Serena Ann Isaacs – SAGE Open, 2023
In previous studies, problem solving appraisal has been identified as playing a key role in promoting positive psychological well-being. The Problem Solving Inventory is the most widely used measure of problem solving appraisal and consists of 32 items. The length of the instrument, however, may limit its applicability to large-scale surveys…
Descriptors: Problem Solving, Measures (Individuals), Test Construction, Item Response Theory
Koçak, Duygu – International Journal of Progressive Education, 2020
The aim of this study was to determine the effect of chance success on test equalization. For this purpose, artificially generated 500 and 1000 sample size data sets were synchronized using linear equalization and equal percentage equalization methods. In the data which were produced as a simulative, a total of four cases were created with no…
Descriptors: Test Theory, Equated Scores, Error of Measurement, Sample Size
Hancock, Gregory R.; An, Ji – Measurement: Interdisciplinary Research and Perspectives, 2020
As an alternative to Cronbach's [alpha] for estimating scale reliability, McDonald's [omega] has attracted increased attention within the methodological community for its less stringent measurement assumptions. Notwithstanding, [omega] is still seldom used by practitioners, likely due to its unavailability in popular software packages (e.g., SPSS)…
Descriptors: Evaluation, Alternative Assessment, Reliability, Test Reliability
Huebner, Alan; Skar, Gustaf B. – Practical Assessment, Research & Evaluation, 2021
Writing assessments often consist of students responding to multiple prompts, which are judged by more than one rater. To establish the reliability of these assessments, there exist different methods to disentangle variation due to prompts and raters, including classical test theory, Many Facet Rasch Measurement (MFRM), and Generalizability Theory…
Descriptors: Error of Measurement, Test Theory, Generalizability Theory, Item Response Theory
Parker, Mark A. J.; Hedgeland, Holly; Jordan, Sally E.; Braithwaite, Nicholas St. J. – European Journal of Science and Mathematics Education, 2023
The study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI), which used automatically marked free-response questions. Data were collected over a period of three academic years from 611 participants who were taking physics classes at high school and university level. A total of…
Descriptors: Test Construction, Scientific Concepts, Physics, Test Reliability