Yongtian Cheng; K. V. Petrides – Educational and Psychological Measurement, 2025
Psychologists are emphasizing the importance of predictive conclusions. Machine learning methods, such as supervised neural networks, have been used in psychological studies as they naturally fit prediction tasks. However, we are concerned about whether neural networks fitted with random datasets (i.e., datasets where there is no relationship…
Descriptors: Psychological Studies, Artificial Intelligence, Cognitive Processes, Predictive Validity
Foster, Robert C. – Educational and Psychological Measurement, 2021
This article presents some equivalent forms of the common Kuder-Richardson Formula 21 and 20 estimators for nondichotomous data belonging to certain other exponential families, such as Poisson count data, exponential data, or geometric counts of trials until failure. Using the generalized framework of Foster (2020), an equation for the reliability…
Descriptors: Test Reliability, Data, Computation, Mathematical Formulas
Raykov, Tenko; Marcoulides, George A. – Educational and Psychological Measurement, 2021
The population discrepancy between unstandardized and standardized reliability of homogeneous multicomponent measuring instruments is examined. Within a latent variable modeling framework, it is shown that the standardized reliability coefficient for unidimensional scales can be markedly higher than the corresponding unstandardized reliability…
Descriptors: Test Reliability, Computation, Measures (Individuals), Research Problems
Hung-Yu Huang – Educational and Psychological Measurement, 2025
The use of discrete categorical formats to assess psychological traits has a long-standing tradition that is deeply embedded in item response theory models. The increasing prevalence and endorsement of computer- or web-based testing has led to greater focus on continuous response formats, which offer numerous advantages in both respondent…
Descriptors: Response Style (Tests), Psychological Characteristics, Item Response Theory, Test Reliability
Kroc, Edward; Olvera Astivia, Oscar L. – Educational and Psychological Measurement, 2022
Setting cutoff scores is one of the most common practices when using scales to aid in classification purposes. This process is usually done univariately where each optimal cutoff value is decided sequentially, subscale by subscale. While it is widely known that this process necessarily reduces the probability of "passing" such a test,…
Descriptors: Multivariate Analysis, Cutting Scores, Classification, Measurement
Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2025
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e.,…
Descriptors: Scoring, Guessing (Tests), Reaction Time, Item Response Theory
Olvera Astivia, Oscar Lorenzo; Kroc, Edward; Zumbo, Bruno D. – Educational and Psychological Measurement, 2020
Simulations concerning the distributional assumptions of coefficient alpha are contradictory. To provide a more principled theoretical framework, this article relies on the Fréchet-Hoeffding bounds, in order to showcase that the distribution of the items plays a role in the estimation of correlations and covariances. More specifically, these bounds…
Descriptors: Test Items, Test Reliability, Computation, Correlation
Viola Merhof; Caroline M. Böhm; Thorsten Meiser – Educational and Psychological Measurement, 2024
Item response tree (IRTree) models are a flexible framework to control self-reported trait measurements for response styles. To this end, IRTree models decompose the responses to rating items into sub-decisions, which are assumed to be made on the basis of either the trait being measured or a response style, whereby the effects of such person…
Descriptors: Item Response Theory, Test Interpretation, Test Reliability, Test Validity
Brennan, Robert L.; Kim, Stella Y.; Lee, Won-Chan – Educational and Psychological Measurement, 2022
This article extends multivariate generalizability theory (MGT) to tests with different random-effects designs for each level of a fixed facet. There are numerous situations in which the design of a test and the resulting data structure are not definable by a single design. One example is mixed-format tests that are composed of multiple-choice and…
Descriptors: Multivariate Analysis, Generalizability Theory, Multiple Choice Tests, Test Construction
Jiang, Zhehan; Shi, Dexin; Distefano, Christine – Educational and Psychological Measurement, 2021
The costs of an objective structured clinical examination (OSCE) are of concern to health profession educators globally. As OSCEs are usually designed under a generalizability theory (G-theory) framework, this article proposes a machine-learning-based approach to optimize the costs, while maintaining the minimum required generalizability…
Descriptors: Artificial Intelligence, Generalizability Theory, Objective Tests, Foreign Countries
Schulte, Niklas; Holling, Heinz; Bürkner, Paul-Christian – Educational and Psychological Measurement, 2021
Forced-choice questionnaires can prevent faking and other response biases typically associated with rating scales. However, the derived trait scores are often unreliable and ipsative, making interindividual comparisons in high-stakes situations impossible. Several studies suggest that these problems vanish if the number of measured traits is high.…
Descriptors: Questionnaires, Measurement Techniques, Test Format, Scoring
Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items
Walker, Cindy M.; Göçer Sahin, Sakine – Educational and Psychological Measurement, 2020
The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared…
Descriptors: Test Bias, Interrater Reliability, Responses, Correlation
Wyse, Adam E. – Educational and Psychological Measurement, 2021
An essential question when computing test-retest and alternate forms reliability coefficients is how many days there should be between tests. This article uses data from reading and math computerized adaptive tests to explore how the number of days between tests impacts alternate forms reliability coefficients. Results suggest that the highest…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Reliability, Reading Tests
Xiao, Leifeng; Hau, Kit-Tai – Educational and Psychological Measurement, 2023
We examined the performance of coefficient alpha and its potential competitors (ordinal alpha, omega total, Revelle's omega total [omega RT], omega hierarchical [omega h], greatest lower bound [GLB], and coefficient "H") with continuous and discrete data having different types of non-normality. Results showed the estimation bias was…
Descriptors: Statistical Bias, Statistical Analysis, Likert Scales, Statistical Distributions
Chalmers, R. Philip – Educational and Psychological Measurement, 2018
This article discusses the theoretical and practical contributions of Zumbo, Gadermann, and Zeisser's family of ordinal reliability statistics. Implications, interpretation, recommendations, and practical applications regarding their ordinal measures, particularly ordinal alpha, are discussed. General misconceptions relating to this family of…
Descriptors: Misconceptions, Test Theory, Test Reliability, Statistics