Search summary: 15 journal articles, all published in Educational and Psychological Measurement (2021-2025; 4 since 2024, 3 in 2025).
Top descriptors: Test Reliability (15), Item Response Theory (6), Test Items (4), Correlation (3), Error of Measurement (3), Foreign Countries (3), Sample Size (3), Scores (3), Test Construction (3), Test Validity (3), Artificial Intelligence (2).
Publication types: Journal Articles (15); Reports - Research (13); Reports - Evaluative (2).
Education levels: Higher Education (2), Postsecondary Education (2), Secondary Education (2), Elementary Secondary Education (1), High Schools (1).
Locations: Mexico (Mexico City) (1), Netherlands (1).
Assessments and surveys: Program for International… (1).
Yongtian Cheng; K. V. Petrides – Educational and Psychological Measurement, 2025
Psychologists are emphasizing the importance of predictive conclusions. Machine learning methods, such as supervised neural networks, have been used in psychological studies as they naturally fit prediction tasks. However, we are concerned about whether neural networks fitted with random datasets (i.e., datasets where there is no relationship…
Descriptors: Psychological Studies, Artificial Intelligence, Cognitive Processes, Predictive Validity
Foster, Robert C. – Educational and Psychological Measurement, 2021
This article presents some equivalent forms of the common Kuder-Richardson Formula 20 and Formula 21 estimators for nondichotomous data belonging to certain other exponential families, such as Poisson count data, exponential data, or geometric counts of trials until failure. Using the generalized framework of Foster (2020), an equation for the reliability…
Descriptors: Test Reliability, Data, Computation, Mathematical Formulas
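For orientation, the classical dichotomous-data forms of these estimators can be sketched as follows. This is a minimal illustration of the standard KR-20 and KR-21 formulas only, not Foster's exponential-family extension; the function names are illustrative.

```python
# KR-20 and KR-21 reliability for dichotomous (0/1) item data.
# KR-20 uses each item's proportion correct; KR-21 assumes equal item
# difficulty and needs only the total-score mean and variance.

def kr20(scores):
    """scores: list of per-person lists of 0/1 item responses."""
    k, n = len(scores[0]), len(scores)
    totals = [sum(row) for row in scores]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
    # Sum of item variances p_j * (1 - p_j):
    item_var = sum(
        (p := sum(row[j] for row in scores) / n) * (1 - p) for j in range(k)
    )
    return k / (k - 1) * (1 - item_var / var_t)

def kr21(scores):
    """KR-21: a lower bound to KR-20 when item difficulties differ."""
    k, n = len(scores[0]), len(scores)
    totals = [sum(row) for row in scores]
    mu = sum(totals) / n
    var_t = sum((t - mu) ** 2 for t in totals) / n
    return k / (k - 1) * (1 - mu * (k - mu) / (k * var_t))
```

Because KR-21 replaces the per-item difficulties with their average, it never exceeds KR-20 on the same data.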
Raykov, Tenko; Marcoulides, George A. – Educational and Psychological Measurement, 2021
The population discrepancy between unstandardized and standardized reliability of homogeneous multicomponent measuring instruments is examined. Within a latent variable modeling framework, it is shown that the standardized reliability coefficient for unidimensional scales can be markedly higher than the corresponding unstandardized reliability…
Descriptors: Test Reliability, Computation, Measures (Individuals), Research Problems
Hung-Yu Huang – Educational and Psychological Measurement, 2025
The use of discrete categorical formats to assess psychological traits has a long-standing tradition that is deeply embedded in item response theory models. The increasing prevalence and endorsement of computer- or web-based testing has led to greater focus on continuous response formats, which offer numerous advantages in both respondent…
Descriptors: Response Style (Tests), Psychological Characteristics, Item Response Theory, Test Reliability
Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2025
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e.,…
Descriptors: Scoring, Guessing (Tests), Reaction Time, Item Response Theory
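The unidimensional effort-moderated procedure examined here can be sketched in a few lines: responses whose reaction times fall below an item-level threshold are flagged as rapid guesses and excluded before computing the score. This is a hedged simplification (proportion-correct scoring rather than the IRT-based version studied in the article), and the function name and threshold scheme are illustrative.

```python
# Effort-moderated (EM) scoring sketch: drop responses whose reaction
# time falls below the item's rapid-guessing threshold, then score the rest.

def em_score(responses, times, thresholds):
    """responses: 0/1 correctness; times: reaction times in seconds;
    thresholds: per-item RT cutoffs separating rapid guesses from
    effortful responses. Returns proportion correct among effortful
    responses, or None if every response was flagged."""
    kept = [r for r, t, th in zip(responses, times, thresholds) if t >= th]
    if not kept:
        return None  # examinee rapid-guessed on every item
    return sum(kept) / len(kept)
```

With a 2-second threshold on every item, `em_score([1, 0, 1, 1], [10.0, 1.0, 8.0, 0.5], [2.0] * 4)` keeps only the two effortful responses (both correct) and returns 1.0.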
Kroc, Edward; Olvera Astivia, Oscar L. – Educational and Psychological Measurement, 2022
Setting cutoff scores is one of the most common practices when using scales to aid in classification purposes. This process is usually done univariately where each optimal cutoff value is decided sequentially, subscale by subscale. While it is widely known that this process necessarily reduces the probability of "passing" such a test,…
Descriptors: Multivariate Analysis, Cutting Scores, Classification, Measurement
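The shrinkage the abstract alludes to is easy to see in the simplest case: if subscale scores were independent, requiring a pass on every univariate cutoff multiplies the pass rates together. This toy calculation (independence is an assumption, not the article's model) shows why sequential, subscale-by-subscale cutoffs depress the overall passing probability.

```python
# With independent subscales, the probability of passing every
# univariate cutoff is the product of the per-subscale pass rates.

def joint_pass_rate(per_subscale_rates):
    rate = 1.0
    for r in per_subscale_rates:
        rate *= r
    return rate

# Four subscales, each passing 90% of examinees in isolation,
# jointly pass only about 66%:
print(joint_pass_rate([0.9] * 4))  # 0.6561
```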
Viola Merhof; Caroline M. Böhm; Thorsten Meiser – Educational and Psychological Measurement, 2024
Item response tree (IRTree) models are a flexible framework to control self-reported trait measurements for response styles. To this end, IRTree models decompose the responses to rating items into sub-decisions, which are assumed to be made on the basis of either the trait being measured or a response style, whereby the effects of such person…
Descriptors: Item Response Theory, Test Interpretation, Test Reliability, Test Validity
Brennan, Robert L.; Kim, Stella Y.; Lee, Won-Chan – Educational and Psychological Measurement, 2022
This article extends multivariate generalizability theory (MGT) to tests with different random-effects designs for each level of a fixed facet. There are numerous situations in which the design of a test and the resulting data structure are not definable by a single design. One example is mixed-format tests that are composed of multiple-choice and…
Descriptors: Multivariate Analysis, Generalizability Theory, Multiple Choice Tests, Test Construction
Jiang, Zhehan; Shi, Dexin; Distefano, Christine – Educational and Psychological Measurement, 2021
The costs of an objective structured clinical examination (OSCE) are of concern to health profession educators globally. As OSCEs are usually designed under generalizability theory (G-theory) framework, this article proposes a machine-learning-based approach to optimize the costs, while maintaining the minimum required generalizability…
Descriptors: Artificial Intelligence, Generalizability Theory, Objective Tests, Foreign Countries
Schulte, Niklas; Holling, Heinz; Bürkner, Paul-Christian – Educational and Psychological Measurement, 2021
Forced-choice questionnaires can prevent faking and other response biases typically associated with rating scales. However, the derived trait scores are often unreliable and ipsative, making interindividual comparisons in high-stakes situations impossible. Several studies suggest that these problems vanish if the number of measured traits is high.…
Descriptors: Questionnaires, Measurement Techniques, Test Format, Scoring
Wyse, Adam E. – Educational and Psychological Measurement, 2021
An essential question when computing test-retest and alternate forms reliability coefficients is how many days there should be between tests. This article uses data from reading and math computerized adaptive tests to explore how the number of days between tests impacts alternate forms reliability coefficients. Results suggest that the highest…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Reliability, Reading Tests
Xiao, Leifeng; Hau, Kit-Tai – Educational and Psychological Measurement, 2023
We examined the performance of coefficient alpha and its potential competitors (ordinal alpha, omega total, Revelle's omega total [omega RT], omega hierarchical [omega h], greatest lower bound [GLB], and coefficient "H") with continuous and discrete data having different types of non-normality. Results showed the estimation bias was…
Descriptors: Statistical Bias, Statistical Analysis, Likert Scales, Statistical Distributions
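Coefficient alpha, the baseline estimator in this comparison, can be computed directly from the item-score matrix; the competitors (omega variants, GLB, coefficient H) require a fitted factor model and are not sketched here. A minimal, assumption-labeled version for any numeric item scores (e.g., Likert responses):

```python
# Cronbach's coefficient alpha: k/(k-1) * (1 - sum of item variances /
# variance of total scores), using population variances throughout.

def cronbach_alpha(data):
    """data: list of per-person lists of numeric item scores."""
    k, n = len(data[0]), len(data)
    totals = [sum(row) for row in data]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    item_var_sum = 0.0
    for j in range(k):
        col = [row[j] for row in data]
        m = sum(col) / n
        item_var_sum += sum((x - m) ** 2 for x in col) / n
    return k / (k - 1) * (1 - item_var_sum / var_t)
```

On dichotomous data this reduces exactly to KR-20, which is one reason alpha is the default reference point for the discrete-data conditions the study examines.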
Ellis, Jules L. – Educational and Psychological Measurement, 2021
This study develops a theoretical model for the costs of an exam as a function of its duration. Two kinds of costs are distinguished: (1) the costs of measurement errors and (2) the costs of the measurement. Both costs are expressed in time of the student. Based on a classical test theory model, enriched with assumptions on the context, the costs…
Descriptors: Test Length, Models, Error of Measurement, Measurement
Liu, Xiaowen; Jane Rogers, H. – Educational and Psychological Measurement, 2022
Test fairness is critical to the validity of group comparisons involving gender, ethnicities, culture, or treatment conditions. Detection of differential item functioning (DIF) is one component of efforts to ensure test fairness. The current study compared four treatments for items that have been identified as showing DIF: deleting, ignoring,…
Descriptors: Item Analysis, Comparative Analysis, Culture Fair Tests, Test Validity
Stoevenbelt, Andrea H.; Wicherts, Jelte M.; Flore, Paulette C.; Phillips, Lorraine A. T.; Pietschnig, Jakob; Verschuere, Bruno; Voracek, Martin; Schwabe, Inga – Educational and Psychological Measurement, 2023
When cognitive and educational tests are administered under time limits, tests may become speeded and this may affect the reliability and validity of the resulting test scores. Prior research has shown that time limits may create or enlarge gender gaps in cognitive and academic testing. On average, women complete fewer items than men when a test…
Descriptors: Timed Tests, Gender Differences, Item Response Theory, Correlation