Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 3 |
Since 2016 (last 10 years) | 9 |
Since 2006 (last 20 years) | 21 |
Descriptor
Scores | 54 |
Test Reliability | 54 |
Test Validity | 19 |
Psychometrics | 15 |
Test Items | 13 |
Correlation | 10 |
Foreign Countries | 9 |
Item Response Theory | 9 |
Statistical Analysis | 8 |
Test Construction | 8 |
Comparative Analysis | 7 |
More ▼ |
Source
Educational and Psychological… | 54 |
Author
Publication Type
Journal Articles | 51 |
Reports - Research | 36 |
Reports - Evaluative | 12 |
Reports - Descriptive | 2 |
Opinion Papers | 1 |
Education Level
Higher Education | 5 |
Postsecondary Education | 4 |
Secondary Education | 4 |
High Schools | 3 |
Middle Schools | 2 |
Junior High Schools | 1 |
Audience
Location
Canada | 1 |
China | 1 |
Colombia | 1 |
Georgia | 1 |
Hong Kong | 1 |
Netherlands | 1 |
Norway | 1 |
Saudi Arabia | 1 |
United Kingdom (Scotland) | 1 |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
Brennan, Robert L.; Kim, Stella Y.; Lee, Won-Chan – Educational and Psychological Measurement, 2022
This article extends multivariate generalizability theory (MGT) to tests with different random-effects designs for each level of a fixed facet. There are numerous situations in which the design of a test and the resulting data structure are not definable by a single design. One example is mixed-format tests that are composed of multiple-choice and…
Descriptors: Multivariate Analysis, Generalizability Theory, Multiple Choice Tests, Test Construction
Xiao, Leifeng; Hau, Kit-Tai – Educational and Psychological Measurement, 2023
We examined the performance of coefficient alpha and its potential competitors (ordinal alpha, omega total, Revelle's omega total [omega RT], omega hierarchical [omega h], greatest lower bound [GLB], and coefficient "H") with continuous and discrete data having different types of non-normality. Results showed the estimation bias was…
Descriptors: Statistical Bias, Statistical Analysis, Likert Scales, Statistical Distributions
Raykov, Tenko; Marcoulides, George A. – Educational and Psychological Measurement, 2019
This note discusses the merits of coefficient alpha and their conditions in light of recent critical publications that miss out on significant research findings over the past several decades. That earlier research has demonstrated the empirical relevance and utility of coefficient alpha under certain empirical circumstances. The article highlights…
Descriptors: Test Validity, Test Reliability, Test Items, Correlation
Stoevenbelt, Andrea H.; Wicherts, Jelte M.; Flore, Paulette C.; Phillips, Lorraine A. T.; Pietschnig, Jakob; Verschuere, Bruno; Voracek, Martin; Schwabe, Inga – Educational and Psychological Measurement, 2023
When cognitive and educational tests are administered under time limits, tests may become speeded and this may affect the reliability and validity of the resulting test scores. Prior research has shown that time limits may create or enlarge gender gaps in cognitive and academic testing. On average, women complete fewer items than men when a test…
Descriptors: Timed Tests, Gender Differences, Item Response Theory, Correlation
Nicewander, W. Alan – Educational and Psychological Measurement, 2019
This inquiry is focused on three indicators of the precision of measurement--conditional on fixed values of ?, the latent variable of item response theory (IRT). The indicators that are compared are (1) The traditional, conditional standard errors, s(eX|?) = CSEM; (2) the IRT-based conditional standard errors, s[subscript irt](eX|?)=C[subscript…
Descriptors: Measurement, Accuracy, Scores, Error of Measurement
Zijlmans, Eva A. O.; Tijmstra, Jesper; van der Ark, L. Andries; Sijtsma, Klaas – Educational and Psychological Measurement, 2018
Reliability is usually estimated for a total score, but it can also be estimated for item scores. Item-score reliability can be useful to assess the repeatability of an individual item score in a group. Three methods to estimate item-score reliability are discussed, known as method MS, method [lambda][subscript 6], and method CA. The item-score…
Descriptors: Test Items, Test Reliability, Correlation, Comparative Analysis
Fu, Yuanshu; Wen, Zhonglin; Wang, Yang – Educational and Psychological Measurement, 2018
The maximal reliability of a congeneric measure is achieved by weighting item scores to form the optimal linear combination as the total score; it is never lower than the composite reliability of the measure when measurement errors are uncorrelated. The statistical method that renders maximal reliability would also lead to maximal criterion…
Descriptors: Test Reliability, Test Validity, Comparative Analysis, Attitude Measures
Andersson, Björn; Xin, Tao – Educational and Psychological Measurement, 2018
In applications of item response theory (IRT), an estimate of the reliability of the ability estimates or sum scores is often reported. However, analytical expressions for the standard errors of the estimators of the reliability coefficients are not available in the literature and therefore the variability associated with the estimated reliability…
Descriptors: Item Response Theory, Test Reliability, Test Items, Scores
Stanley, Leanne M.; Edwards, Michael C. – Educational and Psychological Measurement, 2016
The purpose of this article is to highlight the distinction between the reliability of test scores and the fit of psychometric measurement models, reminding readers why it is important to consider both when evaluating whether test scores are valid for a proposed interpretation and/or use. It is often the case that an investigator judges both the…
Descriptors: Test Reliability, Goodness of Fit, Scores, Patients
Dimitrov, Dimiter M.; Raykov, Tenko; AL-Qataee, Abdullah Ali – Educational and Psychological Measurement, 2015
This article is concerned with developing a measure of general academic ability (GAA) for high school graduates who apply to colleges, as well as with the identification of optimal weights of the GAA indicators in a linear combination that yields a composite score with maximal reliability and maximal predictive validity, employing the framework of…
Descriptors: Foreign Countries, Academic Ability, Aptitude Tests, High School Students
Balogh, Jennifer; Bernstein, Jared; Cheng, Jian; Van Moere, Alistair; Townshend, Brent; Suzuki, Masanori – Educational and Psychological Measurement, 2012
A two-part experiment is presented that validates a new measurement tool for scoring oral reading ability. Data collected by the U.S. government in a large-scale literacy assessment of adults were analyzed by a system called VersaReader that uses automatic speech recognition and speech processing technologies to score oral reading fluency. In the…
Descriptors: Reading Fluency, Measures (Individuals), Scoring, Reading Ability
Skorupski, William P.; Carvajal, Jorge – Educational and Psychological Measurement, 2010
This study is an evaluation of the psychometric issues associated with estimating objective level scores, often referred to as "subscores." The article begins by introducing the concepts of reliability and validity for subscores from statewide achievement tests. These issues are discussed with reference to popular scaling techniques, classical…
Descriptors: Testing Programs, Test Validity, Achievement Tests, Scores
Ordonez, Xavier G.; Ponsoda, Vicente; Abad, Francisco J.; Romero, Sonia J. – Educational and Psychological Measurement, 2009
This article proposes a new test (called the EQEBI) for the measurement of epistemological beliefs, integrating and extending the Epistemological Questionnaire (EQ) and the Epistemic Beliefs Inventory (EBI). In Study 1, the two tests were translated and applied to a Spanish-speaking sample. A detailed dimensionality exploration, by means of the…
Descriptors: Epistemology, Beliefs, Tests, Spanish Speaking
Bing, Mark N.; Stewart, Susan M.; Davison, H. Kristl – Educational and Psychological Measurement, 2009
Handheld calculators have been used on the job for more than 30 years, yet the degree to which these devices can affect performance on employment tests of mathematical ability has not been thoroughly examined. This study used a within-subjects research design (N = 167) to investigate the effects of calculator use on test score reliability, test…
Descriptors: Calculators, Mathematics Tests, Occupational Tests, Test Reliability
Bjornebekk, Gunnar – Educational and Psychological Measurement, 2009
The primary aim of this study was to examine the psychometric properties of the scores on a version for children of the Carver and White Behavioral Inhibition and Activation scales (the BIS-BAS scales). This involved administering the BIS-BAS scales, the Positive and Negative Affect Schedule, the Junior Eysenck Personality Questionnaire…
Descriptors: Measures (Individuals), Psychometrics, Grade 6, Test Validity