Ehri Ryu – Society for Research on Educational Effectiveness, 2024
Background/Context: The confirmatory factor analysis (CFA) model is a commonly adopted framework to estimate and test a measurement model. Once a well-fitting final CFA model is selected, the selected model may be used to test structural relationships of the latent constructs with other variables, to construct a test with desired reliability and…
Descriptors: Research Problems, Factor Analysis, Scores, Computation
Mustafa Ilhan; Nese Güler; Gülsen Tasdelen Teker; Ömer Ergenekon – International Journal of Assessment Tools in Education, 2024
This study aimed to examine the effects of reverse items created with different strategies on psychometric properties and respondents' scale scores. To this end, three versions of a 10-item scale in the research were developed: 10 positive items were integrated in the first form (Form-P) and five positive and five reverse items in the other two…
Descriptors: Test Items, Psychometrics, Scores, Measures (Individuals)
Kelly Francis – ProQuest LLC, 2024
The current study involved the development, scaling, and validation of a new, brief, strength-based measure of children's ecological support, as rated by their parents. The scaling and validation of the Child and Youth Ecological Assets Scale--Parent Form (C-YEAS-P) took place through two studies. The first study involved 500 parents who were…
Descriptors: Measures (Individuals), Test Construction, Ability, Children
Kamau Oginga Siwatu; Kara Page; Narges Hadi – College Teaching, 2024
The purpose of this article is to document the development of a new measure of teaching self-efficacy -- "The College Teaching Self-Efficacy (CTSE) Scale." We designed the CTSE scale to examine individuals' beliefs in their abilities to perform specific teaching tasks in a college classroom successfully. We developed an instrument that…
Descriptors: Self Efficacy, Beliefs, Psychometrics, Measures (Individuals)
Çevik, Özlem – Pegem Journal of Education and Instruction, 2022
In this study, conducted for the psychometric analysis of kindness scales, teachers working in the central districts of Van (Ipekyolu, Tusba, and Edremit) constituted the study population. The study group consisted of 395 teachers, who were chosen by random sampling from this population and who participated in the…
Descriptors: Foreign Countries, Attitude Measures, Psychometrics, Test Construction
Anders Holm; Anders Hjorth-Trolle; Robert Andersen – Sociological Methods & Research, 2025
Lagged dependent variables (LDVs) are often used as predictors in ordinary least squares (OLS) models in the social sciences. Although several estimators are commonly employed, little is known about their relative merits in the presence of classical measurement error and different longitudinal processes. We assess the performance of four commonly…
Descriptors: Elementary Education, Scores, Error of Measurement, Predictor Variables
Thompson, Kathryn N. – ProQuest LLC, 2023
It is imperative to collect validity evidence prior to interpreting and using test scores. During the process of collecting validity evidence, test developers should consider whether test scores are contaminated by sources of extraneous information. This is referred to as construct irrelevant variance, or the "degree to which test scores are…
Descriptors: Test Wiseness, Test Items, Item Response Theory, Scores
Marta Godoy-Giménez; Ángel García-Pérez; Fernando Cañadas; Angeles F. Estévez; Pablo Sayans-Jiménez – Autism: The International Journal of Research and Practice, 2024
The broad autism phenotype is the phenotypic expression of the primary characteristics of autism. However, currently available tests do not agree with the two-domain operationalization of broad autism phenotype or autism, and their internal structure has shown instability across applications. This study presents the Broad Autism…
Descriptors: Autism Spectrum Disorders, Genetics, Diagnostic Tests, Foreign Countries
Acikgul, Kubra; Sad, Suleyman Nihat; Altay, Bilal – International Journal of Assessment Tools in Education, 2023
This study aimed to develop a useful test to measure university students' spatial abilities validly and reliably. Following a sequential explanatory mixed methods research design, first, qualitative methods were used to develop the trial items for the test; next, the psychometric properties of the test were analyzed through quantitative methods…
Descriptors: Spatial Ability, Scores, Multiple Choice Tests, Test Validity
Anderson, Darcie L.; Hooks, Tisha – Journal of College Student Retention: Research, Theory & Practice, 2022
With limited budgets and increasing enrollment demands, colleges need fast, free, and practical solutions supporting academic success and retention. The Academic Reality Check (ARC) tool helps to quickly predict traditional freshmen's awareness of their own academic expectations in college, supporting the financial investment being made by all…
Descriptors: College Freshmen, Expectation, Predictor Variables, Academic Achievement
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Brennan, Robert L.; Kim, Stella Y.; Lee, Won-Chan – Educational and Psychological Measurement, 2022
This article extends multivariate generalizability theory (MGT) to tests with different random-effects designs for each level of a fixed facet. There are numerous situations in which the design of a test and the resulting data structure are not definable by a single design. One example is mixed-format tests that are composed of multiple-choice and…
Descriptors: Multivariate Analysis, Generalizability Theory, Multiple Choice Tests, Test Construction
Knoch, Ute; Deygers, Bart; Khamboonruang, Apichat – Language Testing, 2021
Rating scale development in the field of language assessment is often considered in dichotomous ways: It is assumed to be guided either by expert intuition or by drawing on performance data. Even though quite a few authors have argued that rating scale development is rarely so easily classifiable, this dyadic view has dominated language testing…
Descriptors: Rating Scales, Test Construction, Language Tests, Test Use
Abdullah Alamer; Ahmed Al Khateeb; Abdulrahman Alshabeb – Language Assessment Quarterly, 2025
This study introduces the first Arabic Vocabulary Levels Test (Arabic-VLT), created for foreign learners of Arabic. We present compelling evidence to substantiate its validity and reliability. The Arabic-VLT was developed according to five levels, beginning with the most frequently used words (Level 1) to the least frequently used ones (Level 5),…
Descriptors: Arabic, Vocabulary Development, Test Construction, Second Language Learning
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as a sentence score, and then each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
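
The sentence-level aggregation described in the abstract above can be illustrated with a minimal sketch. It assumes each gap has already been scored as correct or incorrect and is tagged with the sentence it belongs to. The function name `sentence_scores` and the sample responses are hypothetical and are not taken from the study. The later step, entering each sentence into an analysis as a polytomous item, is not shown.

```python
from collections import defaultdict


def sentence_scores(gap_results):
    """Aggregate gap-level results into one polytomous score per sentence.

    gap_results: iterable of (sentence_id, is_correct) pairs, one per gap.
    Returns a dict mapping each sentence_id to its number of correct gaps.
    """
    scores = defaultdict(int)
    for sentence_id, is_correct in gap_results:
        # Adding 0 for an incorrect gap still registers the sentence,
        # so a sentence with no correct gaps gets a score of 0.
        scores[sentence_id] += int(bool(is_correct))
    return dict(scores)


# Hypothetical responses from one test taker: (sentence id, gap correct?)
responses = [
    ("s1", True), ("s1", False), ("s1", True),
    ("s2", True), ("s2", True),
    ("s3", False),
]
scores = sentence_scores(responses)
print(scores)
```

In this sketch a sentence with k gaps yields a score between 0 and k, so sentences with different numbers of gaps have different score ranges. How the study handles that is not stated in the truncated abstract.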