Hung Tan Ha; Duyen Thi Bich Nguyen; Tim Stoeckel – Language Assessment Quarterly, 2025
This article compares two methods for detecting local item dependence (LID): residual correlation examination and Rasch testlet modeling (RTM), in a commonly used 3:6 matching format and an extended matching test (EMT) format. The two formats are hypothesized to facilitate different levels of item dependency due to differences in the number of…
Descriptors: Comparative Analysis, Language Tests, Test Items, Item Analysis
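A residual-correlation check of the kind mentioned in this abstract can be sketched as follows. This is a minimal illustration of Yen's Q3 statistic under a dichotomous Rasch model, not the authors' procedure. The function name and its arguments are hypothetical, and the person abilities (theta) and item difficulties (b) are assumed to have been estimated elsewhere.

```python
import numpy as np

def q3_residual_correlations(responses, theta, b):
    """Correlate Rasch residuals between every pair of items (Yen's Q3).

    responses: persons x items array of 0/1 scores.
    theta:     person ability estimates (assumed already estimated).
    b:         item difficulty estimates (assumed already estimated).
    """
    x = np.asarray(responses, dtype=float)
    theta = np.asarray(theta, dtype=float)
    b = np.asarray(b, dtype=float)
    # Rasch model: P(correct) = logistic(theta - b)
    expected = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    residuals = x - expected
    # Items are columns, so correlate column-wise
    return np.corrcoef(residuals, rowvar=False)

# Hypothetical data: 4 persons, 2 items
responses = [[1, 0], [0, 1], [1, 1], [0, 0]]
q3 = q3_residual_correlations(responses, theta=[0.5, -0.5, 1.0, -1.0], b=[0.0, 0.2])
print(q3)
```

Large off-diagonal values in the returned matrix would flag item pairs for closer inspection; any cutoff is a judgment call and is not taken from the article.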
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022
The reliability of a test score is usually underestimated and the deflation may be profound, 0.40-0.60 units of reliability or 46-71%. Eight root sources of the deflation are discussed and quantified by a simulation with 1,440 real-world datasets: (1) errors in the measurement modelling, (2) inefficiency in the estimator of reliability within…
Descriptors: Test Reliability, Scores, Test Items, Correlation
Novak, Josip; Rebernjak, Blaž – Measurement: Interdisciplinary Research and Perspectives, 2023
A Monte Carlo simulation study was conducted to examine the performance of [alpha], [lambda]2, [lambda][subscript 4], [lambda][subscript 2], [omega][subscript T], GLB[subscript MRFA], and GLB[subscript Algebraic] coefficients. Population reliability, distribution shape, sample size, test length, and number of response categories were varied…
Descriptors: Monte Carlo Methods, Evaluation Methods, Reliability, Simulation
Kilic, Abdullah Faruk; Uysal, Ibrahim – International Journal of Assessment Tools in Education, 2022
Most researchers examine the corrected item-total correlation when analyzing item discrimination in multidimensional structures under Classical Test Theory, which might lead to underestimating item discrimination and, in turn, to removing items from the test. Researchers might investigate the corrected item-total correlation with the…
Descriptors: Item Analysis, Correlation, Item Response Theory, Test Items
Gu, Zhengguo; Emons, Wilco H. M.; Sijtsma, Klaas – Journal of Educational and Behavioral Statistics, 2021
Clinical, medical, and health psychologists use difference scores obtained from pretest-posttest designs employing the same test to assess intraindividual change possibly caused by an intervention addressing, for example, anxiety, depression, eating disorder, or addiction. Reliability of difference scores is important for interpreting observed…
Descriptors: Test Reliability, Scores, Pretests Posttests, Computation
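For orientation, the textbook classical test theory expression for the reliability of a difference score D = Y - X (posttest minus pretest) is given below. It is the standard formula, not necessarily the estimator examined in the article; the symbols are assumptions introduced here, with sigma denoting standard deviations, rho_XX' and rho_YY' the reliabilities of the two measurements, and rho_XY their correlation.

```latex
\rho_{DD'} =
\frac{\sigma_X^2\,\rho_{XX'} + \sigma_Y^2\,\rho_{YY'} - 2\,\rho_{XY}\,\sigma_X\,\sigma_Y}
     {\sigma_X^2 + \sigma_Y^2 - 2\,\rho_{XY}\,\sigma_X\,\sigma_Y}
```

When the two variances are equal, this reduces to (average reliability minus rho_XY) divided by (1 minus rho_XY), so a high pretest-posttest correlation pulls the reliability of the difference score down.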
Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024
Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…
Descriptors: Semantics, Educational Assessment, Evaluators, Reliability
Olvera Astivia, Oscar Lorenzo; Kroc, Edward; Zumbo, Bruno D. – Educational and Psychological Measurement, 2020
Simulations concerning the distributional assumptions of coefficient alpha are contradictory. To provide a more principled theoretical framework, this article relies on the Fréchet-Hoeffding bounds to show that the distribution of the items plays a role in the estimation of correlations and covariances. More specifically, these bounds…
Descriptors: Test Items, Test Reliability, Computation, Correlation
The Reliability of the Posterior Probability of Skill Attainment in Diagnostic Classification Models
Johnson, Matthew S.; Sinharay, Sandip – Journal of Educational and Behavioral Statistics, 2020
One common score reported from diagnostic classification assessments is the vector of posterior means of the skill mastery indicators. As with any assessment, it is important to derive and report estimates of the reliability of the reported scores. After reviewing a reliability measure suggested by Templin and Bradshaw, this article suggests three…
Descriptors: Reliability, Probability, Skill Development, Classification
Fatih Orcan – International Journal of Assessment Tools in Education, 2023
Cronbach's alpha and McDonald's omega are among the most commonly used reliability estimates. Alpha uses inter-item correlations, while omega is based on a factor analysis result. This study uses simulated ordinal data sets to test whether alpha and omega produce different estimates. Their performances were compared according to the…
Descriptors: Statistical Analysis, Monte Carlo Methods, Correlation, Factor Analysis
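As a point of reference for the alpha side of this comparison, a minimal sketch of Cronbach's alpha computed from item and total-score variances is shown below. It is not the study's code; the function name and the sample data are hypothetical. Omega is omitted because it requires a fitted factor model.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items score matrix."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                         # number of items
    item_vars = x.var(axis=0, ddof=1)      # sample variance of each item
    total_var = x.sum(axis=1).var(ddof=1)  # sample variance of the sum score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Hypothetical data: two identical items, then two uncorrelated items
identical = [[1, 1], [2, 2], [3, 3]]
uncorrelated = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(cronbach_alpha(identical))
print(cronbach_alpha(uncorrelated))
```

Identical items give an alpha of 1 and uncorrelated items give an alpha of 0, which matches the interpretation of alpha as a function of inter-item covariance.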
Vucaj, Indrit – Journal of Research on Technology in Education, 2022
This study presents the methodological and procedural development process of the Digital Age Teaching Scale (DATS), a summative assessment tool designed to measure application of the ISTE Standards for Educators in K-12 classrooms. The theoretical framework of the ISTE Standards for Educators informed the development of DATS, and an 8-step process…
Descriptors: Elementary Secondary Education, Standards, Test Construction, Test Items
Barno S. Abdullaeva; Diyorjon Abdullaev; Nurislom I. Khursanov; Khurshida B. Kadirova; Laylo Djuraeva – International Journal of Language Testing, 2024
Cloze tests are commonly used in language testing as a quick measure of overall language ability or reading comprehension. A problem for the analysis of cloze tests with item response theory models is that cloze test items are locally dependent. This leads to the violation of the conditional or local independence assumption of IRT models. In this…
Descriptors: Cloze Procedure, Language Tests, Test Items, Correlation
Metsämuuronen, Jari – International Journal of Educational Methodology, 2020
The Pearson product-moment correlation coefficient between item g and test score X, known as the item-test or item-total correlation ("Rit"), and the item-rest correlation ("Rir") are two of the most used classical estimators of item discrimination power (IDP). Both "Rit" and "Rir" underestimate IDP caused by the…
Descriptors: Correlation, Test Items, Scores, Difficulty Level
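The two estimators named in this abstract can be sketched as follows. This is a minimal illustration, not the author's code; the function name and the sample data are hypothetical. "Rit" correlates item g with the total score, while "Rir" correlates it with the total score after item g has been removed.

```python
import numpy as np

def item_total_and_rest(scores, g):
    """Return (Rit, Rir) for item column g of a persons x items matrix."""
    x = np.asarray(scores, dtype=float)
    item = x[:, g]
    total = x.sum(axis=1)   # test score X
    rest = total - item     # test score without item g
    rit = np.corrcoef(item, total)[0, 1]
    rir = np.corrcoef(item, rest)[0, 1]
    return rit, rir

# Hypothetical data: 5 persons, 3 dichotomous items
scores = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
rit, rir = item_total_and_rest(scores, g=0)
print(rit, rir)
```

In this sample "Rit" exceeds "Rir" because the item contributes to the total score it is being correlated with.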
Gill, Tim – Research Matters, 2022
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…
Descriptors: Comparative Analysis, Decision Making, Scripts, Standards
Tim Stoeckel; Liang Ye Tan; Hung Tan Ha; Nam Thi Phuong Ho; Tomoko Ishii; Young Ae Kim; Chunmei Huang; Stuart McLean – Vocabulary Learning and Instruction, 2024
Local item dependency (LID) occurs when test-takers' responses to one test item are affected by their responses to another. It can be problematic if it causes inflated reliability estimates or distorted person and item measures. The cued-recall reading comprehension test in Hu and Nation's (2000) well-known and influential coverage-comprehension…
Descriptors: Reading Comprehension, English (Second Language), Second Language Instruction, Second Language Learning
Ferrari-Bridgers, Franca – International Journal of Listening, 2023
While many tools exist to assess student content knowledge, there are few that assess whether students display the critical listening skills necessary to interpret the quality of a speaker's message at the college level. The following research provides preliminary evidence for the internal consistency and factor structure of a tool, the…
Descriptors: Factor Structure, Test Validity, Community College Students, Test Reliability