Publication Date
| In 2026 | 0 |
| Since 2025 | 12 |
| Since 2022 (last 5 years) | 114 |
| Since 2017 (last 10 years) | 375 |
| Since 2007 (last 20 years) | 1130 |
Descriptor
| Comparative Analysis | 1943 |
| Reliability | 880 |
| Test Reliability | 792 |
| Foreign Countries | 554 |
| Test Validity | 443 |
| Correlation | 350 |
| Validity | 332 |
| Interrater Reliability | 327 |
| Statistical Analysis | 321 |
| Scores | 280 |
| Measures (Individuals) | 236 |
| More ▼ | |
Source
Author
| Reckase, Mark D. | 6 |
| Attali, Yigal | 5 |
| Coniam, David | 5 |
| Brennan, Robert L. | 4 |
| Crehan, Kevin D. | 4 |
| Feldt, Leonard S. | 4 |
| Hakstian, A. Ralph | 4 |
| Jones, Ian | 4 |
| Kolen, Michael J. | 4 |
| Lunz, Mary E. | 4 |
| August, Diane | 3 |
| More ▼ | |
Publication Type
Education Level
Audience
| Researchers | 35 |
| Practitioners | 29 |
| Teachers | 15 |
| Administrators | 9 |
| Policymakers | 6 |
| Counselors | 2 |
| Media Staff | 2 |
| Parents | 1 |
| Support Staff | 1 |
Location
| Turkey | 59 |
| United States | 47 |
| Australia | 36 |
| China | 33 |
| Canada | 32 |
| United Kingdom (England) | 32 |
| United Kingdom | 28 |
| Germany | 25 |
| Netherlands | 24 |
| Taiwan | 22 |
| Hong Kong | 20 |
| More ▼ | |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
| Meets WWC Standards with or without Reservations | 1 |
| Does not meet standards | 1 |
Kate E. Walton; Cristina Anguiano-Carrasco – ACT, Inc., 2024
Large language models (LLMs), such as ChatGPT, are becoming increasingly prominent. Their use is becoming more and more popular to assist with simple tasks, such as summarizing documents, translating languages, rephrasing sentences, or answering questions. Reports like McKinsey's (Chui, & Yee, 2023) estimate that by implementing LLMs,…
Descriptors: Artificial Intelligence, Man Machine Systems, Natural Language Processing, Test Construction
Karel Kok; Sophia Chroszczinsky; Burkhard Priemer – Physical Review Physics Education Research, 2024
Data comparison problems are used in teaching and science education research that focuses on students' ability to compare datasets and their conceptual understanding of measurement uncertainties. However, the evaluation of students' decisions in these problems can pose a problem: e.g., students making a correct decision for the wrong reasons.…
Descriptors: Secondary School Students, Undergraduate Students, Comparative Analysis, Evaluation Methods
Maïano, Christophe; Morin, Alexandre J. S.; Tietjens, Maike; Bastos, Tânia; Luiggi, Maxime; Corredeira, Rui; Griffet, Jean; Sánchez-Oliva, David – Measurement in Physical Education and Exercise Science, 2023
The present study sought to examine the psychometric properties of new German, Portuguese, and Spanish versions of the Revised Short Form of the Physical Self-Inventory (PSI-S-"R"), and to contrast these properties against those from the original French version of this instrument. Participants (n = 1802) were 288 French youth, 177 German…
Descriptors: German, Portuguese, Spanish, Test Construction
Jönsson, Anders; Balan, Andreia – Practical Assessment, Research & Evaluation, 2018
Research on teachers' grading has shown that there is great variability among teachers regarding both the process and product of grading, resulting in low comparability and issues of inequality when using grades for selection purposes. Despite this situation, not much is known about the merits or disadvantages of different models for grading. In…
Descriptors: Grading, Models, Reliability, Validity
Igor Esnaola; Albert Sesé; Lorea Azpiazu; Yina Wang – British Journal of Educational Psychology, 2024
Background: Modelling academic self-concept through second-order factors or bifactor structures is an important issue with substantive and practical implications; besides, the bifactor model has not been analysed with a Chinese sample and cross-cultural studies in the academic self-concept are scarce. Likewise, latent structure validity evidence…
Descriptors: Academic Achievement, Self Concept, Psychometrics, Validity
Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024
Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…
Descriptors: Semantics, Educational Assessment, Evaluators, Reliability
Andrew R. Thompson – Advances in Physiology Education, 2024
The revised two-factor Study Process Questionnaire and the Approaches and Study Skills Inventory for Students are two instruments commonly used to measure student learning approach. Although they are designed to measure similar constructs, it is unclear whether the metrics they provide differ in terms of their real-world classification of learning…
Descriptors: Comparative Analysis, Anatomy, Classification, Cognitive Style
Jeff Allen; Ty Cruce – ACT Education Corp., 2025
This report summarizes some of the evidence supporting interpretations of scores from the enhanced ACT, focusing on reliability, concurrent validity, predictive validity, and score comparability. The authors argue that the evidence presented in this report supports the interpretation of scores from the enhanced ACT as measures of high school…
Descriptors: College Entrance Examinations, Testing, Change, Scores
Fu, Yuanshu; Wen, Zhonglin; Wang, Yang – Educational and Psychological Measurement, 2022
Composite reliability, or coefficient omega, can be estimated using structural equation modeling. Composite reliability is usually estimated under the basic independent clusters model of confirmatory factor analysis (ICM-CFA). However, due to the existence of cross-loadings, the model fit of the exploratory structural equation model (ESEM) is…
Descriptors: Comparative Analysis, Structural Equation Models, Factor Analysis, Reliability
Reza Shahi; Hamdollah Ravand; Golam Reza Rohani – International Journal of Language Testing, 2025
The current paper intends to exploit the Many Facet Rasch Model to investigate and compare the impact of situations (items) and raters on test takers' performance on the Written Discourse Completion Test (WDCT) and Discourse Self-Assessment Tests (DSAT). In this study, the participants were 110 English as a Foreign Language (EFL) students at…
Descriptors: Comparative Analysis, English (Second Language), Second Language Learning, Second Language Instruction
Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025
As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…
Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy
Brian Weiler; Ling-Yu Guo – Language, Speech, and Hearing Services in Schools, 2024
Purpose: The finite verb morphology composite (FVMC) is a valid measure for charting children's tense development and for differentiating children with and without language impairment during preschool and early elementary years. However, it is unclear whether FVMC scores vary as a function of language sample elicitation contexts. The current study…
Descriptors: Verbs, Preschool Children, Morphology (Languages), Accuracy
Grajzel, Katalin; Dumas, Denis; Acar, Selcuk – Journal of Creative Behavior, 2022
One of the best-known and most frequently used measures of creative idea generation is the Torrance Test of Creative Thinking (TTCT). The TTCT Verbal, assessing verbal ideation, contains two forms created to be used interchangeably by researchers and practitioners. However, the parallel forms reliability of the two versions of the TTCT Verbal has…
Descriptors: Test Reliability, Creative Thinking, Creativity Tests, Verbal Ability
Lind, Veronika; Svensson, Melanie; Harringe, Marita L. – Measurement in Physical Education and Exercise Science, 2022
Goniometry is commonly used to evaluate joint range of motion (ROM). The most widespread method, a manual universal goniometer (UG), is considered time-consuming and difficult to handle. The digital goniometer EasyAngle (EA) was developed to improve and simplify the evaluation of ROM. This study aimed to evaluate the reliability and validity of EA…
Descriptors: Motor Reactions, Measurement Techniques, Comparative Analysis, Measurement Equipment
Crompvoets, Elise A. V.; Béguin, Anton A.; Sijtsma, Klaas – Journal of Educational and Behavioral Statistics, 2020
Pairwise comparison is becoming increasingly popular as a holistic measurement method in education. Unfortunately, many comparisons are required for reliable measurement. To reduce the number of required comparisons, we developed an adaptive selection algorithm (ASA) that selects the most informative comparisons while taking the uncertainty of the…
Descriptors: Comparative Analysis, Statistical Analysis, Mathematics, Measurement

Peer reviewed
Direct link
