Publication Date
In 2025 | 2 |
Since 2024 | 6 |
Since 2021 (last 5 years) | 7 |
Since 2016 (last 10 years) | 24 |
Since 2006 (last 20 years) | 48 |
Descriptor
Evaluation Methods | 99 |
Test Reliability | 99 |
Models | 73 |
Test Validity | 65 |
Statistical Analysis | 23 |
Test Construction | 22 |
Measurement Techniques | 20 |
Student Evaluation | 19 |
Foreign Countries | 18 |
Higher Education | 14 |
Academic Achievement | 12 |
Author
Amrein-Beardsley, Audrey | 2 |
Cason, Gerald J. | 2 |
A. Suparmi | 1 |
Ackerman, Debra J. | 1 |
Aimee Howley | 1 |
Algina, James | 1 |
Antoniou, Panayiotis | 1 |
Arreola, Raoul A. | 1 |
Aydin, Selami | 1 |
Bachor, Dan G. | 1 |
Bang Quan Zheng | 1 |
Location
China | 3 |
Japan | 3 |
United Kingdom | 3 |
Brazil | 2 |
California | 2 |
Florida | 2 |
Germany | 2 |
Ohio | 2 |
Pennsylvania | 2 |
Russia | 2 |
Spain | 2 |
Laws, Policies, & Programs
Elementary and Secondary… | 1 |
Assessments and Surveys
Georgia Criterion Referenced… | 1 |
Hidden Figures Test | 1 |
Motivated Strategies for… | 1 |
NEO Personality Inventory | 1 |
National Assessment of… | 1 |
Ke-Hai Yuan; Zhiyong Zhang; Lijuan Wang – Grantee Submission, 2024
Mediation analysis plays an important role in understanding causal processes in social and behavioral sciences. While path analysis with composite scores was criticized to yield biased parameter estimates when variables contain measurement errors, recent literature has pointed out that the population values of parameters of latent-variable models…
Descriptors: Structural Equation Models, Path Analysis, Weighted Scores, Comparative Testing
Rohit Batra; Silvia A. Bunge; Emilio Ferrer – Structural Equation Modeling: A Multidisciplinary Journal, 2022
Studying development processes, as they unfold over time, involves collecting repeated measures from individuals and modeling the changes over time. One methodological challenge in this type of longitudinal data is separating retest effects, due to the repeated assessments, from developmental processes such as maturation or age. In this article,…
Descriptors: Children, Adolescents, Longitudinal Studies, Test Reliability
Tenko Raykov; Bingsheng Zhang – Structural Equation Modeling: A Multidisciplinary Journal, 2024
Multidimensional measuring instruments are often used in behavioral, social, educational, marketing, and biomedical research. For these scales, the paper discusses how to find the optimal score based on their components that is associated with the highest possible reliability. Within the framework of structural equation modeling, an approach to…
Descriptors: Multidimensional Scaling, Measurement Equipment, Measurement Techniques, Test Reliability
Madeline A. Schellman; Matthew J. Madison – Grantee Submission, 2024
Diagnostic classification models (DCMs) have grown in popularity as stakeholders increasingly desire actionable information related to students' skill competencies. Longitudinal DCMs offer a psychometric framework for providing estimates of students' proficiency status transitions over time. For both cross-sectional and longitudinal DCMs, it is…
Descriptors: Diagnostic Tests, Classification, Models, Psychometrics
Bang Quan Zheng; Peter M. Bentler – Structural Equation Modeling: A Multidisciplinary Journal, 2025
This paper aims to advocate for a balanced approach to model fit evaluation in structural equation modeling (SEM). The ongoing debate surrounding chi-square test statistics and fit indices has been characterized by ambiguity and controversy. Despite the acknowledged limitations of relying solely on the chi-square test, its careful application can…
Descriptors: Monte Carlo Methods, Structural Equation Models, Goodness of Fit, Robustness (Statistics)
Sujiyani Kassiavera; A. Suparmi; C. Cari; Sukarmin Sukarmin – Journal of Baltic Science Education, 2024
The challenge of accurately assessing critical thinking in physics education, particularly on topics like work and energy, remains a key issue for educators. The current study aims to address this challenge by exploring students' critical thinking abilities using two-tier test data analyzed through the Rasch model. Data were collected from…
Descriptors: Critical Thinking, Physics, Science Instruction, Foreign Countries
Aimee Howley; Craig B. Howley; Marged Dudek – Journal of Educational Leadership and Policy Studies, 2025
This article explores the development and evaluation of the Building Leadership Team Assessment Tool (BLT-AT), designed to measure Professional Learning Communities' (PLCs') use of effective school improvement practices. The BLT-AT is grounded in Ohio's inclusive instructional leadership model, which emphasizes the improvement of teaching and…
Descriptors: Test Construction, Communities of Practice, Instructional Leadership, Evaluation Methods
Zheng, Boyang; Sun, Guiping; Wang, Hourong – SAGE Open, 2019
Traditional Chinese medicine (TCM) is an important component of China's medical system. How to educate TCM practitioners in China, therefore, has become a crucial issue. To contribute to this issue, the current research identified the competency model of TCM practitioners in China and developed an evaluation for TCM students. We combined Bloom's…
Descriptors: Medical Students, Correlation, Foreign Countries, Test Reliability
Wang, Xiaolin; Svetina, Dubravka; Dai, Shenghai – Journal of Experimental Education, 2019
Recently, interest in test subscore reporting for diagnosis purposes has been growing rapidly. The two simulation studies here examined factors (sample size, number of subscales, correlation between subscales, and three factors affecting subscore reliability: number of items per subscale, item parameter distribution, and data generating model)…
Descriptors: Value Added Models, Scores, Sample Size, Correlation
Martínez, José Felipe; Schweig, Jonathan; Goldschmidt, Pete – Educational Evaluation and Policy Analysis, 2016
A key question facing teacher evaluation systems is how to combine multiple measures of complex constructs into composite indicators of performance. We use data from the Measures of Effective Teaching (MET) study to investigate the measurement properties of composite indicators obtained under various conjunctive, disjunctive (or complementary),…
Descriptors: Teacher Evaluation, Outcome Measures, Evaluation Methods, Educational Policy
Amrein-Beardsley, Audrey; Geiger, Tray – Phi Delta Kappan, 2017
Houston's experience with the Educational Value-Added Assessment System® (EVAAS) raises questions that other districts should consider before buying the software and using it for high-stakes decisions. Researchers found that teachers in Houston, all of whom were under the EVAAS gun, but who taught relatively more racial minority students,…
Descriptors: Value Added Models, School Districts, Computer Software, Educational Technology
Antoniou, Panayiotis; Lu, Mohan – Educational Management Administration & Leadership, 2018
During the last 25 years researchers have proposed a number of conceptual frameworks to measure the various functions of instructional leadership. One of the most frequently used frameworks is the Principal Instructional Management Rating Scale (PIMRS). Despite the great number of studies employing the PIMRS, evidence for its reliability and…
Descriptors: Rating Scales, Instructional Leadership, Evaluation Methods, Educational Administration
Ackerman, Debra J. – ETS Research Report Series, 2020
Over the past 8 years, U.S. kindergarten classrooms have been impacted by policies mandating or recommending the administration of a specific kindergarten entry assessment (KEA) in the initial months of school as well as the increasing reliance on digital technology in the form of mobile apps, touchscreen devices, and online data platforms. Using…
Descriptors: Kindergarten, School Readiness, Computer Assisted Testing, Preschool Teachers
Aydin, Selami; Harputlu, Leyla; Çelik, Seyda Savran; Ustuk, Özgehan; Güzel, Serhat; Genç, Deniz – Online Submission, 2016
Measurement of children's behaviors in an educational and research context is a problematic and complex area. It is also evident that adapting scales to measure children's behaviors in an educational and research context is a complex process due to several reasons. First, cultural elements constitute a considerable problem. Second, it is difficult…
Descriptors: Child Behavior, Models, Test Construction, Test Validity
Berliner, David C. – Education Policy Analysis Archives, 2018
The Scylla and Charybdis in this discussion of teacher evaluation are standardized achievement test data on the one hand, and classroom observational systems on the other. These are the two most common methods used to judge teachers' competency. Both have serious flaws: the former primarily with validity, the latter primarily with reliability. At…
Descriptors: Teacher Evaluation, Evaluation Problems, Standardized Tests, Achievement Tests