Publication Date
In 2025 | 2 |
Since 2024 | 3 |
Since 2021 (last 5 years) | 9 |
Since 2016 (last 10 years) | 17 |
Since 2006 (last 20 years) | 39 |
Descriptor
Evaluation Methods | 50 |
Inferences | 50 |
Validity | 27 |
Test Validity | 22 |
Scores | 12 |
Student Evaluation | 11 |
Models | 10 |
Test Construction | 10 |
Test Items | 8 |
Measurement Techniques | 7 |
Construct Validity | 6 |
Author
Blunk, Merrie | 2 |
Ercikan, Kadriye | 2 |
Goldschmidt, Pete | 2 |
Haertel, Geneva | 2 |
Hill, Heather C. | 2 |
Kane, Michael T. | 2 |
Abedi, Jamal | 1 |
Ahn, Soyeon | 1 |
Alexiou, Jon J. | 1 |
Alghazali, Tawfeeq | 1 |
Almond, Patricia | 1 |
Education Level
Elementary Secondary Education | 8 |
Elementary Education | 5 |
Higher Education | 4 |
Grade 4 | 2 |
Grade 5 | 2 |
Secondary Education | 2 |
Adult Education | 1 |
Grade 6 | 1 |
Grade 7 | 1 |
High Schools | 1 |
Intermediate Grades | 1 |
Audience
Practitioners | 1 |
Location
United States | 3 |
Ohio | 2 |
United Kingdom (England) | 2 |
Australia | 1 |
California | 1 |
China | 1 |
Colombia | 1 |
Cyprus | 1 |
Ireland | 1 |
Israel | 1 |
South Korea | 1 |
Assessments and Surveys
International English… | 1 |
Progress in International… | 1 |
Test of English as a Foreign… | 1 |
Kylie L. Anglin – Annenberg Institute for School Reform at Brown University, 2025
Since 2018, institutions of higher education have been aware of the "enrollment cliff," which refers to expected declines in future enrollment. This paper attempts to describe how prepared institutions in Ohio are for this future by looking at trends leading up to the anticipated decline. Using IPEDS data from 2012-2022, we analyze trends…
Descriptors: Validity, Artificial Intelligence, Models, Best Practices
Nan Xie; Zhengxu Li; Haipeng Lu; Wei Pang; Jiayin Song; Beier Lu – IEEE Transactions on Learning Technologies, 2025
Classroom engagement is a critical factor for evaluating students' learning outcomes and teachers' instructional strategies. Traditional methods for detecting classroom engagement, such as coding and questionnaires, are often limited by delays, subjectivity, and external interference. While some neural network models have been proposed to detect…
Descriptors: Learner Engagement, Artificial Intelligence, Technology Uses in Education, Educational Technology
Kylie Anglin – AERA Open, 2024
Given the rapid adoption of machine learning methods by education researchers, and the growing acknowledgment of their inherent risks, there is an urgent need for tailored methodological guidance on how to improve and evaluate the validity of inferences drawn from these methods. Drawing on an integrative literature review and extending a…
Descriptors: Validity, Artificial Intelligence, Models, Best Practices
Manapat, Patrick D.; Edwards, Michael C. – Educational and Psychological Measurement, 2022
When fitting unidimensional item response theory (IRT) models, the population distribution of the latent trait (θ) is often assumed to be normally distributed. However, some psychological theories would suggest a nonnormal θ. For example, some clinical traits (e.g., alcoholism, depression) are believed to follow a positively skewed…
Descriptors: Robustness (Statistics), Computational Linguistics, Item Response Theory, Psychological Patterns
Roduta Roberts, Mary; Gotch, Chad M.; Cook, Megan; Werther, Karin; Chao, Iris C. I. – Measurement: Interdisciplinary Research and Perspectives, 2022
Performance-based assessment is a common approach to assess the development and acquisition of practice competencies among health professions students. Judgments related to the quality of performance are typically operationalized as ratings against success criteria specified within a rubric. The extent to which the rubric is understood,…
Descriptors: Protocol Analysis, Scoring Rubrics, Interviews, Performance Based Assessment
An, Lily Shiao; Ho, Andrew Dean; Davis, Laurie Laughlin – Educational Measurement: Issues and Practice, 2022
Technical documentation for educational tests focuses primarily on properties of individual scores at single points in time. Reliability, standard errors of measurement, item parameter estimates, fit statistics, and linking constants are standard technical features that external stakeholders use to evaluate items and individual scale scores.…
Descriptors: Documentation, Scores, Evaluation Methods, Longitudinal Studies
Weidlich, Joshua; Gašević, Dragan; Drachsler, Hendrik – Journal of Learning Analytics, 2022
As a research field geared toward understanding and improving learning, Learning Analytics (LA) must be able to provide empirical support for causal claims. However, as a highly applied field, tightly controlled randomized experiments are not always feasible nor desirable. Instead, researchers often rely on observational data, based on which they…
Descriptors: Causal Models, Inferences, Learning Analytics, Comparative Analysis
Amrein-Beardsley, Audrey; Sloat, Edward; Holloway, Jessica – AASA Journal of Scholarship & Practice, 2020
In this study, researchers compared the concordance of teacher-level effectiveness ratings derived via six common generalized value-added model (VAM) approaches including a (1) student growth percentile (SGP) model, (2) value-added linear regression model (VALRM), (3) value-added hierarchical linear model (VAHLM), (4) simple difference (gain)…
Descriptors: Value Added Models, Teacher Effectiveness, Elementary School Teachers, Teacher Evaluation
Mohammed, Aisha; Dawood, Abdul Kareem Shareef; Alghazali, Tawfeeq; Kadhim, Qasim Khlaif; Sabti, Ahmed Abdulateef; Sabit, Shaker Holh – International Journal of Language Testing, 2023
Cognitive diagnostic models (CDMs) have received much interest within the field of language testing over the last decade due to their great potential to provide diagnostic feedback to all stakeholders and ultimately improve language teaching and learning. A large number of studies have demonstrated the application of CDMs on advanced large-scale…
Descriptors: Reading Comprehension, Reading Tests, Language Tests, English (Second Language)
Khamboonruang, Apichat – rEFLections, 2022
Although much research has compared the functioning between analytic and holistic rating scales, little research has compared the functioning of binary rating scales with other types of rating scales. This quantitative study set out to preliminarily and comparatively validate binary and analytic rating scales intended for use in formative…
Descriptors: Writing Evaluation, Evaluation Methods, Second Language Learning, Second Language Instruction
Tavares, Walter; Brydges, Ryan; Myre, Paul; Prpic, Jason; Turner, Linda; Yelle, Richard; Huiskamp, Maud – Advances in Health Sciences Education, 2018
Assessment of clinical competence is complex and inference based. Trustworthy and defensible assessment processes must have favourable evidence of validity, particularly where decisions are considered high stakes. We aimed to organize, collect and interpret validity evidence for a high stakes simulation based assessment strategy for certifying…
Descriptors: Competence, Simulation, Allied Health Personnel, Certification
Xi, Xiaoming – Language Testing, 2017
In recent years, continuing advances in technology have increased the capacity to automate the extraction of a range of linguistic features of texts and thus have provided the impetus for the substantial growth of corpus linguistics. While corpus linguistic tools and methods have been used extensively in second language learning research, they…
Descriptors: Computational Linguistics, Second Language Learning, Language Tests, Evaluation Methods
Wing, Coady; Bello-Gomez, Ricardo A. – American Journal of Evaluation, 2018
Treatment effect estimates from a "regression discontinuity design" (RDD) have high internal validity. However, the arguments that support the design apply to a subpopulation that is narrower and usually different from the population of substantive interest in evaluation research. The disconnect between RDD population and the…
Descriptors: Regression (Statistics), Research Design, Validity, Evaluation Methods
Ercikan, Kadriye; Oliveri, María Elena – Applied Measurement in Education, 2016
Assessing complex constructs such as those discussed under the umbrella of 21st century constructs highlights the need for a principled assessment design and validation approach. In our discussion, we made a case for three considerations: (a) taking construct complexity into account across various stages of assessment development such as the…
Descriptors: Evaluation Methods, Test Construction, Design, Scaling
Beauchamp, David; Constantinou, Filio – Research Matters, 2020
Assessment is a useful process as it provides various stakeholders (e.g., teachers, parents, government, employers) with information about students' competence in a particular subject area. However, for the information generated by assessment to be useful, it needs to support valid inferences. One factor that can undermine the validity of…
Descriptors: Computational Linguistics, Inferences, Validity, Language Usage