Angela G. Cooley – ProQuest LLC, 2024
Educators rely on a conglomeration of assessment instruments to appraise, measure, monitor, and document evidence of students' academic readiness, learning progressions, acquisition of interdisciplinary knowledge, and extensive educational needs. The Mississippi Academic Assessment Program for Mathematics (MAAP-M) and the i-Ready Assessment…
Descriptors: Middle Schools, Grade 6, Grade 7, Grade 8
An, Lily Shiao; Ho, Andrew Dean; Davis, Laurie Laughlin – Educational Measurement: Issues and Practice, 2022
Technical documentation for educational tests focuses primarily on properties of individual scores at single points in time. Reliability, standard errors of measurement, item parameter estimates, fit statistics, and linking constants are standard technical features that external stakeholders use to evaluate items and individual scale scores.…
Descriptors: Documentation, Scores, Evaluation Methods, Longitudinal Studies
Guo, Wenjing; Choi, Youn-Jeng – Educational and Psychological Measurement, 2023
Determining the number of dimensions is extremely important in applying item response theory (IRT) models to data. Traditional and revised parallel analyses have been proposed within the factor analysis framework, and both have shown some promise in assessing dimensionality. However, their performance in the IRT framework has not been…
Descriptors: Item Response Theory, Evaluation Methods, Factor Analysis, Guidelines
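The parallel-analysis procedure this entry refers to retains a dimension whenever its observed eigenvalue exceeds the corresponding mean eigenvalue from random data of the same shape. A minimal sketch of the traditional (Horn) version, using hypothetical simulated data rather than anything from the study:

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Horn's parallel analysis: count components whose observed
    eigenvalues exceed the mean eigenvalues of same-shaped random data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # eigvalsh returns ascending order; reverse to descending
    obs_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rand_eig = np.zeros((n_sims, p))
    for i in range(n_sims):
        rand = rng.standard_normal((n, p))
        rand_eig[i] = np.linalg.eigvalsh(np.corrcoef(rand, rowvar=False))[::-1]
    threshold = rand_eig.mean(axis=0)
    return int(np.sum(obs_eig > threshold))

# Illustration: six items loading on two uncorrelated factors
rng = np.random.default_rng(1)
f = rng.standard_normal((500, 2))
loadings = np.zeros((2, 6))
loadings[0, :3] = 1.0
loadings[1, 3:] = 1.0
items = f @ loadings + 0.5 * rng.standard_normal((500, 6))
print(parallel_analysis(items))  # -> 2 for this two-factor data
```

The abstract's point is that this factor-analytic heuristic may behave differently when the data-generating model is an IRT model; the sketch shows only the classical linear version.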
Novak, Josip; Rebernjak, Blaž – Measurement: Interdisciplinary Research and Perspectives, 2023
A Monte Carlo simulation study was conducted to examine the performance of the α, λ2, λ4, λ2, ωT, GLB(MRFA), and GLB(Algebraic) coefficients. Population reliability, distribution shape, sample size, test length, and number of response categories were varied…
Descriptors: Monte Carlo Methods, Evaluation Methods, Reliability, Simulation
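Two of the coefficients compared in this entry have simple closed forms from the item covariance matrix: Cronbach's α and Guttman's λ2 (which can never fall below α). A minimal sketch with simulated parallel items (function names and data are illustrative, not the authors' code):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha from an (examinees x items) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def guttman_lambda2(scores):
    """Guttman's lambda-2: lambda-1 plus a term built from the
    squared off-diagonal covariances; always >= alpha."""
    k = scores.shape[1]
    c = np.cov(scores, rowvar=False)
    off = c - np.diag(np.diag(c))      # off-diagonal covariances only
    total_var = c.sum()                # variance of the total score
    return (off.sum() + np.sqrt(k / (k - 1) * (off ** 2).sum())) / total_var

# Parallel items: common true score plus independent noise
rng = np.random.default_rng(0)
true = rng.standard_normal((1000, 1))
scores = true + rng.standard_normal((1000, 5))
a, l2 = cronbach_alpha(scores), guttman_lambda2(scores)
print(round(a, 2), round(l2, 2))
```

For parallel items with inter-item correlation 0.5 and k = 5, both coefficients land near the Spearman-Brown value of about 0.83, with λ2 never below α.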
Gioia, Anthony R.; Ahmed, Yusra; Woods, Steven P.; Cirino, Paul T. – Reading and Writing: An Interdisciplinary Journal, 2023
There is significant overlap between reading and writing, but no known standardized measure assesses these jointly. The goal of the present study is to evaluate the properties of a novel measure, the Assessment of Writing, Self-Monitoring, and Reading (AWSM Reader), that simultaneously evaluates both reading comprehension and writing. In doing so,…
Descriptors: Reading Writing Relationship, Writing Evaluation, Self Evaluation (Individuals), Executive Function
Smith, Trevor I.; Bendjilali, Nasrine – Physical Review Physics Education Research, 2022
Several recent studies have employed item response theory (IRT) to rank incorrect responses to commonly used research-based multiple-choice assessments. These studies use Bock's nominal response model (NRM) for applying IRT to categorical (nondichotomous) data, but the response rankings only utilize half of the parameters estimated by the model.…
Descriptors: Item Response Theory, Test Items, Multiple Choice Tests, Science Tests
Eun, Barohny; Knotek, Steven E. – Research in Education, 2022
A Vygotskian approach to assessment is proposed by invoking the distinction between the development of lower and higher psychological functions. Higher psychological functions are specifically human and develop with the use of cultural tools via mediation. Accordingly, a distinction is made between tests that are based on association, which have…
Descriptors: Evaluation Methods, Sociocultural Patterns, Psychological Patterns, Teaching Methods
D'Urso, E. Damiano; Tijmstra, Jesper; Vermunt, Jeroen K.; De Roover, Kim – Educational and Psychological Measurement, 2023
Assessing the measurement model (MM) of self-report scales is crucial to obtain valid measurements of individuals' latent psychological constructs. This entails evaluating the number of measured constructs and determining which construct is measured by which item. Exploratory factor analysis (EFA) is the most-used method to evaluate these…
Descriptors: Factor Analysis, Measurement Techniques, Self Evaluation (Individuals), Psychological Patterns
Dimova, Slobodanka – Language Teaching Research Quarterly, 2022
Drawing on Glenn Fulcher's extensive work in performance-based language assessment of speaking, this paper explores the assessment of L2 speaking ability in local language testing contexts. For that purpose, I review Fulcher's influential work that highlights the relationship between the speaking construct, the task, the performance, and the…
Descriptors: Language Tests, Speech Communication, Performance Based Assessment, Second Language Learning
Williamson, Joanna – Research Matters, 2022
Providing evidence that can inform awarding is an important application of Comparative Judgement (CJ) methods in high-stakes qualifications. The process of marking scripts is not changed, but CJ methods can assist in the maintenance of standards from one series to another by informing decisions about where to place grade boundaries or cut scores.…
Descriptors: Standards, Grading, Decision Making, Comparative Analysis
Gill, Tim – Research Matters, 2022
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…
Descriptors: Comparative Analysis, Decision Making, Scripts, Standards
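Comparative Judgement exercises like those in the two entries above are commonly scaled with a Bradley–Terry model, which recovers a latent quality value for each script from the pairwise judgements. A minimal sketch using Hunter's MM iteration on a made-up win matrix (not data from either study):

```python
import numpy as np

def bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry strengths from a pairwise win-count matrix
    via an MM iteration. wins[i, j] = times script i beat script j."""
    k = wins.shape[0]
    w = wins.sum(axis=1)               # total wins per script
    n = wins + wins.T                  # comparisons per pair
    p = np.ones(k)
    for _ in range(n_iter):
        denom = n / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)
        p = w / denom.sum(axis=1)
        p /= p.sum()                   # fix the arbitrary scale
    return p

# Three scripts: A beats B and C consistently, B edges C
wins = np.array([
    [0, 4, 5],
    [1, 0, 3],
    [0, 2, 0],
])
strengths = bradley_terry(wins)
print(np.argsort(-strengths))          # ranking, strongest first
```

Placing scripts from different examination sessions into one such fitted scale is what lets CJ results inform grade-boundary decisions across series.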
Henrique Mohallem Paiva; Flávia Maria Santoro; Victor Takashi Hayashi; Bianca Cassemiro Lima – IEEE Transactions on Education, 2025
Contribution: This article analyzes student assessment within a computing faculty employing a full project-based learning (PBL) approach. Examining 2078 final grades across 60 classes and periods, the study reveals a significant correlation between graded self-studies, exams, and projects. This result contributes to understanding the reliability…
Descriptors: Student Evaluation, Computer Science Education, College Faculty, Correlation
Malec, Wojciech; Krzeminska-Adamek, Malgorzata – Practical Assessment, Research & Evaluation, 2020
The main objective of the article is to compare several methods of evaluating multiple-choice options through classical item analysis. The methods subjected to examination include the tabulation of choice distribution, the interpretation of trace lines, the point-biserial correlation, the categorical analysis of trace lines, and the investigation…
Descriptors: Comparative Analysis, Evaluation Methods, Multiple Choice Tests, Item Analysis
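One of the classical item-analysis methods this entry compares, the point-biserial correlation, is just Pearson's r between a dichotomous item score and a criterion such as the corrected (rest) total. A minimal sketch on toy response data (the data and function name are illustrative):

```python
import numpy as np

def point_biserial(item, total):
    """Point-biserial correlation between a 0/1 item and a total
    score: (M1 - M0) * sqrt(p*q) / sd(total), using population sds.
    Algebraically identical to Pearson's r for dichotomous data."""
    item = np.asarray(item, dtype=float)
    total = np.asarray(total, dtype=float)
    p = item.mean()
    m1 = total[item == 1].mean()       # mean total among correct
    m0 = total[item == 0].mean()       # mean total among incorrect
    return (m1 - m0) * np.sqrt(p * (1 - p)) / total.std()

# Six examinees x four items; discrimination vs. the rest-score
responses = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
])
for j in range(responses.shape[1]):
    rest = responses.sum(axis=1) - responses[:, j]  # corrected total
    print(j, round(point_biserial(responses[:, j], rest), 2))
```

Subtracting the item from the total before correlating avoids the part-whole inflation that a raw-total point-biserial suffers, especially on short tests like this toy example.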
Wang, Xiaolin; Svetina, Dubravka; Dai, Shenghai – Journal of Experimental Education, 2019
Recently, interest in test subscore reporting for diagnosis purposes has been growing rapidly. The two simulation studies here examined factors (sample size, number of subscales, correlation between subscales, and three factors affecting subscore reliability: number of items per subscale, item parameter distribution, and data generating model)…
Descriptors: Value Added Models, Scores, Sample Size, Correlation
Cai, Yuyang; Chen, Huilin – Language Assessment Quarterly, 2022
Thinking skills play a critical role in determining language performance. Recent advancement in cognitive diagnostic modelling (CDM) provides a powerful tool for obtaining fine-grained information regarding these thinking skills during reading. Studies are scant, however, exploring the relations between thinking skills and language performance,…
Descriptors: Evaluation Methods, Language Proficiency, Second Language Learning, Reading Processes