Publication Date
In 2025 | 0
Since 2024 | 0
Since 2021 (last 5 years) | 5
Since 2016 (last 10 years) | 14
Since 2006 (last 20 years) | 24
Descriptor
Achievement Tests | 44
Evaluation Methods | 44
Test Items | 44
Test Construction | 20
Foreign Countries | 17
Student Evaluation | 13
Elementary Secondary Education | 12
Comparative Analysis | 11
Test Validity | 10
Item Analysis | 9
Test Bias | 9
Author
Abedi, Jamal | 2
Robitzsch, Alexander | 2
Adom, Dickson | 1
Ainley, John | 1
Alexander, Patricia A. | 1
Algozzine, Bob | 1
Anthony Petrosino | 1
Bachman, Patina L. | 1
Baker, Eva L. | 1
Barry, Carol | 1
Bauer, Scott C. | 1
Education Level
Secondary Education | 7
Elementary Secondary Education | 5
Grade 8 | 4
Grade 4 | 2
Grade 9 | 2
Adult Education | 1
Elementary Education | 1
Grade 10 | 1
Grade 5 | 1
Grade 6 | 1
Grade 7 | 1
Audience
Practitioners | 4
Teachers | 2
Researchers | 1
Assessments and Surveys
Program for International… | 6
Trends in International… | 4
National Assessment of… | 3
Florida Comprehensive… | 1
Progress in International… | 1
SAT (College Admission Test) | 1
State of Texas Assessments of… | 1
Gill, Tim – Research Matters, 2022
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and rank them by the quality they believe each displays. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…
Descriptors: Comparative Analysis, Decision Making, Scripts, Standards
Toker, Turker – International Journal of Curriculum and Instruction, 2023
Achievement tests are among the most widely used data collection tools to measure the knowledge and skill levels of individuals. For this reason, the existence of valid and reliable achievement tests that can perfectly reveal the competencies that a person should have in any discipline is of great importance. The purpose of this research is to…
Descriptors: Basic Skills, Evaluation Methods, Test Items, Test Validity
Adom, Dickson; Mensah, Jephtar Adu; Dake, Dennis Atsu – International Journal of Evaluation and Research in Education, 2020
Test, measurement, and evaluation are concepts used in education to explain how the progress of learning and the final learning outcomes of students are assessed. However, the terms are often misused in the field of education, especially in Ghana. The objective of the study was to thoroughly explain the concepts to assist educationists and…
Descriptors: Foreign Countries, Educational Research, Evaluation Methods, Measurement Techniques
Stroup, Walter M.; Petrosino, Anthony; Brady, Corey; Duseau, Karen – North American Chapter of the International Group for the Psychology of Mathematics Education, 2023
Tests of statistical significance often play a decisive role in establishing the empirical warrant of evidence-based research in education. The results from pattern-based assessment items, as introduced in this paper, are categorical and multimodal and do not immediately support the use of measures of central tendency as typically related to…
Descriptors: Statistical Significance, Comparative Analysis, Research Methodology, Evaluation Methods
Heine, Jörg-Henrik; Robitzsch, Alexander – Large-scale Assessments in Education, 2022
Research Question: This paper examines the overarching question of to what extent different analytic choices may influence the inference about country-specific cross-sectional and trend estimates in international large-scale assessments. We take data from the assessment of PISA mathematics proficiency from the four rounds from 2003 to 2012 as a…
Descriptors: Foreign Countries, International Assessment, Achievement Tests, Secondary School Students
A Sequential Bayesian Changepoint Detection Procedure for Aberrant Behaviors in Computerized Testing
Lu, Jing; Wang, Chun; Zhang, Jiwei; Wang, Xue – Grantee Submission, 2023
Changepoints are abrupt variations in a sequence of data in statistical inference. In educational and psychological assessments, it is pivotal to properly differentiate examinees' aberrant behaviors from solution behavior to ensure test reliability and validity. In this paper, we propose a sequential Bayesian changepoint detection algorithm to…
Descriptors: Bayesian Statistics, Behavior Patterns, Computer Assisted Testing, Accuracy
Kaplan, David; Su, Dan – Large-scale Assessments in Education, 2018
Background: This paper extends a recent study by Kaplan and Su ("J Educ Behav Stat" 41: 51-80, 2016) examining the problem of matrix sampling of context questionnaire scales with respect to the generation of plausible values of cognitive outcomes in large-scale assessments. Methods: Following Weirich et al. ("Nested multiple…
Descriptors: Questionnaires, Measurement, Measurement Techniques, Evaluation Methods
Beauchamp, David; Constantinou, Filio – Research Matters, 2020
Assessment is a useful process as it provides various stakeholders (e.g., teachers, parents, government, employers) with information about students' competence in a particular subject area. However, for the information generated by assessment to be useful, it needs to support valid inferences. One factor that can undermine the validity of…
Descriptors: Computational Linguistics, Inferences, Validity, Language Usage
Huggins-Manley, Anne Corinne – Educational and Psychological Measurement, 2017
This study defines subpopulation item parameter drift (SIPD) as a change in item parameters over time that is dependent on subpopulations of examinees, and hypothesizes that the presence of SIPD in anchor items is associated with bias and/or lack of invariance in three psychometric outcomes. Results show that SIPD in anchor items is associated…
Descriptors: Psychometrics, Test Items, Item Response Theory, Hypothesis Testing
George, Ann Cathrice; Robitzsch, Alexander – Applied Measurement in Education, 2018
This article presents a new perspective on measuring gender differences in the large-scale assessment study Trends in International Mathematics and Science Study (TIMSS). The suggested empirical model is directly based on the theoretical competence model of the domain mathematics and thus includes the interaction between content and cognitive sub-competencies.…
Descriptors: Achievement Tests, Elementary Secondary Education, Mathematics Achievement, Mathematics Tests
Rutkowski, Leslie; Rutkowski, David; Zhou, Yan – International Journal of Testing, 2016
Using an empirically-based simulation study, we show that typically used methods of choosing an item calibration sample have significant impacts on achievement bias and system rankings. We examine whether recent PISA accommodations, especially for lower performing participants, can mitigate some of this bias. Our findings indicate that standard…
Descriptors: Simulation, International Programs, Adolescents, Student Evaluation
Liu, Yan; Zumbo, Bruno D.; Gustafson, Paul; Huang, Yi; Kroc, Edward; Wu, Amery D. – Practical Assessment, Research & Evaluation, 2016
A variety of differential item functioning (DIF) methods have been proposed and used to ensure that a test is fair to all test takers in a target population, for example when a test is translated into other languages. However, once a method flags an item as DIF, it is difficult to conclude that the grouping variable (e.g.,…
Descriptors: Test Items, Test Bias, Probability, Scores
Alexander, Patricia A.; Dumas, Denis; Grossnickle, Emily M.; List, Alexandra; Firetto, Carla M. – Journal of Experimental Education, 2016
Relational reasoning is the foundational cognitive ability to discern meaningful patterns within an informational stream, but its reliable and valid measurement remains problematic. In this investigation, the measurement of relational reasoning unfolded in three stages. Stage 1 entailed the establishment of a research-based conceptualization of…
Descriptors: Cognitive Ability, Logical Thinking, Thinking Skills, Cognitive Processes
Sachse, Karoline A.; Roppelt, Alexander; Haag, Nicole – Journal of Educational Measurement, 2016
Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study using national item parameters, which required trends to be computed separately for each country, to compare…
Descriptors: Comparative Analysis, Measurement, Test Bias, Simulation
Cresswell, John; Schwantner, Ursula; Waters, Charlotte – OECD Publishing, 2015
This report reviews the major international and regional large-scale educational assessments, including international surveys, school-based surveys and household-based surveys. The report compares and contrasts the cognitive and contextual data collection instruments and implementation methods used by the different assessments in order to identify…
Descriptors: International Assessment, Educational Assessment, Data Collection, Comparative Analysis