Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 8 |
Since 2016 (last 10 years) | 104 |
Since 2006 (last 20 years) | 278 |
Descriptor
Evaluation Methods | 481 |
Statistical Analysis | 481 |
Foreign Countries | 118 |
Student Evaluation | 101 |
Test Validity | 96 |
Test Reliability | 87 |
Comparative Analysis | 77 |
Test Construction | 75 |
Scores | 73 |
Academic Achievement | 62 |
Models | 61 |
Author
Gill, Brian | 3 |
Hambleton, Ronald K. | 3 |
Robitzsch, Alexander | 3 |
Bobbett, Gordon | 2 |
Booker, Kevin | 2 |
Braun, Henry | 2 |
Briggs, Derek C. | 2 |
Bruch, Julie | 2 |
Burson, William W. | 2 |
Castilla-Earls, Anny | 2 |
DeMars, Christine E. | 2 |
Location
United Kingdom | 10 |
California | 9 |
Turkey | 7 |
Florida | 6 |
Iran | 6 |
Netherlands | 6 |
Pennsylvania | 6 |
Australia | 5 |
Germany | 5 |
Indiana | 5 |
Ohio | 5 |
Laws, Policies, & Programs
Elementary and Secondary… | 10 |
No Child Left Behind Act 2001 | 3 |
Individuals with Disabilities… | 2 |
Elementary and Secondary… | 1 |
Elementary and Secondary… | 1 |
Individuals with Disabilities… | 1 |
Race to the Top | 1 |
Practices in Instrument Use and Development in "Chemistry Education Research and Practice" 2010-2021
Lazenby, Katherine; Tenney, Kristin; Marcroft, Tina A.; Komperda, Regis – Chemistry Education Research and Practice, 2023
Assessment instruments that generate quantitative data on attributes (cognitive, affective, behavioral, "etc.") of participants are commonly used in the chemistry education community to draw conclusions in research studies or inform practice. Recently, articles and editorials have stressed the importance of providing evidence for the…
Descriptors: Chemistry, Periodicals, Journal Articles, Science Education
Heine, Jörg-Henrik; Robitzsch, Alexander – Large-scale Assessments in Education, 2022
Research Question: This paper examines the overarching question of to what extent different analytic choices may influence the inference about country-specific cross-sectional and trend estimates in international large-scale assessments. We take data from the assessment of PISA mathematics proficiency from the four rounds from 2003 to 2012 as a…
Descriptors: Foreign Countries, International Assessment, Achievement Tests, Secondary School Students
Ulrich, Monika – ProQuest LLC, 2023
The National Council of Teachers of Mathematics (NCTM) has made an effort to increase the use of technology and calculators in the classroom and curriculum. As a result, many studies and articles have been written on the subject of calculator use in the classroom. A review of over 600 studies revealed that it is not curricula that…
Descriptors: Calculators, Educational Technology, Technology Uses in Education, Mathematics Education
Garcia-Garzon, Eduardo; Abad, Francisco J.; Garrido, Luis E. – Journal of Intelligence, 2019
There has been increased interest in assessing the quality and usefulness of short versions of the Raven's Progressive Matrices. A recent proposal, composed of the last twelve matrices of the Standard Progressive Matrices (SPM-LS), has been depicted as a valid measure of "g." Nonetheless, the results provided in the initial validation…
Descriptors: Intelligence Tests, Test Validity, Evaluation Methods, Undergraduate Students
Ford, Jeremy W.; Conoyer, Sarah J.; Lembke, Erica S.; Smith, R. Alex; Hosp, John L. – Assessment for Effective Intervention, 2018
In the present study, two types of curriculum-based measurement (CBM) tools in science, Vocabulary Matching (VM) and Statement Verification for Science (SV-S), a modified Sentence Verification Technique, were compared. Specifically, this study aimed to determine whether the format of information presented (i.e., SV-S vs. VM) produces differences…
Descriptors: Curriculum Based Assessment, Evaluation Methods, Measurement Techniques, Comparative Analysis
Bonifay, Wes; Depaoli, Sarah – Prevention Science, 2023
Statistical analysis of categorical data often relies on multiway contingency tables; yet, as the number of categories and/or variables increases, the number of table cells with few (or zero) observations also increases. Unfortunately, sparse contingency tables invalidate the use of standard goodness-of-fit statistics. Limited-information fit…
Descriptors: Bayesian Statistics, Programming Languages, Psychopathology, Classification
Ganzfried, Sam; Yusuf, Farzana – Education Sciences, 2018
A problem faced by many instructors is that of designing exams that accurately assess the abilities of the students. Typically, these exams are prepared several days in advance, and generic question scores are used based on rough approximation of the question difficulty and length. For example, for a recent class taught by the author, there were…
Descriptors: Weighted Scores, Test Construction, Student Evaluation, Multiple Choice Tests
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Huggins-Manley, Anne Corinne – Educational and Psychological Measurement, 2017
This study defines subpopulation item parameter drift (SIPD) as a change in item parameters over time that is dependent on subpopulations of examinees, and hypothesizes that the presence of SIPD in anchor items is associated with bias and/or lack of invariance in three psychometric outcomes. Results show that SIPD in anchor items is associated…
Descriptors: Psychometrics, Test Items, Item Response Theory, Hypothesis Testing
Ozsoy, Seyma Nur; Kilmen, Sevilay – International Journal of Assessment Tools in Education, 2023
In this study, Kernel test equating methods were compared under NEAT and NEC designs. In NEAT design, Kernel post-stratification and chain equating methods taking into account optimal and large bandwidths were compared. In the NEC design, gender and/or computer/tablet use was considered as a covariate, and Kernel test equating methods were…
Descriptors: Equated Scores, Testing, Test Items, Statistical Analysis
Metsämuuronen, Jari – International Journal of Educational Methodology, 2020
A new index of item discrimination power (IDP), dimension-corrected Somers' D (D2) is proposed. Somers' D is one of the superior alternatives for item-total- (Rit) and item-rest correlation (Rir) in reflecting the real IDP with items with scales 0/1 and 0/1/2, that is, up to three categories. D also reaches the extreme value +1 and -1 correctly…
Descriptors: Item Analysis, Correlation, Test Items, Simulation
Varela, Otmar; Mead, Esther – Journal of Education for Business, 2018
Popular teamwork assessments have been strongly criticized on the grounds of poor psychometric properties and their disconnect with conceptual models of teamwork. These issues raise concerns with respect to our ability to evaluate efforts devoted to advancing teamwork in academia. We report the development of a teamwork assessment that builds on…
Descriptors: Teamwork, Evaluation Methods, Test Validity, Psychometrics
Todd, Amber; Romine, William L.; Cook Whitt, Katahdin – Science Education, 2017
We describe the development, validation, and use of the "Learning Progression-Based Assessment of Modern Genetics" (LPA-MG) in a high school biology context. Items were constructed based on a current learning progression framework for genetics (Shea & Duncan, 2013; Todd & Kenyon, 2015). The 34-item instrument, which was tied to…
Descriptors: Genetics, Science Instruction, High School Students, Evaluation Methods
Dirlikov, Benjamin; Younes, Laurent; Nebel, Mary Beth; Martinelli, Mary Katherine; Tiedemann, Alyssa Nicole; Koch, Carolyn A.; Fiorilli, Diana; Bastian, Amy J.; Denckla, Martha Bridge; Miller, Michael I.; Mostofsky, Stewart H. – Journal of Occupational Therapy, Schools & Early Intervention, 2017
This study presents construct validity for a novel automated morphometric and kinematic handwriting assessment, including (1) convergent validity, establishing reliability of automated measures with traditional manual-derived Minnesota Handwriting Assessment (MHA), and (2) discriminant validity, establishing that the automated methods distinguish…
Descriptors: Handwriting, Evaluation Methods, Children, Preadolescents
Borda, Emily; Haskell, Todd; Todd, Andrew – Journal of College Science Teaching, 2022
We propose cross-disciplinary learning as a construct that can guide instruction and assessment in programs that feature sequential learning across multiple science disciplines. Cross-disciplinary learning combines insights from interdisciplinary learning, transfer, and resources frameworks and highlights the processes of resource activation,…
Descriptors: Interdisciplinary Approach, Multiple Choice Tests, Protocol Analysis, Evaluation Methods