NotesFAQContact Us
Collection
Advanced
Search Tips
What Works Clearinghouse Rating
Showing 1 to 15 of 98 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
James Soland – Journal of Research on Educational Effectiveness, 2024
When randomized control trials are not possible, quasi-experimental methods often represent the gold standard. One quasi-experimental method is difference-in-difference (DiD), which compares changes in outcomes before and after treatment across groups to estimate a causal effect. DiD researchers often use fairly exhaustive robustness checks to…
Descriptors: Item Response Theory, Testing, Test Validity, Intervention
Peer reviewed Peer reviewed
Direct linkDirect link
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Peer reviewed Peer reviewed
Direct linkDirect link
Uminski, Crystal; Hubbard, Joanna K.; Couch, Brian A. – CBE - Life Sciences Education, 2023
Biology instructors use concept assessments in their courses to gauge student understanding of important disciplinary ideas. Instructors can choose to administer concept assessments based on participation (i.e., lower stakes) or the correctness of responses (i.e., higher stakes), and students can complete the assessment in an in-class or…
Descriptors: Biology, Science Tests, High Stakes Tests, Scores
Mattern, Krista; Radunzel, Justine – ACT, Inc., 2019
When applicants take the ACT® more than once, how do colleges and universities reconcile and make sense of the multiple scores? In terms of validity, fairness, and impact on subgroup differences, are certain score-use polices better than others? The focus of this issue brief is to summarize evidence on the validity and fairness of various…
Descriptors: Scoring, College Entrance Examinations, Test Validity, Evaluation Methods
Fitzgerald, Jill; Shanahan, Timothy E. – International Literacy Association, 2020
Reading scores exist for a continuum of purposes, from informal assessment to formal standardized tests. This brief aims to answer the question: What matters most for elementary-grade teachers when thinking about reading scores, and what could policymakers do to help teachers? Three positions worth pursuing in this regard are shared: (1) every…
Descriptors: Reading Achievement, Scores, Elementary School Students, Elementary School Teachers
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Lynch, Sarah – Practical Assessment, Research & Evaluation, 2022
In today's digital age, tests are increasingly being delivered on computers. Many of these computer-based tests (CBTs) have been adapted from paper-based tests (PBTs). However, this change in mode of test administration has the potential to introduce construct-irrelevant variance, affecting the validity of score interpretations. Because of this,…
Descriptors: Computer Assisted Testing, Tests, Scores, Scoring
Peer reviewed Peer reviewed
Direct linkDirect link
Isbell, Daniel R.; Kremmel, Benjamin – Language Testing, 2020
Administration of high-stakes language proficiency tests has been disrupted in many parts of the world as a result of the 2019 novel coronavirus pandemic. Institutions that rely on test scores have been forced to adapt, and in many cases this means using scores from a different test, or a new online version of an existing test, that can be taken…
Descriptors: Language Tests, High Stakes Tests, Language Proficiency, Second Language Learning
Peer reviewed Peer reviewed
Direct linkDirect link
Haertel, Edward H. – Educational Psychologist, 2018
In the service of educational accountability, student achievement tests are being used to measure constructs quite unlike those envisioned by test developers. Scores are compared to cut points to create classifications like "proficient"; scores are combined over time to measure growth; student scores are aggregated to measure the…
Descriptors: Achievement Tests, Scores, Test Validity, Test Interpretation
Peer reviewed Peer reviewed
Direct linkDirect link
Raley, Sheida K.; Shogren, Karrie A.; Rifenbark, Graham G.; Anderson, Mark H.; Shaw, Leslie A. – Journal of Special Education Technology, 2020
The Self-Determination Inventory: Student Report (SDI: SR) was developed to measure the self-determination of adolescents and was recently validated for students aged 13-22 with and without disabilities across diverse racial/ethnic backgrounds. The SDI: SR is aligned Causal Agency Theory and its theoretical conceptualizations of self-determined…
Descriptors: Testing, Self Determination, Scores, Students with Disabilities
Peer reviewed Peer reviewed
Direct linkDirect link
Norris, John; Drackert, Anastasia – Language Testing, 2018
The Test of German as a Foreign Language (TestDaF) plays a critical role as a standardized test of German language proficiency. Developed and administered by the Society for Academic Study Preparation and Test Development (g.a.s.t.), TestDaF was launched in 2001 and has experienced persistent annual growth, with more than 44,000 test takers in…
Descriptors: German, Second Language Learning, Language Tests, Language Proficiency
Peer reviewed Peer reviewed
Direct linkDirect link
DiBiase-Lubrano, Mary Jo – Unterrichtspraxis/Teaching German, 2018
Language testing is an integral part of teaching and learning, yet most language faculty do not receive adequate training for developing tests (Taylor, [Taylor, L., 2009]). Most have advanced degrees in literary and cultural studies in the target language but often have insufficient training in pedagogy and assessment. This shortcoming is alarming…
Descriptors: German, Second Language Learning, Second Language Instruction, Language Tests
Burge, Bethan; Benson, Louise – National Foundation for Educational Research, 2021
Since its introduction in 2017, NFER has been contracted by Ofqual to develop, deliver and analyse the results of the National Reference Test (NRT) in English and maths. The NRT is administered annually and shows if student performance in English and maths at GCSE level has changed from year to year. The NRT results are based on analysis of data…
Descriptors: National Competency Tests, Test Results, English, Mathematics Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Dickens, Rachel H.; Meisinger, Elizabeth B.; Tarar, Jessica M. – Canadian Journal of School Psychology, 2015
The Comprehensive Test of Phonological Processing-Second Edition (CTOPP-2; Wagner, Torgesen, Rashotte, & Pearson, 2013) is a norm-referenced test that measures phonological processing skills related to reading for individuals aged 4 to 24. According to its authors, the CTOPP-2 may be used to identify individuals who are markedly below their…
Descriptors: Norm Referenced Tests, Phonology, Test Format, Testing
Peer reviewed Peer reviewed
Direct linkDirect link
Behizadeh, Nadia; Engelhard, George, Jr. – Measurement: Interdisciplinary Research and Perspectives, 2015
In his focus article, Koretz (this issue) argues that accountability has become the primary function of large-scale testing in the United States. He then points out that tests being used for accountability purposes are flawed and that the high-stakes nature of these tests creates a context that encourages score inflation. Koretz is concerned about…
Descriptors: Communities of Practice, High Stakes Tests, Testing, Test Validity
Peer reviewed Peer reviewed
Direct linkDirect link
Braun, Henry – Measurement: Interdisciplinary Research and Perspectives, 2012
Paul E. Newton is to be commended for addressing as challenging a topic as the clarification of the concept of validity. The impetus for this foray is Newton's judgment that, despite decades of development, the definition and elaboration of the term test validity in the 1999 "Standards" retains sufficient ambiguity to permit, if not invite, both…
Descriptors: Educational Improvement, Test Validity, Validity, Tests
Previous Page | Next Page »
Pages: 1  |  2  |  3  |  4  |  5  |  6  |  7