Showing 1 to 15 of 50 results
Jose Antonio Mola Avila – ProQuest LLC, 2023
Accountability in education was implemented to improve poor learning outcomes by documenting and monitoring learning achievement results. In this process, external standardized achievement tests have played a central role, being the mechanism most frequently used to measure learning outcomes. However, several decades after its initial…
Descriptors: Foreign Countries, Standardized Tests, Achievement Tests, Accountability
Peer reviewed
José Manuel Arencibia Alemán; Astrid Marie Jorde Sandsør; Henrik Daae Zachrisson; Sigrid Blömeke – Assessment in Education: Principles, Policy & Practice, 2024
Modest correlations between teacher-assigned grades and external assessments of academic achievement (r = 0.40-0.60) have led many educational stakeholders to deem grades subjective and unreliable. However, theoretical and methodological challenges, such as construct misalignment, data unavailability and sample unrepresentativeness, limit the…
Descriptors: Grades (Scholastic), Grading, Achievement Tests, Test Validity
Peer reviewed
Willse, John T. – Measurement and Evaluation in Counseling and Development, 2017
This article provides a brief introduction to the Rasch model. Motivation for using Rasch analyses is provided. Important Rasch model concepts and key aspects of result interpretation are introduced, with major points reinforced using a simulation demonstration. Concrete guidelines are provided regarding sample size and the evaluation of items.
Descriptors: Item Response Theory, Test Results, Test Interpretation, Simulation
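As a rough companion to the Rasch overview above (not code from the article), a minimal sketch of the dichotomous Rasch item response function; the ability and difficulty values are hypothetical.

```python
import numpy as np

def rasch_prob(theta, b):
    """Dichotomous Rasch model: probability of a correct response given
    person ability theta and item difficulty b, both on the logit scale."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# Hypothetical values: one person, three items of increasing difficulty.
theta = 0.5
difficulties = np.array([-1.0, 0.0, 1.5])
print(rasch_prob(theta, difficulties))  # probabilities drop as items get harder
```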
New York State Education Department, 2018
This technical report provides detailed information regarding the technical, statistical, and measurement attributes of the New York State Testing Program (NYSTP) for the Grades 3-8 English Language Arts (ELA) and Mathematics 2018 Operational Tests. This report includes information about test content and test development, item (i.e., individual…
Descriptors: English, Language Arts, Language Tests, Mathematics Tests
Lichtenstein, Robert – Communique, 2013
Assessment of human abilities and behaviors is enormously enhanced by the use of standardized assessment measures that yield norm-referenced scores. As school psychologists, we rely on quantitative findings to anchor our judgments about a child's developmental and educational functioning and to enhance our capacity to draw diagnostic…
Descriptors: Test Results, School Psychologists, Psychoeducational Methods, Scores
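To illustrate the norm-referenced scores discussed above, a minimal sketch of converting a raw score to a z-score and a standard score (mean 100, SD 15); the norm-group mean and SD are hypothetical, not from any published instrument.

```python
# Hypothetical norm-group raw-score mean and SD; real norms come from the publisher.
NORM_MEAN, NORM_SD = 31.4, 6.2

def standard_score(raw, mean=NORM_MEAN, sd=NORM_SD):
    """Convert a raw score to a z-score and a standard score (M = 100, SD = 15)."""
    z = (raw - mean) / sd
    return z, 100 + 15 * z

z, ss = standard_score(38)
print(f"z = {z:.2f}, standard score = {ss:.0f}")  # z = 1.06, standard score = 116
```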
Powers, Sonya; Li, Dongmei; Suh, Hongwook; Harris, Deborah J. – ACT, Inc., 2016
ACT reporting categories and ACT Readiness Ranges are new features added to the ACT score reports starting in fall 2016. For each reporting category, the number correct score, the maximum points possible, the percent correct, and the ACT Readiness Range, along with an indicator of whether the reporting category score falls within the Readiness…
Descriptors: Scores, Classification, College Entrance Examinations, Error of Measurement
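The score-report fields described above lend themselves to a simple illustration; the sketch below uses invented category names, counts, and Readiness Ranges (not ACT's actual values) to show how a percent-correct figure and a within-range indicator could be derived.

```python
# Hypothetical reporting categories; names, counts, and ranges are illustrative only.
categories = {
    "Number & Quantity": {"correct": 7, "max": 9, "readiness_range": (5, 9)},
    "Algebra":           {"correct": 4, "max": 8, "readiness_range": (6, 8)},
}

for name, c in categories.items():
    pct = 100.0 * c["correct"] / c["max"]
    lo, hi = c["readiness_range"]
    within = lo <= c["correct"] <= hi
    print(f"{name}: {c['correct']}/{c['max']} ({pct:.0f}% correct), "
          f"within Readiness Range: {within}")
```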
New York State Education Department, 2017
This technical report provides detailed information regarding the technical, statistical, and measurement attributes of the New York State Testing Program (NYSTP) for the Grades 3-8 English Language Arts (ELA) and Mathematics 2017 Operational Tests. This report includes information about test content and test development, item (i.e., individual…
Descriptors: English, Language Arts, Language Tests, Mathematics Tests
Peer reviewed
Bradshaw, Jenny; Wheater, Rebecca – Research Papers in Education, 2013
This review examined a range of approaches internationally to the reporting of assessment results for individual students, with a particular focus on how results are represented, the level of detail reported and the steps taken to quantify, report and explain error and uncertainty in the results' reports or certificates given to students in a…
Descriptors: Test Reliability, Error of Measurement, High Stakes Tests, Foreign Countries
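One common way to quantify the uncertainty such result reports describe is a band around the reported score based on the standard error of measurement; a minimal sketch, with hypothetical score, standard deviation, and reliability values:

```python
import math

def score_band(score, sd, reliability, z=1.96):
    """Approximate 95% band around a reported score using the standard error
    of measurement: SEM = SD * sqrt(1 - reliability)."""
    sem = sd * math.sqrt(1.0 - reliability)
    return score - z * sem, score + z * sem

# Hypothetical reported scale score and illustrative test statistics.
lo, hi = score_band(score=512, sd=40, reliability=0.91)
print(f"Reported 512; 95% band roughly {lo:.0f}-{hi:.0f}")  # roughly 488-536
```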
Peer reviewed
Woods, Carol M. – Applied Psychological Measurement, 2011
Differential item functioning (DIF) occurs when an item on a test, questionnaire, or interview has different measurement properties for one group of people versus another, irrespective of true group-mean differences on the constructs being measured. This article is focused on item response theory based likelihood ratio testing for DIF (IRT-LR or…
Descriptors: Simulation, Item Response Theory, Testing, Questionnaires
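The IRT-LR approach compares a model that constrains the studied item's parameters to be equal across groups against one that frees them; a minimal sketch of that comparison step, assuming the log-likelihoods have already been obtained from fitted IRT models (the values shown are hypothetical):

```python
from scipy.stats import chi2

def irt_lr_dif(loglik_constrained, loglik_free, n_freed_params):
    """Likelihood-ratio DIF test: a model constraining the studied item's
    parameters to be equal across groups versus one that frees them.
    Returns the LR statistic and its chi-square p-value."""
    lr = -2.0 * (loglik_constrained - loglik_free)
    return lr, chi2.sf(lr, df=n_freed_params)

# Hypothetical log-likelihoods, as would come from two fitted IRT models.
lr, p = irt_lr_dif(loglik_constrained=-10452.7, loglik_free=-10447.9, n_freed_params=2)
print(f"LR = {lr:.2f}, p = {p:.4f}")  # a small p suggests DIF on the studied item
```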
Brockmann, Frank – Council of Chief State School Officers, 2011
State testing programs today are more extensive than ever, and their results are required to serve more purposes and inform more high-stakes decisions than one might have imagined. Assessment results are used to hold schools, districts, and states accountable for student performance and to help guide a multitude of important decisions. This report describes…
Descriptors: Accuracy, Measurement, Testing, Expertise
Peer reviewed
He, Qingping; Boyle, Andrew; Opposs, Dennis – Evaluation & Research in Education, 2011
Building on findings from existing qualitative research into public perceptions of reliability in examination results in England, a questionnaire was developed and administered to samples of teachers, students and employers to study their awareness of and opinions about various aspects of reliability quantitatively. Main findings from the study…
Descriptors: Qualitative Research, Student Evaluation, Tests, Program Effectiveness
Peer reviewed
Tong, Ye; Kolen, Michael J. – Educational Measurement: Issues and Practice, 2010
"Scaling" is the process of constructing a score scale that associates numbers or other ordered indicators with the performance of examinees. Scaling typically is conducted to aid users in interpreting test results. This module describes different types of raw scores and scale scores, illustrates how to incorporate various sources of…
Descriptors: Test Results, Scaling, Measures (Individuals), Raw Scores
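A scale score is often produced from a raw score by a transformation fixed during scale construction; a minimal sketch of a linear raw-to-scale conversion with rounding and truncation to a reporting range (all constants are hypothetical):

```python
def raw_to_scale(raw, slope=2.5, intercept=200, lo=200, hi=400):
    """Linear raw-to-scale conversion, rounded and truncated to the reporting
    range. All constants here are hypothetical; operational programs derive
    them during scale construction."""
    scale = round(slope * raw + intercept)
    return max(lo, min(hi, scale))

print([raw_to_scale(r) for r in (0, 30, 62, 80)])  # [200, 275, 355, 400]
```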
Doorey, Nancy A. – Council of Chief State School Officers, 2011
The work reported in this paper reflects a collaborative effort of many individuals representing multiple organizations. It began during a session at the October 2008 meeting of TILSA when a representative of a member state asked the group if any of their programs had experienced unexpected fluctuations in the annual state assessment scores, and…
Descriptors: Testing, Sampling, Expertise, Testing Programs
Peer reviewed
Woods, Carol M. – Applied Psychological Measurement, 2009
Differential item functioning (DIF) occurs when items on a test or questionnaire have different measurement properties for one group of people versus another, irrespective of group-mean differences on the construct. Methods for testing DIF require matching members of different groups on an estimate of the construct. Preferably, the estimate is…
Descriptors: Test Results, Testing, Item Response Theory, Test Bias
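To illustrate the matching idea in the abstract above, a crude sketch that matches examinees on a rest score (total score excluding the studied item) and compares proportion correct between groups within each stratum; the response data are simulated and this is not a complete DIF statistic:

```python
import numpy as np

def matched_proportions(responses, group, item):
    """Proportion correct on the studied item for a reference (0) and focal (1)
    group, within strata matched on a rest score (total score excluding that
    item). A crude observed-score matching illustration, not a DIF statistic."""
    rest = responses.sum(axis=1) - responses[:, item]
    rows = []
    for s in np.unique(rest):
        ref = responses[(rest == s) & (group == 0), item]
        foc = responses[(rest == s) & (group == 1), item]
        if len(ref) and len(foc):
            rows.append((int(s), ref.mean(), foc.mean()))
    return rows

# Simulated 0/1 responses (examinees x items) and group labels, for illustration only.
rng = np.random.default_rng(1)
responses = rng.integers(0, 2, size=(300, 6))
group = rng.integers(0, 2, size=300)
for stratum, p_ref, p_foc in matched_proportions(responses, group, item=0):
    print(f"rest score {stratum}: reference {p_ref:.2f}, focal {p_foc:.2f}")
```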
Peer reviewed
Raymond, Mark R.; Neustel, Sandra; Anderson, Dan – Educational Measurement: Issues and Practice, 2009
Examinees who take high-stakes assessments are usually given an opportunity to repeat the test if they are unsuccessful on their initial attempt. To prevent examinees from obtaining unfair score increases by memorizing the content of specific test items, testing agencies usually assign a different test form to repeat examinees. The use of multiple…
Descriptors: Test Results, Test Items, Testing, Aptitude Tests