Peabody, Michael R.; Muckle, Timothy J.; Meng, Yu – Educational Measurement: Issues and Practice, 2023
The subjective aspect of standard-setting is often criticized, yet data-driven standard-setting methods are rarely applied. Therefore, we applied a mixture Rasch model approach to setting performance standards across several testing programs of various sizes and compared the results to existing passing standards derived from traditional…
Descriptors: Item Response Theory, Standard Setting, Testing, Sampling
Guher Gorgun; Okan Bulut – Educational Measurement: Issues and Practice, 2025
Automatic item generation may supply many items instantly and efficiently to assessment and learning environments. Yet, the evaluation of item quality persists to be a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of using large-language models, specifically Llama 3-8B, for…
Descriptors: Artificial Intelligence, Quality Control, Technology Uses in Education, Automation
An, Lily Shiao; Ho, Andrew Dean; Davis, Laurie Laughlin – Educational Measurement: Issues and Practice, 2022
Technical documentation for educational tests focuses primarily on properties of individual scores at single points in time. Reliability, standard errors of measurement, item parameter estimates, fit statistics, and linking constants are standard technical features that external stakeholders use to evaluate items and individual scale scores.…
Descriptors: Documentation, Scores, Evaluation Methods, Longitudinal Studies
Angela Johnson; Elizabeth Barker; Marcos Viveros Cespedes – Educational Measurement: Issues and Practice, 2024
Educators and researchers strive to build policies and practices on data and evidence, especially on academic achievement scores. When assessment scores are inaccurate for specific student populations or when scores are inappropriately used, even data-driven decisions will be misinformed. To maximize the impact of the research-practice-policy…
Descriptors: Equal Education, Inclusion, Evaluation Methods, Error of Measurement
Castellano, Katherine E.; McCaffrey, Daniel F. – Educational Measurement: Issues and Practice, 2017
Mean or median student growth percentiles (MGPs) are a popular measure of educator performance, but they lack rigorous evaluation. This study investigates the error in MGP due to test score measurement error (ME). Using analytic derivations, we find that errors in the commonly used MGP are correlated with average prior latent achievement: Teachers…
Descriptors: Teacher Evaluation, Teacher Effectiveness, Value Added Models, Achievement Gains
Furtak, Erin Marie; Ruiz-Primo, Maria Araceli; Bakeman, Roger – Educational Measurement: Issues and Practice, 2017
Formative assessment is a classroom practice that has received much attention in recent years for its established potential at increasing student learning. A frequent analytic approach for determining the quality of formative assessment practices is to develop a coding scheme and determine frequencies with which the codes are observed; however,…
Descriptors: Sequential Approach, Formative Evaluation, Alternative Assessment, Incidence
Lu, Ying; Sireci, Stephen G. – Educational Measurement: Issues and Practice, 2007
Speededness refers to the situation where the time limits on a standardized test do not allow substantial numbers of examinees to fully consider all test items. When tests are not intended to measure speed of responding, speededness introduces a severe threat to the validity of interpretations based on test scores. In this article, we describe…
Descriptors: Test Items, Timed Tests, Standardized Tests, Test Validity
Gorin, Joanna S. – Educational Measurement: Issues and Practice, 2006
One of the primary themes of the National Research Council's 2001 book "Knowing What Students Know" was the importance of cognition as a component of assessment design and measurement theory (NRC, 2001). One reaction to the book has been an increased use of sophisticated statistical methods to model cognitive information available in test data.…
Descriptors: Test Construction, Student Evaluation, Academic Ability, Evaluation Methods
Haladyna, Thomas M.; Downing, Steven M. – Educational Measurement: Issues and Practice, 2004
There are many threats to validity in high-stakes achievement testing. One major threat is construct-irrelevant variance (CIV). This article defines CIV in the context of the contemporary, unitary view of validity and presents logical arguments, hypotheses, and documentation for a variety of CIV sources that commonly threaten interpretations of…
Descriptors: Student Evaluation, Evaluation Methods, High Stakes Tests, Construct Validity

Nitko, Anthony J. – Educational Measurement: Issues and Practice, 1995
If curriculum is to be the basis for assessment reform, assessment specialists must model the process for producing valid assessment products. Validity criteria should guide any model for the assessment development process. However, curriculum-based assessment systems should not be confused with standards-driven assessment systems. (SLD)
Descriptors: Criteria, Curriculum Based Assessment, Educational Change, Evaluation Methods

Shepard, Lorrie A. – Educational Measurement: Issues and Practice, 1997
It is argued that consequences are a logical part of the evaluation of test use, which has been an accepted part of test validity for several decades. The examination of effects following from test use is essential in evaluating test validity and not merely the domain of policymakers and politicians. (SLD)
Descriptors: Educational Assessment, Educational Policy, Educational Testing, Elementary Secondary Education

Linn, Robert L. – Educational Measurement: Issues and Practice, 1997
It is argued that consequential validity is a concept worth considering. The solution to defining "validity" is not to narrow the concept, but to allow for the differential prediction provided by tests in different circumstances. Consequences of the uses and interpretations of test scores are central to their evaluation. (SLD)
Descriptors: Educational Assessment, Educational Testing, Elementary Secondary Education, Evaluation Methods
Chester, Mitchell D. – Educational Measurement: Issues and Practice, 2005
This study explores the use of multiple measures to enhance the validity and reliability of inferences about school and district effectiveness. Using data from the state of Ohio, a framework for combining measures is applied to examine the individual and collective impact of multiple measures on both the federal AYP designations and state ratings.…
Descriptors: Inferences, Educational Improvement, School Effectiveness, Accountability

Shepard, Lorrie A. – Educational Measurement: Issues and Practice, 1990
Results of a 1987 report indicating that elementary students of all states were above the national average are assessed. Issues addressed include teaching for standardized tests, the effect of teaching on national norms, and alternatives available to protect the integrity of instruction and the validity of normed test scores. (TJH)
Descriptors: Achievement Tests, Elementary Education, Evaluation Methods, National Norms
Zwick, Rebecca; Schlemer, Lizabeth – Educational Measurement: Issues and Practice, 2004
The validity of the SAT as an admissions criterion for Latinos and Asian Americans who are not native English speakers was examined. The analyses, based on 1997 and 1998 UCSB freshmen, focused on the effectiveness of SAT scores and high school grade-point average (HSGPA) in predicting college freshman grade-point average (FGPA). When regression…
Descriptors: Test Validity, Language Minorities, Asian American Students, Hispanic American Students