Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025
Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…
Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods
Reese Butterfuss; Harold Doran – Educational Measurement: Issues and Practice, 2025
Large language models are increasingly used in educational and psychological measurement activities. Their rapidly evolving sophistication and ability to detect language semantics make them viable tools to supplement subject matter experts and their reviews of large amounts of text statements, such as educational content standards. This paper…
Descriptors: Alignment (Education), Academic Standards, Content Analysis, Concept Mapping
Peabody, Michael R.; Muckle, Timothy J.; Meng, Yu – Educational Measurement: Issues and Practice, 2023
The subjective aspect of standard-setting is often criticized, yet data-driven standard-setting methods are rarely applied. Therefore, we applied a mixture Rasch model approach to setting performance standards across several testing programs of various sizes and compared the results to existing passing standards derived from traditional…
Descriptors: Item Response Theory, Standard Setting, Testing, Sampling
Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022
Setting cut scores on multistage tests (MSTs) is difficult, particularly when the test spans several grade levels, and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…
Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis
Guher Gorgun; Okan Bulut – Educational Measurement: Issues and Practice, 2025
Automatic item generation may supply many items instantly and efficiently to assessment and learning environments. Yet the evaluation of item quality remains a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of using large language models, specifically Llama 3-8B, for…
Descriptors: Artificial Intelligence, Quality Control, Technology Uses in Education, Automation
Ji, Xuejun Ryan; Wu, Amery D. – Educational Measurement: Issues and Practice, 2023
The Cross-Classified Mixed Effects Model (CCMEM) has been demonstrated to be a flexible framework for evaluating reliability by measurement specialists. Reliability can be estimated based on the variance components of the test scores. Built upon their accomplishment, this study extends the CCMEM to be used for evaluating validity evidence.…
Descriptors: Measurement, Validity, Reliability, Models
Baron, Patricia; Sireci, Stephen G.; Slater, Sharon C. – Educational Measurement: Issues and Practice, 2021
Since the No Child Left Behind Act (No Child Left Behind [NCLB], 2001) was enacted, the Bookmark method has been used in many state standard setting studies (Karantonis and Sireci; Zieky, Perie, and Livingston). The purpose of the current study is to evaluate the criticism that when panelists are presented with data during the Bookmark standard…
Descriptors: State Standards, Standard Setting, Evaluators, Training
Johnson, Evelyn S.; Crawford, Angela R.; Zheng, Yuzhu; Moylan, Laura A. – Educational Measurement: Issues and Practice, 2021
In this study, we compared the results of 27 special education teachers' evaluations using two different observation instruments, the Framework for Teaching (FFT), and the Explicit Instruction observation protocol of the Recognizing Effective Special Education Teachers (RESET) observation system. Results indicate differences in the rank-ordering…
Descriptors: Special Education Teachers, Teacher Evaluation, Teacher Effectiveness, Evaluation Methods
Baldwin, Peter; Margolis, Melissa J.; Clauser, Brian E.; Mee, Janet; Winward, Marcia – Educational Measurement: Issues and Practice, 2020
Evidence of the internal consistency of standard-setting judgments is a critical part of the validity argument for tests used to make classification decisions. The bookmark standard-setting procedure is a popular approach to establishing performance standards, but there is relatively little research that reflects on the internal consistency of the…
Descriptors: Standard Setting (Scoring), Probability, Cutting Scores, Evaluation Methods
Wyse, Adam E. – Educational Measurement: Issues and Practice, 2020
One commonly used compromise standard-setting method is the Beuk (1984) method. A key assumption of the Beuk method is that the emphasis given to the pass rate and the percent correct ratings should be proportional to the extent that the panelists agree on their ratings. However, whether the slope of the Beuk line reflects the emphasis that panelists…
Descriptors: Standard Setting (Scoring), Cutting Scores, Weighted Scores, Evaluation Methods
Wind, Stefanie A. – Educational Measurement: Issues and Practice, 2020
Researchers have documented the impact of rater effects, or raters' tendencies to give different ratings than would be expected given examinee achievement levels, in performance assessments. However, the degree to which rater effects influence person fit, or the reasonableness of test-takers' achievement estimates given their response patterns,…
Descriptors: Performance Based Assessment, Evaluators, Achievement, Influences
An, Lily Shiao; Ho, Andrew Dean; Davis, Laurie Laughlin – Educational Measurement: Issues and Practice, 2022
Technical documentation for educational tests focuses primarily on properties of individual scores at single points in time. Reliability, standard errors of measurement, item parameter estimates, fit statistics, and linking constants are standard technical features that external stakeholders use to evaluate items and individual scale scores.…
Descriptors: Documentation, Scores, Evaluation Methods, Longitudinal Studies
Wyse, Adam E.; Babcock, Ben – Educational Measurement: Issues and Practice, 2020
A common belief is that the Bookmark method is a cognitively simpler standard-setting method than the modified Angoff method. However, a limited amount of research has investigated panelists' ability to perform the Bookmark method well, and whether some of the challenges panelists face with the Angoff method may also be present in the Bookmark…
Descriptors: Standard Setting (Scoring), Evaluation Methods, Testing Problems, Test Items
Van Norman, Ethan R. – Educational Measurement: Issues and Practice, 2023
Sophisticated analytic strategies have been proposed as viable methods to improve the quantification of student improvement and to assist educators in making treatment decisions. The performance of three categories of latent growth modeling techniques (linear, quadratic, and dual change) to capture growth in oral reading fluency in response to a…
Descriptors: Evaluation Methods, Student Reaction, Teaching Methods, Structural Equation Models
Lewis, Daniel; Cook, Robert – Educational Measurement: Issues and Practice, 2020
In this paper we assert that the practice of principled assessment design renders traditional standard-setting methodology redundant at best and contradictory at worst. We describe the rationale for, and methodological details of, Embedded Standard Setting (ESS; previously Engineered Cut Scores; Lewis, 2016), an approach to establish performance…
Descriptors: Standard Setting, Evaluation, Cutting Scores, Performance Based Assessment