Publication Date
  In 2025: 4
  Since 2024: 4
  Since 2021 (last 5 years): 12
  Since 2016 (last 10 years): 38
  Since 2006 (last 20 years): 77
Descriptor
  Cutting Scores: 148
  Test Items: 148
  Difficulty Level: 52
  Standard Setting (Scoring): 46
  Test Construction: 43
  Item Response Theory: 39
  Item Analysis: 36
  Test Validity: 25
  Licensing Examinations…: 23
  Error of Measurement: 21
  Foreign Countries: 19
Education Level
  Higher Education: 14
  Postsecondary Education: 13
  Secondary Education: 12
  Elementary Secondary Education: 10
  Junior High Schools: 6
  Middle Schools: 6
  Elementary Education: 5
  Grade 5: 5
  Grade 3: 3
  Grade 8: 3
  High Schools: 3
Location
  Arkansas: 2
  California: 2
  Germany: 2
  New Mexico: 2
  Turkey: 2
  Canada: 1
  China: 1
  Europe: 1
  European Union: 1
  Jordan: 1
  Maryland: 1
Laws, Policies, & Programs
  No Child Left Behind Act 2001: 2
  Education Consolidation…: 1
Abdolvahab Khademi; Craig S. Wells; Maria Elena Oliveri; Ester Villalonga-Olives – SAGE Open, 2023
The most common effect sizes when using a multiple-group confirmatory factor analysis approach to measurement invariance are ΔCFI and ΔTLI with a cutoff value of 0.01. However, this recommended cutoff value may not be ubiquitously appropriate and may be of limited application for some tests (e.g., measures using dichotomous items or…
Descriptors: Factor Analysis, Factor Structure, Error of Measurement, Test Items
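The 0.01 cutoff discussed in this entry is applied to the change in fit between nested invariance models: invariance is retained when adding equality constraints worsens CFI by no more than the cutoff. A minimal sketch of that decision rule (the fit values below are hypothetical, not from the study):

```python
def invariance_holds(cfi_less_constrained, cfi_more_constrained, cutoff=0.01):
    """Flag measurement invariance as tenable when CFI worsens by no more
    than `cutoff` after adding equality constraints (the ΔCFI rule)."""
    delta_cfi = cfi_less_constrained - cfi_more_constrained
    return delta_cfi <= cutoff

# Hypothetical CFI values for a configural vs. a metric invariance model
print(invariance_holds(0.962, 0.957))  # ΔCFI = 0.005 -> True
print(invariance_holds(0.962, 0.945))  # ΔCFI = 0.017 -> False
```

In practice the two CFI values would come from fitting nested multiple-group models in a CFA package; only the comparison rule is shown here.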
Jerin Kim; Kent McIntosh – Journal of Positive Behavior Interventions, 2025
We aimed to identify empirically valid cut scores on the positive behavioral interventions and supports (PBIS) Tiered Fidelity Inventory (TFI) through an expert panel process known as bookmarking. The TFI is a measurement tool to evaluate the fidelity of implementation of PBIS. In the bookmark method, experts reviewed all TFI items and item scores…
Descriptors: Positive Behavior Supports, Cutting Scores, Fidelity, Program Evaluation
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022
This article discusses visual techniques for detecting test items that would be optimal to select for the final compilation and, conversely, for screening out items that would lower the quality of the compilation. Some classic visual tools are discussed, first, in a practical manner in diagnosing the logical,…
Descriptors: Test Items, Item Analysis, Item Response Theory, Cutting Scores
Gilber Chura-Quispe; Cristina Beatriz Flores-Rosado; Alex Alfredo Valenzuela-Romero; Enlil Iván Herrera-Pérez; Avenilda Eufemia Herrera-Chura; Mercedes Alejandrina Collazos Alarcón – Contemporary Educational Technology, 2025
Information literacy is a fundamental component in the academic development of future professionals. The aim of the study was to evaluate the metric properties of the 'questionnaire of self-perceived information competences', analyzing the factorial structure, internal consistency, convergent validity, factorial invariance according to gender and…
Descriptors: Information Literacy, College Students, Student Attitudes, Foreign Countries
Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022
Setting cut scores on multistage adaptive tests (MSTs) is difficult, particularly when the test spans several grade levels, and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…
Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis
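The Angoff ratings that entries like this one start from have a simple core computation: each panelist estimates, per item, the probability that a minimally competent examinee answers correctly, and the raw cut score is the sum of the item-level mean ratings. A minimal sketch with hypothetical ratings (the mapping onto an MST scale studied in the paper is a further step not shown here):

```python
def angoff_cut_score(ratings):
    """ratings: one list of per-item probability ratings per panelist.
    Returns the raw-score cut: the sum over items of the mean rating."""
    n_panelists = len(ratings)
    n_items = len(ratings[0])
    item_means = [
        sum(panelist[i] for panelist in ratings) / n_panelists
        for i in range(n_items)
    ]
    return sum(item_means)

# Three hypothetical panelists rating a four-item test
ratings = [
    [0.6, 0.8, 0.5, 0.9],
    [0.7, 0.7, 0.4, 0.8],
    [0.5, 0.9, 0.6, 1.0],
]
print(round(angoff_cut_score(ratings), 2))  # 2.8 raw-score points
```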
Peter A. Edelsbrunner; Bianca A. Simonsmeier; Michael Schneider – Educational Psychology Review, 2025
Knowledge is an important predictor and outcome of learning and development. Its measurement is challenged by the fact that knowledge can be integrated and homogeneous, or fragmented and heterogeneous, which can change through learning. These characteristics of knowledge are at odds with current standards for test development, demanding a high…
Descriptors: Meta Analysis, Predictor Variables, Learning Processes, Knowledge Level
Skaggs, Gary; Hein, Serge F.; Wilkins, Jesse L. M. – Educational Measurement: Issues and Practice, 2020
In test-centered standard-setting methods, borderline performance can be represented by many different profiles of strengths and weaknesses. As a result, asking panelists to estimate item or test performance for a hypothetical group of borderline examinees, or a typical borderline examinee, may be an extremely difficult task and one that can…
Descriptors: Standard Setting (Scoring), Cutting Scores, Testing Problems, Profiles
Yoo Jeong Jang – ProQuest LLC, 2022
Despite the increasing demand for diagnostic information, observed subscores have often been reported to lack adequate psychometric qualities such as reliability, distinctiveness, and validity. Therefore, several statistical techniques based on CTT and IRT frameworks have been proposed to improve the quality of subscores. More recently, DCM has…
Descriptors: Classification, Accuracy, Item Response Theory, Correlation
Wolkowitz, Amanda A.; Foley, Brett; Zurn, Jared – Practical Assessment, Research & Evaluation, 2023
The purpose of this study is to introduce a method for converting scored 4-option multiple-choice (MC) items into scored 3-option MC items without re-pretesting the 3-option MC items. This study describes a six-step process for achieving this goal. Data from a professional credentialing exam were used in this study, and the method was applied to 24…
Descriptors: Multiple Choice Tests, Test Items, Accuracy, Test Format
Lewis, Daniel; Cook, Robert – Educational Measurement: Issues and Practice, 2020
In this paper we assert that the practice of principled assessment design renders traditional standard-setting methodology redundant at best and contradictory at worst. We describe the rationale for, and methodological details of, Embedded Standard Setting (ESS; previously Engineered Cut Scores; Lewis, 2016), an approach to establishing performance…
Descriptors: Standard Setting, Evaluation, Cutting Scores, Performance Based Assessment
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods was investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores
Rümeysa Kaya; Bayram Çetin – International Journal of Assessment Tools in Education, 2025
In this study, the cut-off scores obtained from the Angoff, Angoff Y/N, Nedelsky, and Ebel standard-setting methods were compared with a T score of 50 and the current cut-off score in various respects. Data were collected from 448 students who took Module B1+ English Exit Exam IV and 14 experts. It was seen that while the Nedelsky method gave the lowest…
Descriptors: Standard Setting, Cutting Scores, Exit Examinations, Academic Achievement
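Of the methods compared in this entry, the Nedelsky procedure has the most compact formula: for each item, a judge counts the answer options a minimally competent examinee could not rule out; the item's expected score is the reciprocal of that count, and the cut score is the sum across items. A minimal sketch with hypothetical judgments:

```python
def nedelsky_cut_score(remaining_options):
    """remaining_options: per item, the number of answer options a
    minimally competent examinee could NOT eliminate. The expected score
    on each item is 1/k (random guess among the rest); the cut is the sum."""
    return sum(1.0 / k for k in remaining_options)

# Hypothetical 4-option items; the judge leaves 2, 4, 1, and 3 options in play
print(round(nedelsky_cut_score([2, 4, 1, 3]), 3))  # 0.5 + 0.25 + 1 + 1/3
```

With multiple judges, the per-judge cut scores would typically be averaged; only the single-judge computation is shown.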
Kara, Hakan; Cetin, Sevda – International Journal of Assessment Tools in Education, 2020
In this study, the efficiency of various random sampling methods for reducing the number of items rated by judges in an Angoff standard-setting study was examined, and the methods were compared with each other. First, the full-length test was formed by combining the Placement Test 2012 and 2013 mathematics subsets. Then, simple random sampling…
Descriptors: Cutting Scores, Standard Setting (Scoring), Sampling, Error of Measurement
Parry, James R. – Online Submission, 2020
This paper presents research and provides a method to ensure that parallel assessments generated from a large test-item database maintain equitable difficulty and content coverage each time the assessment is presented. To maintain fairness and validity, it is important that all instances of an assessment that is intended to test the…
Descriptors: Culture Fair Tests, Difficulty Level, Test Items, Test Validity
Bramley, Tom – Research Matters, 2020
The aim of this study was to compare, by simulation, the accuracy of mapping a cut-score from one test to another by expert judgement (using the Angoff method) versus the accuracy of a small-sample equating method (chained linear equating). As expected, the standard-setting method resulted in more accurate equating when we assumed a higher level…
Descriptors: Cutting Scores, Standard Setting (Scoring), Equated Scores, Accuracy
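Chained linear equating, the small-sample method this entry compares against, links form X to form Y through a common anchor V: scores on X are mapped to the anchor scale using group 1's statistics, then from the anchor to Y using group 2's, each step being a linear (match mean and SD) transformation. A minimal sketch with hypothetical score data:

```python
import statistics

def linear_link(x, mean_from, sd_from, mean_to, sd_to):
    """Linear equating: match the means and standard deviations of two scales."""
    return mean_to + (sd_to / sd_from) * (x - mean_from)

def chained_linear(x, g1_x, g1_v, g2_v, g2_y):
    """Chain X -> V (group 1 statistics) -> Y (group 2 statistics).
    Each argument after x is a list of observed scores."""
    v = linear_link(x, statistics.mean(g1_x), statistics.pstdev(g1_x),
                    statistics.mean(g1_v), statistics.pstdev(g1_v))
    return linear_link(v, statistics.mean(g2_v), statistics.pstdev(g2_v),
                       statistics.mean(g2_y), statistics.pstdev(g2_y))

# Hypothetical tiny samples: group 1 took X and anchor V; group 2 took V and Y
print(chained_linear(13, [10, 12, 14], [5, 6, 7], [4, 6, 8], [20, 25, 30]))  # ≈ 26.25
```

Real applications would use far larger samples and often smooth or weight the moments; the point here is only the two-step chain itself.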