Publication Date
In 2025: 0
Since 2024: 2
Since 2021 (last 5 years): 8
Since 2016 (last 10 years): 15
Since 2006 (last 20 years): 26
Descriptor
Evaluation Methods: 46
Scoring: 46
Test Items: 46
Test Construction: 13
Item Response Theory: 12
Student Evaluation: 12
Foreign Countries: 10
Scores: 10
Interrater Reliability: 8
Testing: 8
Computer Assisted Testing: 7
Author
Friedman, Greg: 2
Kim, Dong-In: 2
McGinty, Dixie: 2
Michaels, Hillary: 2
Neel, John H.: 2
Ochieng, Charles: 2
Reckase, Mark D.: 2
Yen, Shu Jing: 2
Stoltenberg, Alicia A.: 1
Bakla, Arif: 1
Bhaskar, R.: 1
Audience
Practitioners: 5
Teachers: 4
Administrators: 2
Location
Canada: 4
Australia: 3
China: 1
Hong Kong: 1
India: 1
Israel: 1
Japan: 1
Pennsylvania: 1
South Korea: 1
Taiwan: 1
United Kingdom: 1
Laws, Policies, & Programs
No Child Left Behind Act 2001: 1
Assessments and Surveys
National Assessment of…: 3
Program for International…: 1
Deschênes, Marie-France; Dionne, Éric; Dorion, Michelle; Grondin, Julie – Practical Assessment, Research & Evaluation, 2023
The use of the aggregate scoring method for scoring concordance tests requires that item weights be derived from the performance of a group of experts who take the test under the same conditions as the examinees. However, the average score of the experts constituting the reference panel remains a critical issue in the use of these tests…
Descriptors: Scoring, Tests, Evaluation Methods, Test Items
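As background on the method the entry above describes: in concordance-type tests, an examinee's credit for a response option is usually proportional to how many panel experts chose that option, with the modal expert answer earning full credit. The sketch below illustrates this aggregate weighting under those assumptions; the function names and panel data are invented for illustration, not taken from the article.

    # Aggregate (panel-based) scoring for one concordance item.
    # Panel answers are on an illustrative -2..+2 Likert-type scale.
    from collections import Counter

    def aggregate_weights(panel_answers):
        """Credit for each option = (number of experts choosing it) / (modal frequency)."""
        counts = Counter(panel_answers)
        modal = max(counts.values())
        return {option: n / modal for option, n in counts.items()}

    panel = [-1, 0, 0, 0, 1, 1, 0, -1, 0, 1]    # answers from a 10-expert reference panel
    weights = aggregate_weights(panel)           # {-1: 0.4, 0: 1.0, 1: 0.6}

    def score_response(response, weights):
        return weights.get(response, 0.0)        # options no expert chose earn 0

    print(score_response(1, weights))            # an examinee choosing +1 earns 0.6

Because the weights are defined entirely by the panel's response distribution, the composition and agreement of the expert panel directly shape examinee scores, which is the concern the abstract raises.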
Joakim Wallmark; James O. Ramsay; Juan Li; Marie Wiberg – Journal of Educational and Behavioral Statistics, 2024
Item response theory (IRT) models the relationship between the possible scores on a test item and a test taker's level on the latent trait that the item is intended to measure. In this study, we compare two models for tests with polytomously scored items: the optimal scoring (OS) model, a nonparametric IRT model based on the principles of…
Descriptors: Item Response Theory, Test Items, Models, Scoring
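For context on the models being compared (textbook background, not drawn from the article): the standard parametric model for a dichotomous item is the two-parameter logistic (2PL), and polytomous models such as Samejima's graded response model extend it to ordered score categories. In LaTeX notation:

    % 2PL: probability that test taker j answers item i correctly
    P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp[-a_i(\theta_j - b_i)]}

    % Graded response model: cumulative probabilities for ordered categories k = 1, ..., m_i,
    % with category probabilities obtained by differencing
    P(X_{ij} \ge k \mid \theta_j) = \frac{1}{1 + \exp[-a_i(\theta_j - b_{ik})]}
    P(X_{ij} = k \mid \theta_j) = P(X_{ij} \ge k \mid \theta_j) - P(X_{ij} \ge k + 1 \mid \theta_j)

A nonparametric approach such as the optimal scoring model mentioned above instead estimates these item-score functions flexibly rather than fixing the logistic form in advance.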
Wind, Stefanie A.; Guo, Wenjing – Educational Assessment, 2021
Scoring procedures for the constructed-response (CR) items in large-scale mixed-format educational assessments often involve checks for rater agreement or rater reliability. Although these analyses are important, researchers have documented rater effects that persist despite rater training and that are not always detected in rater agreement and…
Descriptors: Scoring, Responses, Test Items, Test Format
Alicia A. Stoltenberg – ProQuest LLC, 2024
Multiple-select multiple-choice items, or multiple-choice items with more than one correct answer, are used to quickly assess content on standardized assessments. Because there are multiple keys to these item types, there are also multiple ways to score student responses to these items. The purpose of this study was to investigate how changing the…
Descriptors: Scoring, Evaluation Methods, Multiple Choice Tests, Standardized Tests
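To make concrete what "multiple ways to score" such items can mean, here are three commonly discussed scoring rules for a multiple-select item; these are generic textbook variants and not necessarily the specific rules compared in the dissertation.

    # Illustrative scoring rules for a multiple-select multiple-choice item
    # with options numbered 0..n_options-1; all names are made up for this sketch.

    def all_or_nothing(selected, key):
        return 1.0 if set(selected) == set(key) else 0.0

    def per_option_credit(selected, key, n_options):
        # One point for every option classified correctly (marked if keyed,
        # left unmarked if not), rescaled to 0-1.
        correct = sum((opt in key) == (opt in selected) for opt in range(n_options))
        return correct / n_options

    def plus_minus(selected, key):
        # +1 for each keyed option marked, -1 for each unkeyed option marked,
        # floored at zero and divided by the number of keyed options.
        raw = sum(1 if opt in key else -1 for opt in selected)
        return max(raw, 0) / len(key)

    key = {0, 2}                                  # two keyed options out of five (0-4)
    response = [0, 3]                             # examinee marks options 0 and 3
    print(all_or_nothing(response, key))          # 0.0
    print(per_option_credit(response, key, 5))    # 0.6
    print(plus_minus(response, key))              # 0.0

Rules like these can rank the same set of responses quite differently, which is why the choice of scoring rule matters for the resulting score distributions.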
Kim, Dong-In; Julian, Marc; Hermann, Pam – Online Submission, 2022
In test equating, one critical equating property is the group invariance property, which indicates that the equating function used to convert performance on each alternate form to the reporting scale should be the same for various subgroups. To mitigate the impact of disrupted learning on the item parameters during the COVID-19 pandemic, a…
Descriptors: COVID-19, Pandemics, Test Format, Equated Scores
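The group invariance property mentioned above can be stated compactly; the notation here is generic background, not taken from the report. If e_Y(x) is the function that converts a score x on the alternate form to the reporting scale, and e_{Y|g}(x) is the same function estimated within subgroup g, invariance requires

    e_{Y \mid g}(x) \approx e_Y(x) \quad \text{for all score points } x \text{ and all subgroups } g

Departures from this ideal are typically summarized with (root) mean squared differences between the subgroup and total-group conversions, weighted by the score distribution.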
Setiawan, Risky – European Journal of Educational Research, 2019
The purposes of this research are: 1) to compare two test equating methods, the Haebara and Stocking-Lord methods; and 2) to describe the characteristics of each equating method using the Windows IRTEQ program. This research employs a participatory approach, as the data are collected through questionnaires based on the National Examination…
Descriptors: Equated Scores, Evaluation Methods, Evaluation Criteria, Test Items
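As background on the two characteristic-curve linking methods compared in this study (standard formulations; the notation is chosen here, not taken from the article): both estimate linear scale-transformation constants A and B, but they minimize different loss functions. Writing P_i(theta; a, b) for the item response function, the Haebara criterion matches item characteristic curves item by item, while the Stocking-Lord criterion matches the test characteristic curves as wholes:

    % Haebara: sum of squared differences between item characteristic curves
    H(A, B) = \sum_j \sum_i \Big[ P_i(\theta_j; \hat{a}_{iY}, \hat{b}_{iY})
              - P_i(\theta_j; \hat{a}_{iX}/A, \, A\hat{b}_{iX} + B) \Big]^2

    % Stocking-Lord: squared difference between test characteristic curves
    SL(A, B) = \sum_j \Big[ \sum_i P_i(\theta_j; \hat{a}_{iY}, \hat{b}_{iY})
               - \sum_i P_i(\theta_j; \hat{a}_{iX}/A, \, A\hat{b}_{iX} + B) \Big]^2

Both criteria are minimized over A and B across a grid of ability points theta_j; the two methods usually give similar but not identical constants, which is what comparison studies examine.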
Tomkowicz, Joanna; Kim, Dong-In; Wan, Ping – Online Submission, 2022
In this study we evaluated the stability of item parameters and student scores, using the pre-equated (pre-pandemic) parameters from Spring 2019 and post-equated (post-pandemic) parameters from Spring 2021 in two calibration and equating designs related to item parameter treatment: re-estimating all anchor parameters (Design 1) and holding the…
Descriptors: Equated Scores, Test Items, Evaluation Methods, Pandemics
Koçak, Duygu – International Electronic Journal of Elementary Education, 2020
One of the most commonly used methods for measuring higher-order thinking skills such as problem-solving or written expression is open-ended items. Three main approaches are used to evaluate responses to open-ended items: general evaluation, rating scales, and rubrics. In order to measure and improve problem-solving skills of students, firstly, an…
Descriptors: Interrater Reliability, Item Response Theory, Test Items, Rating Scales
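Since the entry above concerns interrater reliability for open-ended items, a brief example of two widely used agreement indices may help; the rubric scale and ratings below are invented for illustration.

    # Exact agreement and Cohen's kappa for two raters scoring the same
    # ten open-ended responses on a 0-3 rubric (invented data).
    from collections import Counter

    rater_a = [3, 2, 2, 0, 1, 3, 2, 1, 0, 2]
    rater_b = [3, 2, 1, 0, 1, 3, 2, 2, 0, 2]

    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n   # proportion of exact matches

    # Cohen's kappa corrects observed agreement for the agreement expected by chance,
    # based on each rater's marginal score distribution.
    pa, pb = Counter(rater_a), Counter(rater_b)
    expected = sum((pa[c] / n) * (pb[c] / n) for c in set(rater_a) | set(rater_b))
    kappa = (observed - expected) / (1 - expected)

    print(round(observed, 2), round(kappa, 2))   # 0.8 exact agreement, kappa about 0.72

A high exact-agreement rate paired with a noticeably lower kappa is common when raters use only part of the rubric scale, which is one reason chance-corrected indices are reported alongside raw agreement.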
Lynch, Sarah – Practical Assessment, Research & Evaluation, 2022
In today's digital age, tests are increasingly being delivered on computers. Many of these computer-based tests (CBTs) have been adapted from paper-based tests (PBTs). However, this change in mode of test administration has the potential to introduce construct-irrelevant variance, affecting the validity of score interpretations. Because of this,…
Descriptors: Computer Assisted Testing, Tests, Scores, Scoring
Çekiç, Ahmet; Bakla, Arif – International Online Journal of Education and Teaching, 2021
The Internet and the software stores for mobile devices offer a huge number of digital tools for almost any task, and those intended for digital formative assessment (DFA) have multiplied rapidly in the last decade. These tools vary in terms of their functionality, pedagogical quality, cost, operating systems and so forth. Teachers and learners…
Descriptors: Formative Evaluation, Futures (of Society), Computer Assisted Testing, Guidance
Burfitt, Joan – Mathematics Education Research Group of Australasia, 2017
Multiple-choice items are used in large-scale assessments of mathematical achievement for secondary students in many countries. Research findings can be implemented to improve the quality of the items and hence increase the amount of information gathered about student learning from each item. One way to achieve this is to create items for which…
Descriptors: Multiple Choice Tests, Mathematics Tests, Credits, Knowledge Level
Qudah, Ahmad Hassan – Journal of Education and Practice, 2016
The research aims to identify a specific way to evaluate the learning of mathematics, so as to obtain a "measuring tool" for learners' achievement in mathematics that reflects their level of understanding as a score (mark) that can be trusted to a high degree. The behavior of the learner can be measured in a professional way by building the…
Descriptors: Mathematics Instruction, Mathematics Teachers, Student Evaluation, Evaluation Methods
Michelle M. Neumann; Jason L. Anthony; Noé A. Erazo; David L. Neumann – Grantee Submission, 2019
The framework and tools used for classroom assessment can have significant impacts on teacher practices and student achievement. Getting assessment right is an important component in creating positive learning experiences and academic success. Recent government reports (e.g., United States, Australia) call for the development of systems that use…
Descriptors: Early Childhood Education, Futures (of Society), Educational Assessment, Evaluation Methods
Todd, Amber; Romine, William L. – International Journal of Science Education, 2016
Building upon a methodologically diverse research foundation, we adapted and validated the "Learning Progression-based Assessment of Modern Genetics" (LPA-MG) for college students' knowledge of the domain. Toward collecting valid learning progression-based measures in a college majors context, we redeveloped and content validated a…
Descriptors: Genetics, College Science, College Students, Student Evaluation
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage