Publication Date
In 2025 | 1 |
Since 2024 | 3 |
Since 2021 (last 5 years) | 5 |
Since 2016 (last 10 years) | 9 |
Since 2006 (last 20 years) | 18 |
Descriptor
Difficulty Level | 24 |
Evaluation Methods | 24 |
Item Analysis | 24 |
Test Items | 17 |
Foreign Countries | 9 |
Test Construction | 9 |
Item Response Theory | 7 |
Multiple Choice Tests | 6 |
Psychometrics | 6 |
Test Validity | 6 |
Comparative Analysis | 5 |
Author
Merz, William R. | 2 |
Ahn, Soyeon | 1 |
Alicia A. Stoltenberg | 1 |
Apantee Poonputta | 1 |
Barniol, Pablo | 1 |
Barry, Carol | 1 |
Beauchamp, David | 1 |
Borowski, Andreas | 1 |
Carolyn Maxwell | 1 |
Chen, Deng-Jyi | 1 |
Chen, Hanwei | 1 |
Education Level
Elementary Education | 5 |
Elementary Secondary Education | 3 |
Higher Education | 3 |
Postsecondary Education | 3 |
Secondary Education | 3 |
Grade 6 | 2 |
Early Childhood Education | 1 |
High Schools | 1 |
Kindergarten | 1 |
Primary Education | 1 |
Audience
Researchers | 2 |
Practitioners | 1 |
Teachers | 1 |
Location
Australia | 1 |
Germany | 1 |
Greece | 1 |
India | 1 |
Mexico | 1 |
South Africa | 1 |
Taiwan | 1 |
Thailand | 1 |
United Kingdom (England) | 1 |
United States | 1 |
Assessments and Surveys
Peabody Picture Vocabulary… | 1 |
Program for International… | 1 |
Wind, Stefanie A.; Ge, Yuan – Measurement: Interdisciplinary Research and Perspectives, 2023
In selected-response assessments such as attitude surveys with Likert-type rating scales, examinees often select from rating scale categories to reflect their locations on a construct. Researchers have observed that some examinees exhibit "response styles," which are systematic patterns of responses in which examinees are more likely to…
Descriptors: Goodness of Fit, Responses, Likert Scales, Models
Stephen Humphry; Paul Montuoro; Carolyn Maxwell – Journal of Psychoeducational Assessment, 2024
This article builds upon a prominent definition of construct validity that focuses on variation in attributes causing variation in measurement outcomes. This article synthesizes the definition and uses Rasch measurement modeling to explicate a modified conceptualization of construct validity for assessments of developmental attributes. If…
Descriptors: Construct Validity, Measurement Techniques, Developmental Stages, Item Analysis
Thayaamol Upapong; Apantee Poonputta – Educational Process: International Journal, 2025
Background/purpose: The purposes of this research are to develop a reliable and valid assessment tool for measuring systems thinking skills in upper primary students in Thailand and to establish a normative criterion for evaluating their systems thinking abilities based on educational standards. Materials/methods: The study followed a three-phase…
Descriptors: Thinking Skills, Elementary School Students, Measures (Individuals), Foreign Countries
Alicia A. Stoltenberg – ProQuest LLC, 2024
Multiple-select multiple-choice items, or multiple-choice items with more than one correct answer, are used to quickly assess content on standardized assessments. Because there are multiple keys to these item types, there are also multiple ways to score student responses to these items. The purpose of this study was to investigate how changing the…
Descriptors: Scoring, Evaluation Methods, Multiple Choice Tests, Standardized Tests
Parry, James R. – Online Submission, 2020
This paper presents research and provides a method to ensure that parallel assessments generated from a large test-item database maintain equitable difficulty and content coverage each time the assessment is presented. To maintain fairness and validity, it is important that all instances of an assessment that is intended to test the…
Descriptors: Culture Fair Tests, Difficulty Level, Test Items, Test Validity
Park, Sung Eun; Ahn, Soyeon; Zopluoglu, Cengiz – Educational and Psychological Measurement, 2021
This study presents a new approach to synthesizing differential item functioning (DIF) effect size: First, using correlation matrices from each study, we perform a multigroup confirmatory factor analysis (MGCFA) that examines measurement invariance of a test item between two subgroups (i.e., focal and reference groups). Then we synthesize, across…
Descriptors: Item Analysis, Effect Size, Difficulty Level, Monte Carlo Methods
Achieve, Inc., 2019
Assessment is a key lever for educational improvement. Assessments can be used to monitor, signal, and influence science teaching and learning -- provided that they are of high quality, reflect the rigor and intent of academic standards, and elicit meaningful student performances. Since the release of "A Framework for K-12 Science…
Descriptors: Difficulty Level, Evaluation Criteria, Cognitive Processes, Test Items
Beauchamp, David; Constantinou, Filio – Research Matters, 2020
Assessment is a useful process as it provides various stakeholders (e.g., teachers, parents, government, employers) with information about students' competence in a particular subject area. However, for the information generated by assessment to be useful, it needs to support valid inferences. One factor that can undermine the validity of…
Descriptors: Computational Linguistics, Inferences, Validity, Language Usage
Towns, Marcy H. – Journal of Chemical Education, 2014
Chemistry faculty members are highly skilled in obtaining, analyzing, and interpreting physical measurements, but often they are less skilled in measuring student learning. This work provides guidance for chemistry faculty from the research literature on multiple-choice item development in chemistry. Areas covered include content, stem, and…
Descriptors: Multiple Choice Tests, Test Construction, Psychometrics, Test Items
Long, Caroline; Dunne, Tim; Mokoena, Gabriel – Perspectives in Education, 2014
The rationale for the introduction of standards in the United States in the late 1980s was that the quality of education would improve. Assessment instruments in the form of written tests were constructed in order to perform a monitoring function. The introduction of standards and the associated monitoring have been replicated in South Africa. It…
Descriptors: Models, Evaluation Methods, Classroom Environment, Standards
Kirschner, Sophie; Borowski, Andreas; Fischer, Hans E.; Gess-Newsome, Julie; von Aufschnaiter, Claudia – International Journal of Science Education, 2016
Teachers' professional knowledge is assumed to be a key variable for effective teaching. As teacher education has the goal to enhance professional knowledge of current and future teachers, this knowledge should be described and assessed. Nevertheless, only a limited number of studies quantitatively measure physics teachers' professional…
Descriptors: Evaluation Methods, Tests, Test Format, Science Instruction
Barniol, Pablo; Zavala, Genaro – Physical Review Special Topics - Physics Education Research, 2014
In this article we discuss the findings of our research on students' understanding of vector concepts in problems without physical context. First, we develop a complete taxonomy of the most frequent errors made by university students when learning vector concepts. This study is based on the results of several test administrations of open-ended…
Descriptors: Multiple Choice Tests, Geometric Concepts, Algebra, Psychometrics
Shah, Mira B.; Schaefer, Barbara A.; Clark, Teresa P. – International Journal of School & Educational Psychology, 2013
To effectively provide early interventions to children, identifying those who are in need of these interventions is essential. In India, several problems hinder the process of early identification, including a lack of standardized measures for assessment. This study investigates the utility of the Bracken School Readiness Assessment, Third Edition…
Descriptors: Foreign Countries, Intervention, School Readiness, Young Children
Simos, Panagiotis G.; Sideridis, Georgios D.; Protopapas, Athanassios; Mouzaki, Angeliki – Assessment for Effective Intervention, 2011
Assessment of lexical/semantic knowledge is performed with a variety of tests varying in response requirements. The present study exemplifies the application of modern statistical approaches in the adaptation and assessment of the psychometric properties of the "Peabody Picture Vocabulary Test--Revised" (PPVT-R) Greek. Confirmatory…
Descriptors: Elementary School Students, Reading Comprehension, Semantics, Educational Assessment
Chen, Hanwei; Cui, Zhongmin; Zhu, Rongchun; Gao, Xiaohong – ACT, Inc., 2010
The most critical feature of a common-item nonequivalent groups equating design is that the average score difference between the new and old groups can be accurately decomposed into a group ability difference and a form difficulty difference. Two widely used observed-score linear equating methods, the Tucker and the Levine observed-score methods,…
Descriptors: Equated Scores, Groups, Ability Grouping, Difficulty Level