Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 0
Since 2016 (last 10 years): 1
Since 2006 (last 20 years): 27
Descriptor
Evaluation Methods: 37
Evaluation Research: 37
Test Items: 37
Item Response Theory: 14
Simulation: 11
Computer Assisted Testing: 9
Measurement Techniques: 9
Item Analysis: 8
Student Evaluation: 8
Test Bias: 8
Educational Assessment: 7
Author
van der Linden, Wim J.: 2
Ankenmann, Robert D.: 1
Arendasy, Martin: 1
Ban, Jae-Chun: 1
Beretvas, S. Natasha: 1
Berliner, David C.: 1
Burt, Gordon: 1
Camilli, Gregory: 1
Casey, Beth M.: 1
Chen, Deng-Jyi: 1
Chen, Shu-Ling: 1
Publication Type
Journal Articles: 35
Reports - Evaluative: 15
Reports - Research: 15
Reports - Descriptive: 6
Opinion Papers: 2
Information Analyses: 1
Speeches/Meeting Papers: 1
Education Level
Higher Education: 9
Elementary Secondary Education: 7
Postsecondary Education: 4
Elementary Education: 3
Grade 4: 2
Grade 12: 1
Grade 5: 1
Grade 6: 1
Grade 8: 1
Audience
Practitioners: 1
Assessments and Surveys
ACT Assessment: 1
Spurgeon, Shawn L. – Measurement and Evaluation in Counseling and Development, 2017
Construct irrelevance (CI) and construct underrepresentation (CU) are two major threats to validity, yet they are rarely discussed within the counseling literature. This article explains the relevance of these threats to internal validity. An illustrative case example is provided to assist counselors in understanding these…
Descriptors: Construct Validity, Evaluation Criteria, Evaluation Methods, Evaluation Problems
Ferrando, Pere J. – Psicologica: International Journal of Methodology and Experimental Psychology, 2012
Model-based attempts to rigorously study the broad and imprecise concept of "discriminating power" are scarce, and generally limited to nonlinear models for binary responses. This paper proposes a comprehensive framework for assessing the discriminating power of item and test scores that are analyzed or obtained using Spearman's…
Descriptors: Student Evaluation, Psychometrics, Test Items, Scores
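
As a concrete anchor for the "discriminating power" concept in the Ferrando entry above, here is a minimal sketch, assuming Spearman's linear single-factor parameterization (the notation is mine, not the paper's):

```latex
% Minimal sketch, assuming Spearman's linear single-factor model (notation
% mine). Item score X_j is linear in the common trait theta:
\[
  X_j = \mu_j + \lambda_j\,\theta + \varepsilon_j ,
  \qquad \theta \sim N(0,1), \quad \varepsilon_j \sim N(0,\psi_j)
\]
% One natural index of discriminating power is the loading's signal-to-noise
% ratio; larger d_j means the item separates trait levels more sharply:
\[
  d_j = \lambda_j / \sqrt{\psi_j}
\]
```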
Debelak, Rudolf; Arendasy, Martin – Educational and Psychological Measurement, 2012
A new approach to identify item clusters fitting the Rasch model is described and evaluated using simulated and real data. The proposed method is based on hierarchical cluster analysis and constructs clusters of items that show a good fit to the Rasch model. It thus gives an estimate of the number of independent scales satisfying the postulates of…
Descriptors: Test Items, Factor Analysis, Evaluation Methods, Simulation
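
To make the clustering idea in the Debelak and Arendasy entry concrete, here is a minimal, hedged sketch in Python: it simulates Rasch responses and forms hierarchical clusters from a simple inter-item correlation distance. The distance is an illustrative stand-in; the paper's actual Rasch-fit-based criterion is not reproduced here.

```python
# Hedged sketch: hierarchical clustering of items into candidate Rasch scales.
# The correlation-based distance is a stand-in for the paper's fit criterion.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
n_persons, n_items = 500, 12
theta = rng.normal(size=n_persons)                    # person abilities
b = rng.normal(size=n_items)                          # item difficulties
p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))  # Rasch P(correct)
X = (rng.uniform(size=p.shape) < p).astype(int)       # simulated 0/1 responses

corr = np.corrcoef(X, rowvar=False)                   # inter-item correlations
dist = squareform(1 - corr, checks=False)             # condensed distance matrix
Z = linkage(dist, method="average")                   # agglomerative clustering
print(fcluster(Z, t=2, criterion="maxclust"))         # cut into two clusters
```

With truly unidimensional data, as simulated here, any cut yields essentially arbitrary clusters; the interesting case is multidimensional data, where separate well-fitting Rasch subscales should emerge as distinct branches.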
Maydeu-Olivares, Alberto – Measurement: Interdisciplinary Research and Perspectives, 2013
In this rejoinder, Maydeu-Olivares states that, in item response theory (IRT) measurement applications, goodness-of-fit (GOF) methods inform researchers of the discrepancy between the model and the data being fitted (the room for improvement). By routinely reporting the GOF of IRT models, together with the substantive results…
Descriptors: Goodness of Fit, Models, Evaluation Methods, Item Response Theory
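
The "room for improvement" framing has a standard quantitative companion in SEM practice: an RMSEA-type index. The version below follows the common chi-square convention and is given only as an illustration; it is not necessarily the specific statistic the rejoinder advocates.

```latex
% Common SEM-style RMSEA (illustrative): X^2 is the GOF statistic, df its
% degrees of freedom, N the sample size.
\[
  \widehat{\mathrm{RMSEA}}
    = \sqrt{\frac{\max\!\left(X^{2} - df,\; 0\right)}{df\,(N-1)}}
\]
```

Values near zero indicate little model-data discrepancy; routinely reporting such an index alongside substantive results is the kind of practice the rejoinder argues for.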
Jiao, Hong; Liu, Junhui; Haynie, Kathleen; Woo, Ada; Gorham, Jerry – Educational and Psychological Measurement, 2012
This study explored the impact of partial credit scoring of one type of innovative item (multiple-response items) in a computerized adaptive version of a large-scale licensure pretest and in operational test settings. The impacts of partial credit scoring on the estimation of the ability parameters and classification decisions in operational test…
Descriptors: Test Items, Computer Assisted Testing, Measures (Individuals), Scoring
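
The scoring contrast at issue in the Jiao et al. entry can be shown in a few lines. The one-point-per-correctly-classified-option rule below is illustrative only; the study's operational partial credit rule is not specified in the abstract.

```python
# Hedged sketch: dichotomous vs. partial-credit scoring of a multiple-response
# item. The per-option rule is illustrative, not the study's operational rule.
def dichotomous_score(selected, keyed):
    """1 only if the selected options exactly match the keyed options."""
    return int(set(selected) == set(keyed))

def partial_credit_score(selected, keyed, n_options):
    """One point per option classified correctly (selected iff keyed);
    the score ranges from 0 to n_options."""
    return sum((opt in keyed) == (opt in selected) for opt in range(n_options))

keyed = {0, 2, 3}
print(dichotomous_score({0, 2}, keyed))        # 0: all-or-nothing gives no credit
print(partial_credit_score({0, 2}, keyed, 5))  # 4: four of five options correct
```

Under dichotomous scoring a near-miss response earns nothing; partial credit retains most of its information, which is the kind of difference the study's ability-estimation comparison turns on.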
Kim, Eun Sook; Yoon, Myeongsun; Lee, Taehun – Educational and Psychological Measurement, 2012
Multiple-indicators multiple-causes (MIMIC) modeling is often used to test a latent group mean difference while assuming the equivalence of factor loadings and intercepts over groups. However, this study demonstrated that MIMIC was insensitive to the presence of factor loading noninvariance, which implies that factor loading invariance should be…
Descriptors: Test Items, Simulation, Testing, Statistical Analysis
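
For readers unfamiliar with the MIMIC setup in the Kim, Yoon, and Lee entry, here is a minimal sketch of the standard parameterization (notation mine, not the authors'):

```latex
% Minimal MIMIC sketch (notation mine). z is a 0/1 group indicator;
% gamma is the latent group mean difference being tested.
\[
  \mathbf{y} = \boldsymbol{\nu} + \boldsymbol{\Lambda}\,\eta + \boldsymbol{\varepsilon},
  \qquad
  \eta = \gamma z + \zeta
\]
% A nonzero direct effect kappa_j of z on item j,
\[
  y_j = \nu_j + \lambda_j \eta + \kappa_j z + \varepsilon_j ,
\]
% would signal item-level noninvariance rather than a true mean difference.
```

The test of gamma assumes the loadings and intercepts are equal across groups, which is precisely the assumption the simulation shows the MIMIC model cannot itself verify for the loadings.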
Berliner, David C. – Teacher Educator, 2013
In the United States, but not only here, the movement to evaluate teachers based on student test scores has received powerful political and parental support. The logic is simple: from one testing occasion to another, students should show growth in their knowledge and skill. Similar types of students should show similar patterns of growth. Those…
Descriptors: Teacher Evaluation, Merit Pay, Evaluation Problems, Models
Stark, Stephen; Chernyshenko, Oleksandr S. – International Journal of Testing, 2011
This article delves into a relatively unexplored area of measurement by focusing on adaptive testing with unidimensional pairwise preference items. The use of such tests is becoming more common in applied non-cognitive assessment because research suggests that this format may help to reduce certain types of rater error and response sets commonly…
Descriptors: Test Length, Simulation, Adaptive Testing, Item Analysis
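
A minimal sketch of one adaptive step for pairwise-preference items follows. The logistic utility-difference model and the tiny pair pool are generic stand-ins, not the ideal-point model the article evaluates.

```python
# Hedged sketch: select the next pairwise-preference item by maximizing Fisher
# information at the current ability estimate. The logistic model is a generic
# stand-in for the article's pairwise preference model.
import numpy as np

def info(theta, a, d):
    """Fisher information at theta when P(prefer s over t) = sigmoid(a*theta + d)."""
    p = 1 / (1 + np.exp(-(a * theta + d)))
    return a**2 * p * (1 - p)

# Each row of the pool is (a, d): slope and intercept for one stimulus pair.
pool = np.array([[0.8, 0.1], [1.2, -0.5], [0.4, 0.0], [1.0, 0.9]])
theta_hat = 0.3                                        # current ability estimate
best = int(np.argmax([info(theta_hat, a, d) for a, d in pool]))
print("administer pair", best)                         # pair 1 for this estimate
```

Repeating this selection after each response, with theta_hat re-estimated, gives the basic adaptive loop whose measurement efficiency such simulations evaluate.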
Swail, Watson Scott – College and University, 2011
College rankings generate much talk and discussion in the higher education arena. This love/hate relationship has not necessarily resulted in better rankings, but rather in more rankings. This paper looks at some of the measures and pitfalls of current ranking systems and proposes areas for improvement through a better focus on teaching and…
Descriptors: Higher Education, Measurement Objectives, Measurement Techniques, Classification
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format
Usener, Claus A.; Majchrzak, Tim A.; Kuchen, Herbert – Interactive Technology and Smart Education, 2012
Purpose: To reduce the heavy manual effort that assessment imposes on teaching personnel, e-assessment systems are used to assess students using information systems (IS). The purpose of this paper is to propose an extension of EASy, a system for the e-assessment of exercises that require higher-order cognitive skills. The latest module allows assessing…
Descriptors: Foreign Countries, Computer Software, Computer Software Evaluation, Computer Assisted Testing
Stuive, Ilse; Kiers, Henk A. L.; Timmerman, Marieke E. – Educational and Psychological Measurement, 2009
A common question in test evaluation is whether an a priori assignment of items to subtests is supported by empirical data. If the analysis results indicate that the assignment under study is not supported by the data, the assignment is often adjusted. In this study, the authors compare two methods on the quality of their suggestions to…
Descriptors: Simulation, Item Response Theory, Test Items, Factor Analysis
Lee, Won-Chan; Ban, Jae-Chun – Applied Measurement in Education, 2010
Various applications of item response theory often require linking to achieve a common scale for item parameter estimates obtained from different groups. This article used a simulation to examine the relative performance of four different item response theory (IRT) linking procedures in a random groups equating design: concurrent calibration with…
Descriptors: Item Response Theory, Simulation, Comparative Analysis, Measurement Techniques
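
One traditional separate-calibration linking procedure of the kind compared in the Lee and Ban entry is the mean/sigma method; here is a minimal sketch with illustrative numbers, not the study's data.

```python
# Hedged sketch: mean/sigma linking. A and B map the new calibration's scale
# onto the old (base) scale using common-item difficulty estimates.
import numpy as np

b_new = np.array([-1.2, -0.3, 0.4, 1.1])   # common-item difficulties, new run
b_old = np.array([-1.0, -0.1, 0.7, 1.5])   # same items on the base scale

A = b_old.std(ddof=1) / b_new.std(ddof=1)  # slope of the linear transformation
B = b_old.mean() - A * b_new.mean()        # intercept
print(A, B)                                # theta_old = A * theta_new + B
print(A * 0.5 + B)                         # rescale a new-scale theta of 0.5
```

Concurrent calibration avoids this transformation step entirely by estimating all parameters on one scale, which is one axis of the comparison the simulation makes.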
Fulcher, Keston H.; Orem, Chris D. – Research & Practice in Assessment, 2010
Higher education experts tout learning outcomes assessment as a vehicle for program improvement. To this end, the authors share a rubric designed explicitly to evaluate the quality of assessment and how it leads to program improvement. The rubric contains six general assessment areas, which are further broken down into 14 elements. Embedded within…
Descriptors: Higher Education, Scoring Rubrics, Educational Quality, Program Improvement
Wang, Wen-Chung; Shih, Ching-Lin; Yang, Chih-Chien – Educational and Psychological Measurement, 2009
This study adds a scale purification procedure to the standard MIMIC method for differential item functioning (DIF) detection and assesses its performance through a series of simulations. It is found that the MIMIC method with scale purification (denoted as M-SP) outperforms the standard MIMIC method (denoted as M-ST) in controlling…
Descriptors: Test Items, Measures (Individuals), Test Bias, Evaluation Research
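
The purification loop itself is simple to state; here is a hedged sketch. The `fit_mimic_and_flag` callable is hypothetical: in practice it would fit a MIMIC model using the current anchor items and return the set of items flagged for DIF.

```python
# Hedged sketch of scale purification: iteratively drop DIF-flagged items from
# the anchor set and retest until the flagged set stabilizes.
# `fit_mimic_and_flag` is a hypothetical callable, not a real library API.
def purify(items, fit_mimic_and_flag, max_iter=20):
    anchors = set(items)
    flagged = set()
    for _ in range(max_iter):
        flagged = fit_mimic_and_flag(anchor_items=anchors)  # items showing DIF
        new_anchors = set(items) - flagged
        if new_anchors == anchors:   # flags stable: purification has converged
            break
        anchors = new_anchors
    return anchors, flagged
```

Because DIF items left in the scale contaminate the matching score, removing them before the final round of testing is the mechanism behind M-SP's advantage over the unpurified M-ST reported above.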