Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 0
Since 2016 (last 10 years): 1
Since 2006 (last 20 years): 27
Descriptor
Evaluation Methods: 37
Evaluation Research: 37
Test Items: 37
Item Response Theory: 14
Simulation: 11
Computer Assisted Testing: 9
Measurement Techniques: 9
Item Analysis: 8
Student Evaluation: 8
Test Bias: 8
Educational Assessment: 7
Author
van der Linden, Wim J.: 2
Ankenmann, Robert D.: 1
Arendasy, Martin: 1
Ban, Jae-Chun: 1
Beretvas, S. Natasha: 1
Berliner, David C.: 1
Burt, Gordon: 1
Camilli, Gregory: 1
Casey, Beth M.: 1
Chen, Deng-Jyi: 1
Chen, Shu-Ling: 1
Publication Type
Journal Articles: 35
Reports - Evaluative: 15
Reports - Research: 15
Reports - Descriptive: 6
Opinion Papers: 2
Information Analyses: 1
Speeches/Meeting Papers: 1
Education Level
Higher Education: 9
Elementary Secondary Education: 7
Postsecondary Education: 4
Elementary Education: 3
Grade 4: 2
Grade 12: 1
Grade 5: 1
Grade 6: 1
Grade 8: 1
Audience
Practitioners: 1
Assessments and Surveys
ACT Assessment: 1
Spurgeon, Shawn L. – Measurement and Evaluation in Counseling and Development, 2017
Construct irrelevance (CI) and construct underrepresentation (CU) are two major threats to validity, yet they are rarely discussed within the counseling literature. This article explains the relevance of these threats to internal validity. An illustrative case example is provided to assist counselors in understanding these…
Descriptors: Construct Validity, Evaluation Criteria, Evaluation Methods, Evaluation Problems
Ferrando, Pere J. – Psicologica: International Journal of Methodology and Experimental Psychology, 2012
Model-based attempts to rigorously study the broad and imprecise concept of "discriminating power" are scarce, and generally limited to nonlinear models for binary responses. This paper proposes a comprehensive framework for assessing the discriminating power of item and test scores that are analyzed or obtained using Spearman's…
Descriptors: Student Evaluation, Psychometrics, Test Items, Scores
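
As a concrete anchor for the "discriminating power" concept in the Ferrando entry above, here is a minimal sketch, assuming Spearman's linear single-factor parameterization (the notation is mine, not the paper's):

```latex
% Minimal sketch, assuming Spearman's linear single-factor model (notation
% mine). Item score X_j is linear in the common trait theta:
\[
  X_j = \mu_j + \lambda_j\,\theta + \varepsilon_j ,
  \qquad \theta \sim N(0,1), \quad \varepsilon_j \sim N(0,\psi_j)
\]
% One natural index of discriminating power is the loading's signal-to-noise
% ratio; larger d_j means the item separates trait levels more sharply:
\[
  d_j = \lambda_j / \sqrt{\psi_j}
\]
```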
Debelak, Rudolf; Arendasy, Martin – Educational and Psychological Measurement, 2012
A new approach to identify item clusters fitting the Rasch model is described and evaluated using simulated and real data. The proposed method is based on hierarchical cluster analysis and constructs clusters of items that show a good fit to the Rasch model. It thus gives an estimate of the number of independent scales satisfying the postulates of…
Descriptors: Test Items, Factor Analysis, Evaluation Methods, Simulation
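
To make the clustering idea in the Debelak and Arendasy entry concrete, here is a minimal, hedged sketch in Python: it simulates Rasch responses and forms hierarchical clusters from a simple inter-item correlation distance. The distance is an illustrative stand-in; the paper's actual Rasch-fit-based criterion is not reproduced here.

```python
# Hedged sketch: hierarchical clustering of items into candidate Rasch scales.
# The correlation-based distance is a stand-in for the paper's fit criterion.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
n_persons, n_items = 500, 12
theta = rng.normal(size=n_persons)                    # person abilities
b = rng.normal(size=n_items)                          # item difficulties
p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))  # Rasch P(correct)
X = (rng.uniform(size=p.shape) < p).astype(int)       # simulated 0/1 responses

corr = np.corrcoef(X, rowvar=False)                   # inter-item correlations
dist = squareform(1 - corr, checks=False)             # condensed distance matrix
Z = linkage(dist, method="average")                   # agglomerative clustering
print(fcluster(Z, t=2, criterion="maxclust"))         # cut into two clusters
```

With truly unidimensional data, as simulated here, any cut yields essentially arbitrary clusters; the interesting case is multidimensional data, where separate well-fitting Rasch subscales should emerge as distinct branches.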
Maydeu-Olivares, Alberto – Measurement: Interdisciplinary Research and Perspectives, 2013
In this rejoinder, Maydeu-Olivares states that, in item response theory (IRT) measurement applications, goodness-of-fit (GOF) methods inform researchers of the discrepancy between the model and the data being fitted (the room for improvement). By routinely reporting the GOF of IRT models, together with the substantive results…
Descriptors: Goodness of Fit, Models, Evaluation Methods, Item Response Theory
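
The "room for improvement" framing has a standard quantitative companion in SEM practice: an RMSEA-type index. The version below follows the common chi-square convention and is given only as an illustration; it is not necessarily the specific statistic the rejoinder advocates.

```latex
% Common SEM-style RMSEA (illustrative): X^2 is the GOF statistic, df its
% degrees of freedom, N the sample size.
\[
  \widehat{\mathrm{RMSEA}}
    = \sqrt{\frac{\max\!\left(X^{2} - df,\; 0\right)}{df\,(N-1)}}
\]
```

Values near zero indicate little model-data discrepancy; routinely reporting such an index alongside substantive results is the kind of practice the rejoinder argues for.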
Jiao, Hong; Liu, Junhui; Haynie, Kathleen; Woo, Ada; Gorham, Jerry – Educational and Psychological Measurement, 2012
This study explored the impact of partial credit scoring of one type of innovative item (multiple-response items) in a computerized adaptive version of a large-scale licensure pretest and in operational test settings. The impacts of partial credit scoring on the estimation of the ability parameters and classification decisions in operational test…
Descriptors: Test Items, Computer Assisted Testing, Measures (Individuals), Scoring
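
The scoring contrast at issue in the Jiao et al. entry can be shown in a few lines. The one-point-per-correctly-classified-option rule below is illustrative only; the study's operational partial credit rule is not specified in the abstract.

```python
# Hedged sketch: dichotomous vs. partial-credit scoring of a multiple-response
# item. The per-option rule is illustrative, not the study's operational rule.
def dichotomous_score(selected, keyed):
    """1 only if the selected options exactly match the keyed options."""
    return int(set(selected) == set(keyed))

def partial_credit_score(selected, keyed, n_options):
    """One point per option classified correctly (selected iff keyed);
    the score ranges from 0 to n_options."""
    return sum((opt in keyed) == (opt in selected) for opt in range(n_options))

keyed = {0, 2, 3}
print(dichotomous_score({0, 2}, keyed))        # 0: all-or-nothing gives no credit
print(partial_credit_score({0, 2}, keyed, 5))  # 4: four of five options correct
```

Under dichotomous scoring a near-miss response earns nothing; partial credit retains most of its information, which is the kind of difference the study's ability-estimation comparison turns on.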
Kim, Eun Sook; Yoon, Myeongsun; Lee, Taehun – Educational and Psychological Measurement, 2012
Multiple-indicators multiple-causes (MIMIC) modeling is often used to test a latent group mean difference while assuming the equivalence of factor loadings and intercepts over groups. However, this study demonstrated that MIMIC was insensitive to the presence of factor loading noninvariance, which implies that factor loading invariance should be…
Descriptors: Test Items, Simulation, Testing, Statistical Analysis
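
For readers unfamiliar with the MIMIC setup in the Kim, Yoon, and Lee entry, here is a minimal sketch of the standard parameterization (notation mine, not the authors'):

```latex
% Minimal MIMIC sketch (notation mine). z is a 0/1 group indicator;
% gamma is the latent group mean difference being tested.
\[
  \mathbf{y} = \boldsymbol{\nu} + \boldsymbol{\Lambda}\,\eta + \boldsymbol{\varepsilon},
  \qquad
  \eta = \gamma z + \zeta
\]
% A nonzero direct effect kappa_j of z on item j,
\[
  y_j = \nu_j + \lambda_j \eta + \kappa_j z + \varepsilon_j ,
\]
% would signal item-level noninvariance rather than a true mean difference.
```

The test of gamma assumes the loadings and intercepts are equal across groups, which is precisely the assumption the simulation shows the MIMIC model cannot itself verify for the loadings.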
Berliner, David C. – Teacher Educator, 2013
In the United States, but not only here, the movement to evaluate teachers based on student test scores has received powerful political and parental support. The logic is simple: from one testing occasion to another, students should show growth in their knowledge and skill. Similar types of students should show similar patterns of growth. Those…
Descriptors: Teacher Evaluation, Merit Pay, Evaluation Problems, Models
Stark, Stephen; Chernyshenko, Oleksandr S. – International Journal of Testing, 2011
This article delves into a relatively unexplored area of measurement by focusing on adaptive testing with unidimensional pairwise preference items. The use of such tests is becoming more common in applied non-cognitive assessment because research suggests that this format may help to reduce certain types of rater error and response sets commonly…
Descriptors: Test Length, Simulation, Adaptive Testing, Item Analysis
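
A minimal sketch of one adaptive step for pairwise-preference items follows. The logistic utility-difference model and the tiny pair pool are generic stand-ins, not the ideal-point model the article evaluates.

```python
# Hedged sketch: select the next pairwise-preference item by maximizing Fisher
# information at the current ability estimate. The logistic model is a generic
# stand-in for the article's pairwise preference model.
import numpy as np

def info(theta, a, d):
    """Fisher information at theta when P(prefer s over t) = sigmoid(a*theta + d)."""
    p = 1 / (1 + np.exp(-(a * theta + d)))
    return a**2 * p * (1 - p)

# Each row of the pool is (a, d): slope and intercept for one stimulus pair.
pool = np.array([[0.8, 0.1], [1.2, -0.5], [0.4, 0.0], [1.0, 0.9]])
theta_hat = 0.3                                        # current ability estimate
best = int(np.argmax([info(theta_hat, a, d) for a, d in pool]))
print("administer pair", best)                         # pair 1 for this estimate
```

Repeating this selection after each response, with theta_hat re-estimated, gives the basic adaptive loop whose measurement efficiency such simulations evaluate.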
Swail, Watson Scott – College and University, 2011
College rankings generate much talk and discussion in the higher education arena. This love/hate relationship has not necessarily resulted in better rankings, but rather in more rankings. This paper looks at some of the measures and pitfalls of current ranking systems and proposes areas for improvement through a better focus on teaching and…
Descriptors: Higher Education, Measurement Objectives, Measurement Techniques, Classification
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format
Usener, Claus A.; Majchrzak, Tim A.; Kuchen, Herbert – Interactive Technology and Smart Education, 2012
Purpose: To reduce the heavy manual effort that assessment imposes on teaching personnel, e-assessment systems are used to assess students using information systems (IS). The purpose of this paper is to propose an extension of EASy, a system for the e-assessment of exercises that require higher-order cognitive skills. The latest module allows assessing…
Descriptors: Foreign Countries, Computer Software, Computer Software Evaluation, Computer Assisted Testing
Stuive, Ilse; Kiers, Henk A. L.; Timmerman, Marieke E. – Educational and Psychological Measurement, 2009
A common question in test evaluation is whether an a priori assignment of items to subtests is supported by empirical data. If the analysis results indicate that the assignment under study is not supported by the data, the assignment is often adjusted. In this study, the authors compare two methods on the quality of their suggestions to…
Descriptors: Simulation, Item Response Theory, Test Items, Factor Analysis
Lee, Won-Chan; Ban, Jae-Chun – Applied Measurement in Education, 2010
Various applications of item response theory often require linking to achieve a common scale for item parameter estimates obtained from different groups. This article used a simulation to examine the relative performance of four different item response theory (IRT) linking procedures in a random groups equating design: concurrent calibration with…
Descriptors: Item Response Theory, Simulation, Comparative Analysis, Measurement Techniques
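
One traditional separate-calibration linking procedure of the kind compared in the Lee and Ban entry is the mean/sigma method; here is a minimal sketch with illustrative numbers, not the study's data.

```python
# Hedged sketch: mean/sigma linking. A and B map the new calibration's scale
# onto the old (base) scale using common-item difficulty estimates.
import numpy as np

b_new = np.array([-1.2, -0.3, 0.4, 1.1])   # common-item difficulties, new run
b_old = np.array([-1.0, -0.1, 0.7, 1.5])   # same items on the base scale

A = b_old.std(ddof=1) / b_new.std(ddof=1)  # slope of the linear transformation
B = b_old.mean() - A * b_new.mean()        # intercept
print(A, B)                                # theta_old = A * theta_new + B
print(A * 0.5 + B)                         # rescale a new-scale theta of 0.5
```

Concurrent calibration avoids this transformation step entirely by estimating all parameters on one scale, which is one axis of the comparison the simulation makes.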
Fulcher, Keston H.; Orem, Chris D. – Research & Practice in Assessment, 2010
Higher education experts tout learning outcomes assessment as a vehicle for program improvement. To this end, the authors share a rubric designed explicitly to evaluate the quality of assessment and how it leads to program improvement. The rubric contains six general assessment areas, which are further broken down into 14 elements. Embedded within…
Descriptors: Higher Education, Scoring Rubrics, Educational Quality, Program Improvement
Wang, Wen-Chung; Shih, Ching-Lin; Yang, Chih-Chien – Educational and Psychological Measurement, 2009
This study adds a scale purification procedure to the standard MIMIC method for differential item functioning (DIF) detection and assesses its performance through a series of simulations. It is found that the MIMIC method with scale purification (denoted as M-SP) outperforms the standard MIMIC method (denoted as M-ST) in controlling…
Descriptors: Test Items, Measures (Individuals), Test Bias, Evaluation Research
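
The purification loop itself is simple to state; here is a hedged sketch. The `fit_mimic_and_flag` callable is hypothetical: in practice it would fit a MIMIC model using the current anchor items and return the set of items flagged for DIF.

```python
# Hedged sketch of scale purification: iteratively drop DIF-flagged items from
# the anchor set and retest until the flagged set stabilizes.
# `fit_mimic_and_flag` is a hypothetical callable, not a real library API.
def purify(items, fit_mimic_and_flag, max_iter=20):
    anchors = set(items)
    flagged = set()
    for _ in range(max_iter):
        flagged = fit_mimic_and_flag(anchor_items=anchors)  # items showing DIF
        new_anchors = set(items) - flagged
        if new_anchors == anchors:   # flags stable: purification has converged
            break
        anchors = new_anchors
    return anchors, flagged
```

Because DIF items left in the scale contaminate the matching score, removing them before the final round of testing is the mechanism behind M-SP's advantage over the unpurified M-ST reported above.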