ERIC - Search Results

Showing 1 to 15 of 40 results

Resolving and Re-Scoring Constructed Response Items in Mixed-Format Assessments: An Exploration of Three Approaches

Peer reviewed

Wind, Stefanie A.; Xu, Yangmeng – Educational Assessment, 2024

We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…

Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners

Automated Essay Scoring: Psychometric Guidelines and Practices

Peer reviewed

Ramineni, Chaitanya; Williamson, David M. – Assessing Writing, 2013

In this paper, we provide an overview of psychometric procedures and guidelines Educational Testing Service (ETS) uses to evaluate automated essay scoring for operational use. We briefly describe the e-rater system, the procedures and criteria used to evaluate e-rater, implications for a range of potential uses of e-rater, and directions for…

Descriptors: Educational Testing, Guidelines, Scoring, Psychometrics

Large-Scale Assessment, Locally-Developed Measures, and Automated Scoring of Essays: Fishing for Red Herrings?

Peer reviewed

Condon, William – Assessing Writing, 2013

Automated Essay Scoring (AES) has garnered a great deal of attention from the rhetoric and composition/writing studies community since the Educational Testing Service began using e-rater® and the "Criterion"® Online Writing Evaluation Service as products in scoring writing tests, and most of the responses have been negative. While the…

Descriptors: Measurement, Psychometrics, Evaluation Methods, Educational Testing

Model-Free CUSUM Methods for Person Fit

Peer reviewed

Armstrong, Ronald D.; Shi, Min – Journal of Educational Measurement, 2009

This article demonstrates the use of a new class of model-free cumulative sum (CUSUM) statistics to detect person fit given the responses to a linear test. The fundamental statistic being accumulated is the likelihood ratio of two probabilities. The detection performance of this CUSUM scheme is compared to other model-free person-fit statistics…

Descriptors: Probability, Simulation, Models, Psychometrics

Defending the Quality of Links between Scores from Different Tests and Exams

Peer reviewed

Cresswell, Mike – Measurement: Interdisciplinary Research and Perspectives, 2010

Paul Newton (2010), with his characteristic concern about theory, has set out two different ways of thinking about the basis upon which equivalences of one sort or another are established between test score scales. His reason for doing this is a desire to establish "the defensibility of linkages lower on the continuum than concordance."…

Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis

Evaluating the Rank-Ordering Method for Standard Maintaining

Peer reviewed

Bramley, Tom; Gill, Tim – Research Papers in Education, 2010

The rank-ordering method for standard maintaining was designed for the purpose of mapping a known cut-score (e.g. a grade boundary mark) on one test to an equivalent point on the test score scale of another test, using holistic expert judgements about the quality of exemplars of examinees' work (scripts). It is a novel application of an old…

Descriptors: Scores, Psychometrics, Measurement Techniques, Foreign Countries

Conceptualizing Comparability

Peer reviewed

Newton, Paul E. – Measurement: Interdisciplinary Research and Perspectives, 2010

This article presents the author's rejoinder to thinking about linking from issue 8(1). Particularly within the more embracing linking frameworks, e.g., Holland & Dorans (2006) and Holland (2007), there appears to be a major disjunction between (1) classification discourse: the supposed basis for classification, that is, the underlying theory…

Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis

Linking through Improved Design, Not Redefinition: Commentary on Newton

Peer reviewed

Walker, Michael E. – Measurement: Interdisciplinary Research and Perspectives, 2010

"Linking" is a term given to a general class of procedures by which one represents scores X on one test or measure in terms of scores Y on another test or measure. A recent taxonomy by Holland and Dorans (2006; Holland, 2007) organizes the various types of links into three broad categories: prediction, scale aligning, and equating. In…

Descriptors: Foreign Countries, Test Construction, Test Validity, Measurement Techniques

Contrasting Conceptions of Comparability

Peer reviewed

Newton, Paul E. – Research Papers in Education, 2010

Robert Coe has claimed that three broad conceptions of comparability can be identified from the literature: performance, statistical and conventional. Each of these he rejected, in favour of a single, integrated conception which relies upon the notion of a "linking construct" and which he termed "construct comparability".…

Descriptors: Psychometrics, Measurement Techniques, Foreign Countries, Tests

A Proposed Framework of Test Administration Methods

Peer reviewed

Thompson, Nathan A. – Journal of Applied Testing Technology, 2008

The widespread application of personal computers to educational and psychological testing has substantially increased the number of test administration methodologies available to testing programs. Many of these mediums are referred to by their acronyms, such as CAT, CBT, CCT, and LOFT. The similarities between the acronyms and the methods…

Descriptors: Testing Programs, Psychological Testing, Classification, Educational Testing

Automatic Item Generation of Probability Word Problems

Peer reviewed

Holling, Heinz; Bertling, Jonas P.; Zeuch, Nina – Studies in Educational Evaluation, 2009

Mathematical word problems represent a common item format for assessing student competencies. Automatic item generation (AIG) is an effective way of constructing many items with predictable difficulties, based on a set of predefined task parameters. The current study presents a framework for the automatic generation of probability word problems…

Descriptors: Word Problems (Mathematics), Probability, Automation, College Students

Psychometric Aspects of Pupil Monitoring Systems

Peer reviewed

Glas, Cees A. W.; Geerlings, Hanneke – Studies in Educational Evaluation, 2009

Pupil monitoring systems support the teacher in tailoring teaching to the individual level of a student and in comparing the progress and results of teaching with national standards. The systems are based on the availability of an item bank calibrated using item response theory. The assessment of the students' progress and results can be further…

Descriptors: Item Banks, Adaptive Testing, National Standards, Psychometrics

Defining Characteristics of Diagnostic Classification Models and the Problem of Retrofitting in Cognitive Diagnostic Assessment

Peer reviewed

Gierl, Mark J.; Cui, Ying – Measurement: Interdisciplinary Research and Perspectives, 2008

One promising application of diagnostic classification models (DCM) is in the area of cognitive diagnostic assessment in education. However, the successful application of DCM in educational testing will likely come with a price--and this price may be in the form of new test development procedures and practices required to yield data that satisfy…

Descriptors: Educational Testing, Classification, Psychometrics, Test Construction

What Constitutes Legitimate Causal Linking?

Peer reviewed

Baird, Jo-Anne – Measurement: Interdisciplinary Research and Perspectives, 2010

Newton's article (2010) makes three main contributions to the literature. First, it is transatlantic, bringing together literatures that have been dealing with similar problems, using sometimes different methods and certainly with distinctive educational, cultural perspectives. He points out that neither of these literatures has all of the…

Descriptors: Foreign Countries, Predictive Validity, Standards, Ethics

Multidimensional Adaptive Testing in Educational and Psychological Measurement: Current State and Future Challenges

Peer reviewed

Frey, Andreas; Seitz, Nicki-Nils – Studies in Educational Evaluation, 2009

The paper gives an overview of multidimensional adaptive testing (MAT) and evaluates its applicability in educational and psychological testing. The approach of Segall (1996) is described as a general framework for MAT. The main advantage of MAT is its capability to increase measurement efficiency. In simulation studies conceptualizing situations…

Descriptors: Psychological Testing, Adaptive Testing, Simulation, Evaluation Methods
