NotesFAQContact Us
Collection
Advanced
Search Tips
Audience
Practitioners1
Laws, Policies, & Programs
No Child Left Behind Act 20011
What Works Clearinghouse Rating
Showing 1 to 15 of 40 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024
We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…
Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners
Peer reviewed Peer reviewed
Direct linkDirect link
Ramineni, Chaitanya; Williamson, David M. – Assessing Writing, 2013
In this paper, we provide an overview of psychometric procedures and guidelines Educational Testing Service (ETS) uses to evaluate automated essay scoring for operational use. We briefly describe the e-rater system, the procedures and criteria used to evaluate e-rater, implications for a range of potential uses of e-rater, and directions for…
Descriptors: Educational Testing, Guidelines, Scoring, Psychometrics
Peer reviewed Peer reviewed
Direct linkDirect link
Condon, William – Assessing Writing, 2013
Automated Essay Scoring (AES) has garnered a great deal of attention from the rhetoric and composition/writing studies community since the Educational Testing Service began using e-rater[R] and the "Criterion"[R] Online Writing Evaluation Service as products in scoring writing tests, and most of the responses have been negative. While the…
Descriptors: Measurement, Psychometrics, Evaluation Methods, Educational Testing
Peer reviewed Peer reviewed
Direct linkDirect link
Armstrong, Ronald D.; Shi, Min – Journal of Educational Measurement, 2009
This article demonstrates the use of a new class of model-free cumulative sum (CUSUM) statistics to detect person fit given the responses to a linear test. The fundamental statistic being accumulated is the likelihood ratio of two probabilities. The detection performance of this CUSUM scheme is compared to other model-free person-fit statistics…
Descriptors: Probability, Simulation, Models, Psychometrics
Peer reviewed Peer reviewed
Direct linkDirect link
Cresswell, Mike – Measurement: Interdisciplinary Research and Perspectives, 2010
Paul Newton (2010), with his characteristic concern about theory, has set out two different ways of thinking about the basis upon which equivalences of one sort or another are established between test score scales. His reason for doing this is a desire to establish "the defensibility of linkages lower on the continuum than concordance."…
Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis
Peer reviewed Peer reviewed
Direct linkDirect link
Bramley, Tom; Gill, Tim – Research Papers in Education, 2010
The rank-ordering method for standard maintaining was designed for the purpose of mapping a known cut-score (e.g. a grade boundary mark) on one test to an equivalent point on the test score scale of another test, using holistic expert judgements about the quality of exemplars of examinees' work (scripts). It is a novel application of an old…
Descriptors: Scores, Psychometrics, Measurement Techniques, Foreign Countries
Peer reviewed Peer reviewed
Direct linkDirect link
Newton, Paul E. – Measurement: Interdisciplinary Research and Perspectives, 2010
This article presents the author's rejoinder to thinking about linking from issue 8(1). Particularly within the more embracing linking frameworks, e.g., Holland & Dorans (2006) and Holland (2007), there appears to be a major disjunction between (1) classification discourse: the supposed basis for classification, that is, the underlying theory…
Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis
Peer reviewed Peer reviewed
Direct linkDirect link
Walker, Michael E. – Measurement: Interdisciplinary Research and Perspectives, 2010
"Linking" is a term given to a general class of procedures by which one represents scores X on one test or measure in terms of scores Y on another test or measure. A recent taxonomy by Holland and Dorans (2006; Holland, 2007) organizes the various types of links into three broad categories: prediction, scale aligning, and equating. In…
Descriptors: Foreign Countries, Test Construction, Test Validity, Measurement Techniques
Peer reviewed Peer reviewed
Direct linkDirect link
Newton, Paul E. – Research Papers in Education, 2010
Robert Coe has claimed that three broad conceptions of comparability can be identified from the literature: performance, statistical and conventional. Each of these he rejected, in favour of a single, integrated conception which relies upon the notion of a "linking construct" and which he termed "construct comparability".…
Descriptors: Psychometrics, Measurement Techniques, Foreign Countries, Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Thompson, Nathan A. – Journal of Applied Testing Technology, 2008
The widespread application of personal computers to educational and psychological testing has substantially increased the number of test administration methodologies available to testing programs. Many of these mediums are referred to by their acronyms, such as CAT, CBT, CCT, and LOFT. The similarities between the acronyms and the methods…
Descriptors: Testing Programs, Psychological Testing, Classification, Educational Testing
Peer reviewed Peer reviewed
Direct linkDirect link
Holling, Heinz; Bertling, Jonas P.; Zeuch, Nina – Studies in Educational Evaluation, 2009
Mathematical word problems represent a common item format for assessing student competencies. Automatic item generation (AIG) is an effective way of constructing many items with predictable difficulties, based on a set of predefined task parameters. The current study presents a framework for the automatic generation of probability word problems…
Descriptors: Word Problems (Mathematics), Probability, Automation, College Students
Peer reviewed Peer reviewed
Direct linkDirect link
Glas, Cees A. W.; Geerlings, Hanneke – Studies in Educational Evaluation, 2009
Pupil monitoring systems support the teacher in tailoring teaching to the individual level of a student and in comparing the progress and results of teaching with national standards. The systems are based on the availability of an item bank calibrated using item response theory. The assessment of the students' progress and results can be further…
Descriptors: Item Banks, Adaptive Testing, National Standards, Psychometrics
Peer reviewed Peer reviewed
Direct linkDirect link
Gierl, Mark J.; Cui, Ying – Measurement: Interdisciplinary Research and Perspectives, 2008
One promising application of diagnostic classification models (DCM) is in the area of cognitive diagnostic assessment in education. However, the successful application of DCM in educational testing will likely come with a price--and this price may be in the form of new test development procedures and practices required to yield data that satisfy…
Descriptors: Educational Testing, Classification, Psychometrics, Test Construction
Peer reviewed Peer reviewed
Direct linkDirect link
Baird, Jo-Anne – Measurement: Interdisciplinary Research and Perspectives, 2010
Newton's article (2010) makes three main contributions to the literature. First, it is transatlantic, bringing together literatures that have been dealing with similar problems, using sometimes different methods and certainly with distinctive educational, cultural perspectives. He points out that neither of these literatures has all of the…
Descriptors: Foreign Countries, Predictive Validity, Standards, Ethics
Peer reviewed Peer reviewed
Direct linkDirect link
Frey, Andreas; Seitz, Nicki-Nils – Studies in Educational Evaluation, 2009
The paper gives an overview of multidimensional adaptive testing (MAT) and evaluates its applicability in educational and psychological testing. The approach of Segall (1996) is described as a general framework for MAT. The main advantage of MAT is its capability to increase measurement efficiency. In simulation studies conceptualizing situations…
Descriptors: Psychological Testing, Adaptive Testing, Simulation, Evaluation Methods
Previous Page | Next Page ยป
Pages: 1  |  2  |  3