Showing all 8 results
Peer reviewed
Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022
The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…
Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
Peer reviewed
He, Qingping; Opposs, Dennis – Educational Research and Evaluation, 2012
National tests, public examinations, and vocational qualifications in England are used for a variety of purposes, including the certification of individual learners in different subject areas and the accountability of individual professionals and institutions. However, there has been ongoing debate about the reliability and validity of their…
Descriptors: Qualifications, Evidence, National Competency Tests, Foreign Countries
Peer reviewed
Ziegler, Albert; Ziegler, Albert – High Ability Studies, 2009
The aim of this paper is to demonstrate the dramatic consequences the application of cut-off points can have in the practice of identifying gifted individuals. The paradoxical attenuation effect describes the frequent situation in which measurements of the gifts and talents individuals possess are lower than their true values. However, in…
Descriptors: Gifted, Academic Achievement, Test Theory, Measurement
Peer reviewed
Mapuranga, Raymond; Dorans, Neil J.; Middleton, Kyndra – ETS Research Report Series, 2008
In many practical settings, essentially the same differential item functioning (DIF) procedures have been in use since the late 1980s. Since then, examinee populations have become more heterogeneous, and tests have included more polytomously scored items. This paper summarizes and classifies new DIF methods and procedures that have appeared since…
Descriptors: Test Bias, Educational Development, Evaluation Methods, Statistical Analysis
Peer reviewed
Williams, Richard H.; Zimmerman, Donald W. – Journal of Experimental Education, 1984
This paper provides a list of 10 salient features of the standard error of measurement, contrasting it to the reliability coefficient. It is concluded that the standard error of measurement should be regarded as a primary characteristic of a mental test. (Author/DWH)
Descriptors: Educational Testing, Error of Measurement, Evaluation Methods, Psychological Testing
Peer reviewed
van der Linden, Wim J. – Applied Psychological Measurement, 2006
Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Format, Equated Scores
Stewart, E. Elizabeth – 1981
Context effects are defined as being influences on test performance associated with the content of successively presented test items or sections. Four types of context effects are identified: (1) direct context effects (practice effects) which occur when performance on items is affected by the examinee having been exposed to similar types of…
Descriptors: Context Effect, Data Collection, Error of Measurement, Evaluation Methods
Cason, Gerald J.; And Others – 1983
Prior research in a single clinical training setting has shown Cason and Cason's (1981) simplified model of their performance rating theory can improve rating reliability and validity through statistical control of rater stringency error. Here, the model was applied to clinical performance ratings of 14 cohorts (about 250 students and 200 raters)…
Descriptors: Clinical Experience, Error of Measurement, Evaluation Methods, Higher Education