ERIC - Search Results

Showing all 8 results

Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients

Peer reviewed

Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022

The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…

Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory

The Reliability of Results from National Tests, Public Examinations, and Vocational Qualifications in England

Peer reviewed

He, Qingping; Opposs, Dennis – Educational Research and Evaluation, 2012

National tests, public examinations, and vocational qualifications in England are used for a variety of purposes, including the certification of individual learners in different subject areas and the accountability of individual professionals and institutions. However, there has been ongoing debate about the reliability and validity of their…

Descriptors: Qualifications, Evidence, National Competency Tests, Foreign Countries

The Paradoxical Attenuation Effect in Tests Based on Classical Test Theory: Mathematical Background and Practical Implications for the Measurement of High Abilities

Peer reviewed

Ziegler, Albert; Ziegler, Albert – High Ability Studies, 2009

The aim of this paper is to demonstrate the dramatic consequences the application of cut-off points can have in the practice of identifying gifted individuals. The paradoxical attenuation effect describes the frequent situation in which measurements of the gifts and talents individuals possess are lower than their true values. However, in…

Descriptors: Gifted, Academic Achievement, Test Theory, Measurement

A Review of Recent Developments in Differential Item Functioning. Research Report. ETS RR-08-43

Peer reviewed

Mapuranga, Raymond; Dorans, Neil J.; Middleton, Kyndra – ETS Research Report Series, 2008

In many practical settings, essentially the same differential item functioning (DIF) procedures have been in use since the late 1980s. Since then, examinee populations have become more heterogeneous, and tests have included more polytomously scored items. This paper summarizes and classifies new DIF methods and procedures that have appeared since…

Descriptors: Test Bias, Educational Development, Evaluation Methods, Statistical Analysis

On the Virtues and Vices of the Standard Error of Measurement.

Peer reviewed

Williams, Richard H.; Zimmerman, Donald W. – Journal of Experimental Education, 1984

This paper provides a list of 10 salient features of the standard error of measurement, contrasting it to the reliability coefficient. It is concluded that the standard error of measurement should be regarded as a primary characteristic of a mental test. (Author/DWH)

Descriptors: Educational Testing, Error of Measurement, Evaluation Methods, Psychological Testing

Equating Scores from Adaptive to Linear Tests

Peer reviewed

van der Linden, Wim J. – Applied Psychological Measurement, 2006

Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test…

Descriptors: Adaptive Testing, Computer Assisted Testing, Test Format, Equated Scores

Methodological Issues Related to the Study of Context Effects in Multisection Tests.

Stewart, E. Elizabeth – 1981

Context effects are defined as being influences on test performance associated with the content of successively presented test items or sections. Four types of context effects are identified: (1) direct context effects (practice effects) which occur when performance on items is affected by the examinee having been exposed to similar types of…

Descriptors: Context Effect, Data Collection, Error of Measurement, Evaluation Methods

Controlling Rater Stringency Error in Clinical Performance Rating: Further Validation of a Performance Rating Theory.

Cason, Gerald J.; And Others – 1983

Prior research in a single clinical training setting has shown Cason and Cason's (1981) simplified model of their performance rating theory can improve rating reliability and validity through statistical control of rater stringency error. Here, the model was applied to clinical performance ratings of 14 cohorts (about 250 students and 200 raters)…

Descriptors: Clinical Experience, Error of Measurement, Evaluation Methods, Higher Education