ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	2
Since 2006 (last 20 years)	12

Descriptor

Interrater Reliability	38
Test Interpretation	38
Scoring	13
Test Reliability	10
Evaluators	8
Test Validity	8
Scores	7
Foreign Countries	6
Measurement Techniques	6
Standard Setting (Scoring)	6
Test Construction	6
Evaluation Methods	5
Generalizability Theory	5
Performance Based Assessment	5
Rating Scales	5
Testing Problems	5
Cutting Scores	4
Difficulty Level	4
Educational Assessment	4
Error of Measurement	4
Higher Education	4
Correlation	3
Examiners	3
Minimum Competencies	3
Minimum Competency Testing	3

Publication Type

Journal Articles	27
Reports - Evaluative	17
Reports - Research	15
Speeches/Meeting Papers	10
Opinion Papers	4
Guides - Non-Classroom	2
ERIC Digests in Full Text	1
ERIC Publications	1
Information Analyses	1
Reports - Descriptive	1

Education Level

Higher Education	5
Postsecondary Education	4
Early Childhood Education	2
Secondary Education	2
Elementary Education	1
Grade 3	1
Grade 5	1
Grade 6	1
Grade 7	1
Grade 8	1
Grade 9	1
Middle Schools	1
Preschool Education	1

Audience

Researchers	3
Practitioners	2
Administrators	1
Counselors	1
Teachers	1

Location

China	1
Germany	1
Kentucky	1
South Africa	1
Sweden	1
United Kingdom	1

Assessments and Surveys

Adult Attachment Interview	1
Bender Gestalt Test	1
Early Childhood Longitudinal…	1
Minnesota Multiphasic…	1
Program for International…	1
Self Directed Search	1
Strong Campbell Interest…	1
Trends in International…	1

Showing 1 to 15 of 38 results

Raters' Scoring Process in Assessment of Interpreting: An Empirical Study Based on Eye Tracking and Retrospective Verbalisation

Peer reviewed

Chao Han; Binghan Zheng; Mingqing Xie; Shirong Chen – Interpreter and Translator Trainer, 2024

Human raters' assessment of interpreting is a complex process. Previous researchers have mainly relied on verbal reports to examine this process. To advance our understanding, we conducted an empirical study, collecting raters' eye-movement and retrospection data in a computerised interpreting assessment in which three groups of raters (n = 35)…

Descriptors: Foreign Countries, College Students, College Graduates, Interrater Reliability

Validation of Sub-Constructs in Reading Comprehension Tests Using Teachers' Classification of Cognitive Targets

Peer reviewed

Tengberg, Michael – Language Assessment Quarterly, 2018

Reading comprehension is often treated as a multidimensional construct. In many reading tests, items are distributed over reading process categories to represent the subskills expected to constitute comprehension. This study explores (a) the extent to which specified subskills of reading comprehension tests are conceptually conceivable to…

Descriptors: Reading Tests, Reading Comprehension, Scores, Test Results

Coming Full Circle in Standard Setting: A Commentary on Wyse

Peer reviewed

Skaggs, Gary – Measurement: Interdisciplinary Research and Perspectives, 2013

The construct map is a particularly good way to approach instrument development, and this author states that he was delighted to read Adam Wyse's thoughts about how to use construct maps for standard setting. For a number of popular standard-setting methods, Wyse shows how typical feedback to panelists fits within a construct map framework.…

Descriptors: Standard Setting (Scoring), Maps, Test Construction, Measurement

Validating the Interpretations of PISA and TIMSS Tasks: A Rating Study

Peer reviewed

Rindermann, Heiner; Baumeister, Antonia E. E. – International Journal of Testing, 2015

Scholastic tests regard cognitive abilities to be domain-specific competences. However, high correlations between competences indicate either high task similarity or a dependence on common factors. The present rating study examined the validity of 12 Programme for International Student Assessment (PISA) and Third or Trends in International…

Descriptors: Test Validity, Test Interpretation, Competence, Reading Tests

An Examination of Assessment Fidelity in the Administration and Interpretation of Reading Tests

Peer reviewed

Reed, Deborah K.; Sturges, Keith M. – Remedial and Special Education, 2013

Researchers have expressed concern about "implementation" fidelity in intervention research but have not extended that concern to "assessment" fidelity, or the extent to which pre-/posttests are administered and interpreted as intended. When studying reading interventions, data gathering heavily influences the identification of…

Descriptors: Reading Tests, Fidelity, Pretests Posttests, Intervention

Examining the Reliability of a Culminating Teacher Education Assessment and Discovering Areas for Reform

Peer reviewed

Murley, Lisa D.; Stobaugh, Rebecca; Jukes, Pamela; Tassell, Janet – Educational Renaissance, 2014

The purpose of this article is to provide an overview of the process used to examine the inter-rater reliability of the Teacher Work Sample (TWS) Scoring Rubric involved with the senior culminating experience for teacher candidates used at a large comprehensive university. The study compared holistic and analytic scores reported by Student Teacher…

Descriptors: Teacher Education, Interrater Reliability, Scoring Rubrics, Preservice Teachers

Classroom Assessment Practices, Teacher Judgments, and Student Achievement in Mathematics: Evidence from the ECLS

Peer reviewed

Direct link

Martinez, Jose Felipe; Stecher, Brian; Borko, Hilda – Educational Assessment, 2009

In this study we use data from the Early Childhood Longitudinal Survey third- and fifth-grade samples to investigate teacher judgments of student achievement, the extent to which they offer a similar picture of student mathematics achievement compared to standardized test scores, and whether classroom assessment practices moderate the relationship…

Descriptors: Mathematics Achievement, Standardized Tests, Grade 5, Student Evaluation

Understanding and Interpreting Career Decision-Making Difficulties

Peer reviewed

Direct link

Amir, Tamar; Gati, Itamar; Kleiman, Tali – Journal of Career Assessment, 2008

This research develops and tests a procedure for interpreting individuals' responses in multiscale career assessments, using the Career Decision-Making Difficulties Questionnaire (CDDQ). In Study 1, criteria for ascertaining the credibility of responses were developed, based on the judgments of 39 career-counseling experts. In Study 2, the…

Descriptors: Career Choice, Decision Making Skills, Career Development, Questionnaires

Accuracy vs. Validity, Consistency vs. Reliability, and Fairness vs. Absence of Bias: A Call for Quality

Lang, W. Steve; Wilkerson, Judy R. – Online Submission, 2008

The National Council for Accreditation of Teacher Education (NCATE, 2002) requires teacher education units to develop assessment systems and evaluate both the success of candidates and unit operations. Because of a stated, but misguided, fear of statistics, NCATE fails to use accepted terminology to assure the quality of institutional evaluative…

Descriptors: State Standards, Validity, Resource Materials, Reliability

Interscorer Reliability of the Minnesota Percepto-Diagnostic Test-Revised.

Peer reviewed

Vance, B.; And Others – Psychology in the Schools, 1983

Investigated the interscorer reliability between a novice and a professional psychologist for the Minnesota Percepto-Diagnostic Test-Revised (MPDT-R), using a sample of 30 individuals. Results indicated that for three of the four MPDT-R scores there was a significant positive correlation between expert and novice scoring criteria. (JAC)

Descriptors: Experimenter Characteristics, Interrater Reliability, Psychological Evaluation, Psychologists

Reliability of the Advanced Psychodiagnostic Interpretation (API) Scoring System for the Bender Gestalt.

Peer reviewed

Aucone, Ernest J.; Raphael, Alan J.; Golden, Charles J.; Espe-Pfeifer, Patricia; Seldon, Jen; Pospisil, Tanya; Dornheim, Liane; Proctor-Weber, Zoe; Calabria, Michael – Assessment, 1999

Assessed the interrater reliability of the revised Advanced Psychodiagnostic Interpretation (API) (A. Raphael and C. Golden, 1998) scoring system for the Bender Gestalt Test (L. Bender, 1938). Agreement across nine raters exceeded 90% for each of three clinical protocols, and kappa statistics indicated good interrater reliability. (SLD)

Descriptors: Diagnostic Tests, Interrater Reliability, Psychological Testing, Scoring

In the Eye of the Beholder: Reply to Wilson and Shadish (2006) and Radin, Nelson, Dobyns, and Houtkooper (2006)

Peer reviewed

Bosch, Holger; Steinkamp, Fiona; Boller, Emil – Psychological Bulletin, 2006

H. Bosch, F. Steinkamp, and E. Boller's (see record 2006-08436-001) meta-analysis, which demonstrated (a) a small but highly significant overall effect, (b) a small-study effect, and (c) extreme heterogeneity, has provoked widely differing responses. After considering D. B. Wilson and W. R. Shadish's (see record 2006-08436-002) and D. Radin, R.…

Descriptors: Meta Analysis, Publications, Bias, Models

A Psychometric Study of the Adult Attachment Interview: Reliability and Discriminant Validity.

Peer reviewed

Bakermans-Kranenburg, Marian J.; van IJzendoorn, Marinus H. – Developmental Psychology, 1993

Examined the validity of the Adult Attachment Interview (AAI) measure by interviewing 83 mothers twice over 2 months, using different interviewers on each occasion. The results indicated that the reliability of the AAI classifications was quite high over time and across interviewers. The AAI classifications were independent of nonattachment…

Descriptors: Attachment Behavior, Examiners, Interrater Reliability, Mothers

An Attentional Learning Account of the Shape Bias: Reply to Cimpian and Markman (2005) and Booth, Waxman, and Huang (2005)

Peer reviewed

Smith, Linda B.; Samuelson, Larissa – Developmental Psychology, 2006

Recently, "Developmental Psychology" published 2 articles on the shape bias; both rejected the authors' previous proposals about the role of attentional learning in the development of a shape bias in object name learning. A. Cimpian and E. Markman (2005; see record EJ733667) did so by arguing that the shape bias does not exist but is an…

Descriptors: Developmental Psychology, Cognitive Development, Misconceptions, Attention

Influences on and Limitations of Classical Test Theory Reliability Estimates.

Arnold, Margery E. – 1996

It is incorrect to say "the test is reliable" because reliability is a function not only of the test itself, but of many factors. The present paper explains how different factors affect classical reliability estimates such as test-retest, interrater, internal consistency, and equivalent forms coefficients. Furthermore, the limits of classical test…

Descriptors: Estimation (Mathematics), Generalizability Theory, Heuristics, Interrater Reliability

Source

Educational Measurement:…	6
Applied Measurement in…	2
Developmental Psychology	2
Assessment	1
ELT Journal	1
Educational Assessment	1
Educational Renaissance	1
Harvard Educational Review	1
International Journal of…	1
International Journal of…	1
International Journal of…	1
Interpreter and Translator…	1
Journal of Career Assessment	1
Language Assessment Quarterly	1
Measurement:…	1
Online Submission	1
Psychological Bulletin	1
Psychology in the Schools	1
Remedial and Special Education	1
Theory and Research in…	1
World Englishes	1

Author

Amir, Tamar	1
Arnold, Margery E.	1
Aucone, Ernest J.	1
Bakermans-Kranenburg, Marian J.	1
Baumeister, Antonia E. E.	1
Binghan Zheng	1
Boller, Emil	1
Borko, Hilda	1
Bosch, Holger	1
Burton, Elizabeth	1
Calabria, Michael	1
Chao Han	1
Clariana, Roy B.	1
Congdon, Peter J.	1
Curren, Randall R.	1
Davidson, Fred	1
De Mey, H. R. A.	1
Dornheim, Liane	1
Dunbar, Stephen B.	1
Erwin, T. Dary	1
Espe-Pfeifer, Patricia	1
Fuchs, Douglas	1
Gati, Itamar	1
Geisinger, Kurt F.	1