Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 1 |
Since 2006 (last 20 years) | 14 |
Descriptor
Test Interpretation | 55 |
Test Theory | 55 |
Test Validity | 20 |
Psychometrics | 15 |
Test Construction | 15 |
Testing Problems | 14 |
Test Use | 13 |
Scores | 12 |
Foreign Countries | 11 |
Measurement Techniques | 11 |
Test Reliability | 11 |
More ▼ |
Source
Author
Hubley, Anita M. | 2 |
Santee, Phillip | 2 |
Whitehead, Bruce | 2 |
Zumbo, Bruno D. | 2 |
Austin, James T. | 1 |
Baird, Jo-Anne | 1 |
Beal, Judy | 1 |
Blair, Bernadette | 1 |
Blixt, Sonya L. | 1 |
Bracken, Bruce A. | 1 |
Briere, John | 1 |
More ▼ |
Publication Type
Education Level
Elementary Secondary Education | 6 |
Higher Education | 2 |
Postsecondary Education | 1 |
Audience
Practitioners | 4 |
Researchers | 4 |
Teachers | 1 |
Location
United Kingdom | 4 |
United States | 3 |
Australia | 2 |
United Kingdom (England) | 2 |
United Kingdom (Great Britain) | 2 |
Canada | 1 |
Taiwan | 1 |
United Kingdom (Wales) | 1 |
West Germany | 1 |
Laws, Policies, & Programs
Individuals with Disabilities… | 1 |
No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Gafni, Naomi – Assessment in Education: Principles, Policy & Practice, 2016
Naomi Gafni, director of Research and Development, National Institute for Testing and Evaluation, Jerusalem, Israel, has devoted a substantial part of her career to the development of admissions tests and other educational tests and to the investigation of their validity. As such she is keenly aware of the complexities involved in this process.…
Descriptors: Test Validity, Test Interpretation, Test Use, Test Construction
Sinharay, Sandip – Educational Measurement: Issues and Practice, 2014
Brennan (Brennan, R. L., 2012) noted that users of test scores often want (indeed, demand) that subscores be reported, along with total test scores, for diagnostic purposes. Haberman (Haberman, S. J., 2008) suggested a method based on classical test theory (CTT) to determine if subscores have added value over the total score. According to this…
Descriptors: Scores, Test Theory, Test Interpretation
Fan, Xitao; Sun, Shaojing – Journal of Early Adolescence, 2014
In adolescence research, the treatment of measurement reliability is often fragmented, and it is not always clear how different reliability coefficients are related. We show that generalizability theory (G-theory) is a comprehensive framework of measurement reliability, encompassing all other reliability methods (e.g., Pearson "r,"…
Descriptors: Generalizability Theory, Measurement, Reliability, Correlation
Dorans, Neil J. – Educational Measurement: Issues and Practice, 2012
Views on testing--its purpose and uses and how its data are analyzed--are related to one's perspective on test takers. Test takers can be viewed as learners, examinees, or contestants. I briefly discuss the perspective of test takers as learners. I maintain that much of psychometrics views test takers as examinees. I discuss test takers as a…
Descriptors: Testing, Test Theory, Item Response Theory, Test Reliability
Yorke, Mantz; Orr, Susan; Blair, Bernadette – Studies in Higher Education, 2014
There has long been the suspicion amongst staff in Art & Design that the ratings given to their subject disciplines in the UK's National Student Survey are adversely affected by a combination of circumstances--a "perfect storm". The "perfect storm" proposition is tested by comparing ratings for Art & Design with those…
Descriptors: Student Surveys, National Surveys, Art Education, Design
Hubley, Anita M.; Zumbo, Bruno D. – Social Indicators Research, 2011
The vast majority of measures have, at their core, a purpose of personal and social change. If test developers and users want measures to have personal and social consequences and impact, then it is critical to consider the consequences and side effects of measurement in the validation process itself. The consequential basis of test interpretation…
Descriptors: Construct Validity, Social Change, Measurement, Test Interpretation
Cresswell, Mike – Measurement: Interdisciplinary Research and Perspectives, 2010
Paul Newton (2010), with his characteristic concern about theory, has set out two different ways of thinking about the basis upon which equivalences of one sort or another are established between test score scales. His reason for doing this is a desire to establish "the defensibility of linkages lower on the continuum than concordance."…
Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis
Elosua, Paula; Iliescu, Dragos – International Journal of Testing, 2012
Psychometric practice does not always converge with the advances of psychometric theory. In order to investigate this gap, the authors focus on the 10 most used psychological tests in Europe, as identified by recent surveys. The article analyzes test manuals published in 6 different European countries for these 10 most used tests. A total of 32…
Descriptors: Psychological Testing, Personality Measures, Error of Measurement, Foreign Countries
Newton, Paul E. – Measurement: Interdisciplinary Research and Perspectives, 2010
This article presents the author's rejoinder to thinking about linking from issue 8(1). Particularly within the more embracing linking frameworks, e.g., Holland & Dorans (2006) and Holland (2007), there appears to be a major disjunction between (1) classification discourse: the supposed basis for classification, that is, the underlying theory…
Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis
Chen, Yi-Hsin; Gorin, Joanna S.; Thompson, Marilyn S.; Tatsuoka, Kikumi K. – International Journal of Testing, 2008
As with any test administered across linguistically and culturally diverse groups, evidence suggesting the equivalence of score meaning across countries is needed for valid comparisons. The current study examines the cross-cultural equivalence of score interpretations from the Trends in International Mathematics and Science Study (TIMSS)-1999 from…
Descriptors: Construct Validity, Mathematics Tests, Foreign Countries, Equated Scores
Baird, Jo-Anne – Measurement: Interdisciplinary Research and Perspectives, 2010
Newton's article (2010) makes three main contributions to the literature. First, it is transatlantic, bringing together literatures that have been dealing with similar problems, using sometimes different methods and certainly with distinctive educational, cultural perspectives. He points out that neither of these literatures has all of the…
Descriptors: Foreign Countries, Predictive Validity, Standards, Ethics
von Davier, Alina A. – Measurement: Interdisciplinary Research and Perspectives, 2010
The article "Thinking About Linking" by Newton (2010) presents a novel philosophical perspective on the way that educational assessments should be linked. Newton starts by describing the linking framework as it was characterized in various publications and identifies a cross-cultural dimension in the definitions and uses of test…
Descriptors: Foreign Countries, Educational Assessment, Student Evaluation, Evaluation Criteria

Griffiths, H. B.; McLone, R. R. – Educational Studies in Mathematics, 1984
Results obtained when a procedure for assessing the questions on uniersity mathematics examinations to see what skills were needed for their solution are given for a sample of 1400 questions set during 1976 in 10 British universities. The method is a way of focusing rational argument. (MNS)
Descriptors: College Mathematics, Higher Education, Mathematics Instruction, Test Construction

Jarjoura, David – Journal of Educational Statistics, 1985
Issues regarding tolerance and confidence intervals are discussed within the context of educational measurement, and conceptual distinctions are drawn between these two types of intervals. Points are raised about the advantages of tolerance intervals when the focus is on a particular observed score rather than a particular examinee. (Author/BW)
Descriptors: Comparative Analysis, Error of Measurement, Mathematical Models, Test Interpretation

Rost, Jurgen – Psychometrika, 1985
A latent class model for rating data is presented which provides an alternative to the latent trait approach of analyzing test data. It is the analog of Andrich's binomial Rasch model for Lazarsfeld's latent class analysis (LCA). Response probabilities for rating categories follow a binomial distribution and depend on class-specific item…
Descriptors: Item Analysis, Latent Trait Theory, Mathematical Models, Rating Scales