Peer reviewed: Tsai, Fu-Ju; Suen, Hoi K. – Educational and Psychological Measurement, 1993
Six methods of scoring multiple true-false items were compared in terms of reliabilities, difficulties, and discrimination. Results suggest that, for norm-referenced score interpretations, there is insufficient evidence to support any one of the methods as superior. For criterion-referenced score interpretations, effects of scoring method must be…
Descriptors: Comparative Analysis, Criterion Referenced Tests, Difficulty Level, Guessing (Tests)
Peer reviewed: Burchard, Kenneth W.; And Others – Academic Medicine, 1995
A study measured interrater reliability among 140 United States and Canadian surgery exam raters and the influences of age, years in practice, and experience as an examiner on individual scores. Results indicate three aspects of examinee performance influenced scores: verbal style, dress, and content of answers. No rater characteristic…
Descriptors: Higher Education, Hygiene, Individual Characteristics, Interrater Reliability
Peer reviewed: Douglas, Dan – Annual Review of Applied Linguistics, 1995
Reviews recent theoretical, methodological, and analytical developments in language testing, focusing on more refined models of language ability, reliability and validity, performance testing, innovative test formats, new applications of Item Response Theory and Generalizability Theory to test performance. An annotated bibliography discusses seven…
Descriptors: Annotated Bibliographies, Evaluation Methods, Language Proficiency, Language Tests
Peer reviewed: Yarbrough, Cornelia; And Others – Bulletin of the Council for Research in Music Education, 1994
Reports on a study of 614 experienced music teachers, non-music teachers, college-level music students, and non-music students on the effect of sequential patterns and different modes of presentation of music teaching. Finds that experienced teachers' evaluations were significantly higher than those of university students. (CFR)
Descriptors: Educational Strategies, Evaluative Thinking, Evaluators, Higher Education
Peer reviewed: Skinner, Linda J.; And Others – Journal of Offender Rehabilitation, 1994
Anatomically detailed dolls have become a most popular clinical tool in the validation of child sexual abuse allegations. However, widespread use of these dolls is not supported by empirical literature. Issues associated with the standardization of the dolls, norms for their use in validation interviews, and training of doll users need to be…
Descriptors: Child Abuse, Child Advocacy, Emotional Problems, Evaluation Methods
Peer reviewed: Fantuzzo, John; And Others – Early Childhood Research Quarterly, 1995
A study developed and validated the Penn Interactive Peer Play Scale (PIPPS), a teacher-rating instrument of the interactive play behaviors of preschool children. Thirty-eight teachers completed the measure on 312 African American children enrolled in Head Start. Exploratory factor analysis revealed three reliable underlying dimensions: play…
Descriptors: Behavior Rating Scales, Blacks, Early Childhood Education, Interpersonal Competence
Peer reviewed: Noijons, Jose – CALICO Journal, 1994
Defines computer assisted language testing (CALT), discusses the various processes involved, outlines the advantages and disadvantages, and examines psychometric aspects of computer testing. A table of factors distinguishes between test content and the mechanics of test taking. These factors constitute a table for developing a CALT checklist. (24…
Descriptors: Check Lists, Computer Assisted Testing, Factor Analysis, Feedback
Peer reviewed: Stiles, Joan – Monographs of the Society for Research in Child Development, 1994
Considers the bases of criticism of parent report as an index of their children's behavioral development and ways in which problems associated with parent report were addressed in the construction of the MacArthur Communicative Development Inventories (CDIs). Examines the nature of responses elicited from parents as they complete the CDIs. (BC)
Descriptors: Behavior Development, Body Language, Child Behavior, Data Collection
Peer reviewed: Simpson, Ronald D. – Innovative Higher Education, 1995
While student evaluations of teaching performance can provide useful feedback on faculty, particularly on dimensions of course delivery, there are serious limitations. Bias and distrust are often overlooked in interpreting student ratings. An inappropriate use is in rank-ordering faculty in a department. Student evaluation data must be integrated…
Descriptors: Comparative Analysis, Evaluation Methods, Faculty Evaluation, Higher Education
Peer reviewed: Shaw, Darlene L.; And Others – Academic Medicine, 1995
A study found that interviewers of medical school applicants (n=471) were influenced in their ratings of applicants' noncognitive attributes by grade point average and Medical College Admission Test scores, when available, and by gender and race in accordance with affirmative action goals. Only moderate reliability across interviewers was found.…
Descriptors: Affirmative Action, College Admission, College Applicants, Higher Education
Peer reviewed: Chang, Lei – Applied Psychological Measurement, 1994
Reliability and validity of 4-point and 6-point scales were assessed using a new model-based approach to fit empirical data from 165 graduate students completing an attitude measure. Results suggest that the issue of four- versus six-point scales may depend on the empirical setting. (SLD)
Descriptors: Attitude Measures, Goodness of Fit, Graduate Students, Graduate Study
Peer reviewed: Grant, Carolyn D.; Nash, Michael R. – Psychological Assessment, 1995
In a counterbalanced, within-subjects, repeated-measures design, 130 undergraduates were administered the Computer-Assisted Hypnosis Scale (CAHS) and the Stanford Hypnotic Susceptibility Scale and were hypnotized. The CAHS was shown to be a psychometrically sound instrument for measuring hypnotic ability. (SLD)
Descriptors: Ability, Clinical Diagnosis, Computer Assisted Testing, Diagnostic Tests
Peer reviewed: Beidel, Deborah C.; And Others – Psychological Assessment, 1995
A new instrument, the Social Phobia and Anxiety Inventory for Children (SPAI-C), was developed. Results from 6 studies with nearly 600 children indicate that the SPAI-C is a reliable and valid measure for childhood social anxiety and fear. It may be useful for improving clinical assessment and documenting treatment outcomes. (SLD)
Descriptors: Anxiety, Children, Clinical Diagnosis, Diagnostic Tests
Snyder, Scott; Sheehan, Robert – Diagnostique, 1992
Rasch calibration procedures were applied to item-response data for the 1,262 infants and toddlers comprising the standardization sample for the Mental Scale of the Bayley Scales of Infant Development. Analyses tend to confirm the psychometric integrity of the instrument. (Author)
Descriptors: Child Development, Cognitive Tests, Concurrent Validity, Construct Validity
Peer reviewed: Kunnan, Antony John – Language Testing, 1992
Three analysis procedures were used to study the dependability and validity of ESLPE, a criterion-referenced English-as-a-Second-Language placement test developed at the University of California at Los Angeles in 1989. Findings led to the suggestion that some students might have been differently placed if subtest scores were used for placement. (38…
Descriptors: Cluster Analysis, Comparative Analysis, Criterion Referenced Tests, English (Second Language)


