Young, John W.; Cho, Yeonsuk; Ling, Guangming; Cline, Fred; Steinberg, Jonathan; Stone, Elizabeth – Educational Assessment, 2008
English language learners (ELLs) constitute one of the fastest growing subpopulations of students in the United States. It is important to determine whether the assessments used by states in determining students' proficiencies are valid and fair for ELLs. This study focused on several standards-based assessments in mathematics and science…
Descriptors: Testing Accommodations, State Standards, Word Lists, Construct Validity
Dorman, Jeffrey P. – Educational Studies, 2008
Students' perceptions of actual and preferred classroom environment were investigated using the "What Is Happening in This Class?" (WIHIC) questionnaire. The WIHIC assesses seven classroom environment dimensions: student cohesiveness, teacher support, involvement, task orientation, investigation, cooperation and equity. A sample of 978…
Descriptors: Student Attitudes, Classroom Environment, Secondary School Students, Questionnaires
Koretz, Daniel; And Others – 1993
The 1992-93 school year was the second year of the implementation of the Vermont assessment program. Evaluation of the 1991-92 year yielded mixed results, with some evidence that the assessment program was having a strong impact on instruction, but other indications that the reliability of the portfolio scoring in both writing and mathematics was…
Descriptors: Educational Assessment, Elementary Secondary Education, Evaluation Methods, Evaluation Utilization
Sullivan, Francis J. – 1986
A study examined how pragmatic form influences evaluation of student essays in university placement testing. Specifically, the study documented how patterns in students' use of information (assumed to be either old, inferable, or new for readers) affected the holistic scores for quality given to the essays. Subjects, 99 randomly selected entering…
Descriptors: College Freshmen, Essay Tests, Evaluation Criteria, Evaluation Methods
Schempp, Paul G. – 1986
The stability of teaching behavior was examined by observing student/teacher interaction over one academic year. One teacher was studied using a time-series analysis. He had 14 years of experience and taught physical education in grades K-6 in a single school. Data were collected over one academic year using the Cheffers Adaptation of Flanders…
Descriptors: Behavior Change, Case Studies, Classroom Observation Techniques, Classroom Research
Perkins, Kyle – 1986
Based on the premise that composition skills and their evaluation are crucial to the educational process, this paper presents a tentative research program for conducting future English as a second language (ESL) composition evaluation studies. The program developed in the paper covers the following topics as areas which merit further rigorous…
Descriptors: Elementary Secondary Education, English (Second Language), Error Analysis (Language), Evaluation Criteria
Bejar, Isaac I. – 1985
The feasibility of reducing scoring costs for the Test of Spoken English (TSE) by using one rater was investigated. Currently, two raters are used. It was found that, because of the possibility of different standards used by potential raters, it does not appear feasible to use a single rater as the sole determiner of speaking proficiency under the…
Descriptors: Analysis of Covariance, Cost Effectiveness, English (Second Language), Evaluation Criteria
Olejnik, Stephen F.; Porter, Andrew C. – 1978
The statistical properties of two methods of estimating gain scores for groups in quasi-experiments are compared: (1) gains in scores standardized separately for each group; and (2) analysis of covariance with estimated true pretest scores. The fan spread hypothesis is assumed for groups but not necessarily assumed for members of the groups.…
Descriptors: Academic Achievement, Achievement Gains, Analysis of Covariance, Analysis of Variance
Turner, Jean – Annual Review of Applied Linguistics, 1998 (peer reviewed)
This review of research on second-language oral testing outlines the nature of early research in interview-format proficiency testing, then reports on new directions in investigation of construct validity of interview-format and other oral skills tests through examination of examinee, interviewer, and rater performance. Research on empirically…
Descriptors: Construct Validity, Educational Trends, Interrater Reliability, Interviews
Linacre, John Michael – 1995
The effects on Rasch measurement of both response underfit (noise) and overfit (mutedness or superuniformity) are described and illustrated. Misfit is identified by mean-square fit statistics. Person separation and reliability are shown to be deceptive indicators of measurement effectiveness when some items exhibit marked overfit. Theoretical…
Descriptors: Children, Goodness of Fit, Item Response Theory, Measurement Techniques
Henning, Grant – 1992
The psychometric characteristics of the Test of Written English (TWE) rating scale were explored. Rasch model scalar analysis methodology was employed with more than 4,000 scored essays across 2 elicitation prompts to gather information about the rating scale and rating process. Results suggested that the intervals between TWE scale steps were…
Descriptors: English (Second Language), Equated Scores, Essays, Interrater Reliability
Gordon, Howard R. D. – 1996
The purpose of this study was to profile the preferred productivity and learning style preferences of participants enrolled in distance education courses at Marshall University (West Virginia) (Spring of 1995). The accessible population of this study consisted of 167 distance education participants in nursing, education, and paralegal programs. A…
Descriptors: Cognitive Style, College Students, Distance Education, Higher Education
Braun, Henry I.; Wainer, Howard – 1989
A desirable goal would be to develop a methodology for scoring essays so that the final grades are less affected by when or by whom each essay was read. It seems sensible to derive such grades by somehow adjusting the ratings originally given by each reader. This essay describes a solution that relies on statistical adjustment, using the context…
Descriptors: Essay Tests, Estimation (Mathematics), Interrater Reliability, Scoring
Hutchinson, Susan R. – 1994
The work of R. MacCallum et al. (1992) was extended by examining chance modifications through a Monte Carlo simulation. The stability of post hoc model modifications was examined under varying sample size, model complexity, and severity of misspecification using 2- and 4-factor oblique confirmatory factor analysis (CFA) models with four and eight…
Descriptors: Computer Simulation, Models, Monte Carlo Methods, Reliability
Ferrell, Charlotte M. – 1992
Statistical significance is often misinterpreted to mean replicability or generalizability of results, although a statistically significant difference does not equal a reliable difference. Sample splitting procedures may be a more accurate way of estimating research result generalizability. This type of cross-validation involves randomly dividing…
Descriptors: Equations (Mathematics), Generalization, Mathematical Models, Predictive Measurement

