Peer reviewed: Shermis, Mark D.; And Others – Journal of Developmental Education, 1996
Describes a study to pilot-test a new reading assessment instrument designed to function in a computerized adaptive testing (CAT) environment. Indicates that the measure showed fair internal consistency and correlated well with other tests. Discusses advantages and disadvantages of CAT systems and describes the HyperCAT testing program. (23…
Descriptors: Computer Assisted Testing, Diagnostic Tests, Higher Education, Pilot Projects
Peer reviewed: Ramsay, James O. – Psychometrika, 1989
An alternative to the Rasch model is introduced. It characterizes strength of response according to the ratio of ability and difficulty parameters rather than their difference. Joint estimation and marginal estimation models are applied to two test data sets. (SLD)
Descriptors: Ability, Bayesian Statistics, College Entrance Examinations, Comparative Analysis
Peer reviewed: Cahan, Sorel – Educational and Psychological Measurement, 1989
Statistical significance and "abnormality" have been used as criteria for the evaluation of intra-individual subtest score differences. Shortcomings of these criteria are identified, and improved estimates of the true score differences are suggested. The applicability of the abnormality criterion to these improved estimates is reviewed.…
Descriptors: Estimation (Mathematics), Evaluation Methods, Individual Differences, Mathematical Models
Peer reviewed: Suzuki, Shinobu; Rancer, Andrew S. – Communication Monographs, 1994
Finds that the two-factor solution of the Argumentativeness Scale and the Verbal Aggressiveness Scale was a reasonable overall fit to samples of both U.S. and Japanese college students; orthogonality of the two constructs (argumentativeness and verbal aggressiveness) held for both samples; and the two scales had satisfactory construct validity for…
Descriptors: Communication Research, Construct Validity, Cross Cultural Studies, Evaluation Methods
Peer reviewed: O'Grady, Kevin E.; Medoff, Deborah R. – Multivariate Behavioral Research, 1991
A procedure for evaluating a variety of rater reliability models is presented. A multivariate linear model is used to describe and assess a set of ratings. Parameters are represented in terms of a factor analytic model, and maximum likelihood methods test the model parameters. Illustrative examples are presented. (SLD)
Descriptors: Comparative Analysis, Correlation, Equations (Mathematics), Estimation (Mathematics)
Peer reviewed: Carroll, John B. – Intelligence, 1995
It is argued that the statements and accusations made by Stephen Jay Gould about the use of factor analysis are incorrect and unjustified and that tests properly designed for the purpose can adequately measure a "general" or "g" factor of intelligence, particularly in view of the developments in testing since "The…
Descriptors: Factor Analysis, Intelligence Tests, Measurement Techniques, Nature Nurture Controversy
Peer reviewed: Bachman, Lyle F. – Language Testing, 2000
Reviews developments in language testing research and practice over the last 20 years, and suggests future directions in the areas of professionalizing the field and validation research. Argues that concerns for ethical conduct must be grounded in valid test use, so that professionalization and validation research are inseparable. (Author/VWL)
Descriptors: Ethics, Language Research, Language Tests, Second Language Instruction
Ozsevgec, Tuncay; Cepni, Salih – Online Submission, 2006
In order to determine students' achievement, science teachers have to develop their own assessment tools. This study attempts to find out the relationship between the teachers' assessment tools and students' cognitive development according to the teachers' teaching experiences. Six open-ended survey questions were developed and delivered to 59…
Descriptors: Foreign Countries, Correlation, Science Teachers, Evaluation Methods
Graham, James M. – Educational and Psychological Measurement, 2006
Coefficient alpha, the most commonly used estimate of internal consistency, is often considered a lower bound estimate of reliability, though the extent of its underestimation is not typically known. Many researchers are unaware that coefficient alpha is based on the essentially tau-equivalent measurement model. It is the violation of the…
Descriptors: Models, Test Theory, Reliability, Structural Equation Models
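For readers unfamiliar with the statistic discussed in the entry above, the following is a minimal sketch of the standard coefficient alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). It is not taken from Graham's article; the function name `coefficient_alpha` and the toy data are illustrative assumptions only.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient (Cronbach's) alpha for a respondents-by-items score table.

    `scores` is a list of rows, one per respondent, each holding k item scores.
    The name and input layout are assumptions made for this illustration.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_var_sum = sum(variance(col) for col in zip(*scores))
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical data: two items that rank three respondents identically.
consistent = [[1, 2], [2, 3], [3, 4]]
# Hypothetical data: the second item does not vary at all.
uninformative = [[1, 2], [2, 2], [3, 2]]

alpha_consistent = coefficient_alpha(consistent)
alpha_uninformative = coefficient_alpha(uninformative)
```

With these toy tables, alpha is 1 when the two items move in lockstep and 0 when one item contributes no shared variance. The sketch only computes the coefficient; it does not test whether the essentially tau-equivalent model that the abstract mentions actually holds.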
Wilson, Mark; Allen, Diane D.; Li, Jun Corser – Health Education Research, 2006
This paper compares the approach and resultant outcomes of item response models (IRMs) and classical test theory (CTT). First, it reviews basic ideas of CTT, and compares them to the ideas about using IRMs introduced in an earlier paper. It then applies a comparison scheme based on the AERA/APA/NCME "Standards for Educational and…
Descriptors: Health Education, Self Efficacy, Health Behavior, Measures (Individuals)
Boman, Peter; Curtis, David; Furlong, Michael J.; Smith, Douglas C. – Journal of Psychoeducational Assessment, 2006
The construct validity of the Australian version of the Multidimensional School Anger Inventory-Revised (MSAI-R) was examined using exploratory factor analysis (EFA), Rasch analysis, and confirmatory factor analysis (CFA) on a sample of 1,400 Australian students enrolled in Years 8 through 12. The EFA revealed a strong replication of the MSAI-R's…
Descriptors: Affective Measures, Psychological Patterns, Construct Validity, Reliability
Hayward, Pamela A. – 1995
This review critiques the use of Lev Vygotsky's concept of the zone of proximal development (ZPD) in quantitative research that focuses on the role communication plays in learning. A study that makes claims in terms of the ZPD should include a pretest, a problem-solving activity, and a posttest. Without these minimal elements, researchers are not…
Descriptors: Communication Research, Communication (Thought Transfer), Learning Processes, Pretests Posttests
Lai, Morris K.; Saka, Thomas – 1993
Two studies investigated factors affecting the scores of Hawaii students taking the verbal subtest of the Scholastic Aptitude Test (SAT). For the past several years, the mean verbal scores of Hawaii students have consistently been among the lowest 10% of all states. The first study addressed the identification of items and types of items that have…
Descriptors: Comparative Analysis, High School Seniors, High Schools, Instructional Effectiveness
Ackerman, Terry A. – 1987
One of the important underlying assumptions of all item response theory (IRT) models is that of local independence. This assumption requires that the response to an item on a test not be influenced by the response to any other items. This assumption is often taken for granted, with little or no scrutiny of the response process required to answer…
Descriptors: Computer Software, Correlation, Estimation (Mathematics), Latent Trait Theory
Goulden, Nancy Rost – 1989
Since speech communication evaluators are beginning to adapt the analytic and holistic instruments and methods used for rating written products to oral products and performance, this research review investigated: (1) what the labels "analytic" and "holistic" mean; (2) the theoretical bases of the two scoring approaches; and (3)…
Descriptors: Comparative Analysis, Higher Education, Holistic Evaluation, Rating Scales

