Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
Rajagopal, Prabha; Ravana, Sri Devi – Information Research: An International Electronic Journal, 2017
Introduction: The use of averaged topic-level scores can result in the loss of valuable data and can cause misinterpretation of the effectiveness of system performance. This study aims to use the scores of each document to evaluate document retrieval systems in a pairwise system evaluation. Method: The chosen evaluation metrics are document-level…
Descriptors: Information Retrieval, Documentation, Scores, Information Systems
Raykov, Tenko; Marcoulides, George A.; Millsap, Roger E. – Educational and Psychological Measurement, 2013
A multiple testing method for examining factorial invariance for latent constructs evaluated by multiple indicators in distinct populations is outlined. The procedure is based on the false discovery rate concept and multiple individual restriction tests and resolves general limitations of a popular factorial invariance testing approach. The…
Descriptors: Testing, Statistical Analysis, Factor Analysis, Statistical Significance
Alzaid, Jawaher Mohammed – International Education Studies, 2017
This study aims at finding out the effect of peer assessment on the evaluation process of students. The hypothesis underlying this study is that assessment is an integral part of the learning process, which should play an important role in the educational model. The current study will emphasize the importance of using peer assessment as a tool to…
Descriptors: Foreign Countries, College Students, Peer Evaluation, Student Evaluation
Leger, Lawrence A.; Glass, Karligash; Katsiampa, Paraskevi; Liu, Shibo; Sirichand, Kavita – Assessment & Evaluation in Higher Education, 2017
We evaluate feedback methods for oral presentations used in training non-quantitative research skills (literature review and various associated tasks). Training is provided through a credit-bearing module taught to MSc students of banking, economics and finance in the UK. Monitoring oral presentations and providing "best practice"…
Descriptors: Foreign Countries, Graduate Students, Masters Programs, Feedback (Response)
Akelaitis, Arturas V.; Malinauskas, Romualdas K. – European Journal of Contemporary Education, 2016
The research aim was to reveal peculiarities of the education of social skills among senior high school age students in physical education classes. We hypothesized that after the end of the educational experiment the senior high school age students will have more developed social skills in physical education classes. Participants in the study were 51…
Descriptors: Foreign Countries, Interpersonal Competence, High School Seniors, Physical Education
Newman, Denis; Jaciw, Andrew P. – Empirical Education Inc., 2012
The motivation for this paper is the authors' recent work on several randomized control trials in which they found the primary result, which averaged across subgroups or sites, to be moderated by demographic or site characteristics. They are led to examine a distinction that the Institute of Education Sciences (IES) makes between "confirmatory"…
Descriptors: Educational Research, Research Methodology, Research Design, Classification
Piper, Benjamin; Zuilkowski, Stephanie Simmons – International Review of Education, 2015
In recent years, the Education for All movement has focused more intensely on the quality of education, rather than simply provision. Many recent and current education quality interventions focus on literacy, which is the core skill required for further academic success. Despite this focus on the quality of literacy instruction in developing…
Descriptors: Foreign Countries, Reading Fluency, Reading Tests, Oral Reading
Keselman, H. J.; Miller, Charles W.; Holland, Burt – Psychological Methods, 2011
There have been many discussions of how Type I errors should be controlled when many hypotheses are tested (e.g., all possible comparisons of means, correlations, proportions, the coefficients in hierarchical models, etc.). By and large, researchers have adopted familywise (FWER) control, though this practice certainly is not universal. Familywise…
Descriptors: Validity, Statistical Significance, Probability, Computation
Breton, Theodore R. – Economics of Education Review, 2011
This paper challenges Hanushek and Woessmann's (2008) contention that the quality and not the quantity of schooling determines a nation's rate of economic growth. I first show that their statistical analysis is flawed. I then show that when a nation's average test scores and average schooling attainment are included in a national income model,…
Descriptors: Economic Progress, Income, Statistical Significance, Educational Quality
Maraun, Michael; Gabriel, Stephanie – Psychological Methods, 2010
In his article, "An Alternative to Null-Hypothesis Significance Tests," Killeen (2005) urged the discipline to abandon the practice of "p[subscript obs]"-based null hypothesis testing and to quantify the signal-to-noise characteristics of experimental outcomes with replication probabilities. He described the coefficient that he…
Descriptors: Hypothesis Testing, Statistical Inference, Probability, Statistical Significance
Cromley, Jennifer G.; Perez, Tony C.; Fitzhugh, Shannon L.; Newcombe, Nora S.; Wills, Theodore W.; Tanaka, Jacqueline C. – Journal of Experimental Education, 2013
The authors tested whether students can be taught to better understand conventional representations in diagrams, photographs, and other visual representations in science textbooks. The authors developed a teacher-delivered, workbook-and-discussion-based classroom instructional method called Conventions of Diagrams (COD). The authors trained 1…
Descriptors: Visual Aids, Textbooks, Biology, Grade 10
Serlin, Ronald C. – Psychological Methods, 2010
The sense that replicability is an important aspect of empirical science led Killeen (2005a) to define "p[subscript rep]," the probability that a replication will result in an outcome in the same direction as that found in a current experiment. Since then, several authors have praised and criticized "p[subscript rep]," culminating…
Descriptors: Epistemology, Effect Size, Replication (Evaluation), Measurement Techniques
Cumming, Geoff – Psychological Methods, 2010
This comment offers three descriptions of "p[subscript rep]" that start with a frequentist account of confidence intervals, draw on R. A. Fisher's fiducial argument, and do not make Bayesian assumptions. Links are described among "p[subscript rep]," "p" values, and the probability a confidence interval will capture…
Descriptors: Replication (Evaluation), Measurement Techniques, Research Methodology, Validity
Ajuonuma, Juliet O. – African Higher Education Review, 2008
This study was designed to carry out a survey of the implementation of continuous assessment (CA) in Nigerian universities. Two research questions and one hypothesis were formulated to guide the study. The sample for the study consisted of 1,340 respondents. A 24-item self-report instrument was used for the study. The data generated were analyzed…
Descriptors: Foreign Countries, Program Implementation, Testing Programs, Test Items

