Jiyeo Yun – English Teaching, 2023
Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
Lybrya Kebreab; Sarah B. Bush; Debbie Hahs-Vaughn; Farshid Safi; Janet Andreasen; Christa Jackson – North American Chapter of the International Group for the Psychology of Mathematics Education, 2023
This investigation utilized publicly available data from the High School Longitudinal Study 2009 (HSLS:09) by the National Center for Educational Statistics (NCES) to examine any latent structures among variables which may empirically support the validity and reliability of a mathematical sense of belonging (MSB) construct. Using the nationally…
Descriptors: Factor Analysis, Longitudinal Studies, Mathematics Education, Student Attitudes
Kathryn Burke Adelsberger – ProQuest LLC, 2023
Teachers at St. Scholastica School (a pseudonym), an all-girls Catholic high school, varied in their approaches to discipline, creating an unequal distribution of student citations and negatively impacting student experience. An Improvement Science approach identified two potential drivers of change: inconsistency in disciplinary policy and variation in…
Descriptors: Teacher Student Relationship, Educational Environment, Discipline Policy, Teacher Role
Li, Zijia; Gooden, Caroline; Toland, Michael D. – Journal of Early Intervention, 2019
This study provides preliminary evidence for reliability and validity of the Hawaii Early Learning Profile Strands 0-3 (HELP Strands 0-3), an assessment instrument for young children. First, the degree of interobserver agreement for a sample of representative HELP items was examined; results indicated that HELP scoring was dependable and…
Descriptors: Measures (Individuals), Psychometrics, Early Childhood Education, Test Reliability
Massar, Michelle M.; McIntosh, Kent; Mercer, Sterett H. – Remedial and Special Education, 2019
Assessing fidelity of implementation of school-based interventions is a critical factor in successful implementation and sustainability. The Tiered Fidelity Inventory (TFI) was developed as a comprehensive measure of all three tiers of School-Wide Positive Behavioral Interventions and Supports (SWPBIS) and is intended to measure the extent to…
Descriptors: Fidelity, Intervention, Program Implementation, Positive Behavior Supports
Evans, Jacqueline R.; Schreiber Compo, Nadja; Carol, Rolando N.; Nichols-Lopez, Kristin; Holness, Howard; Furton, Kenneth G. – Applied Cognitive Psychology, 2019
Intoxicated witnesses are common, making it important to understand alcohol's impact on witness accuracy and suggestibility. Participants assigned to an immediate retrieval condition encoded and recalled in one of the three intoxication conditions: sober control, placebo, or intoxicated. Participants in the delayed retrieval condition were…
Descriptors: Alcohol Abuse, Memory, Reliability, Accuracy
Liu, Tour; Sun, Yicong; Li, Zhen; Xin, Tao – Measurement: Interdisciplinary Research and Perspectives, 2019
Aberrant response has an important impact on item parameter estimation, individuals' evaluation, and other statistical analysis. There are various types of aberrant response behaviors in educational and psychological tests, like sleeping, guessing, and plodding. Random response is the most common one. The purpose of this research was to clarify…
Descriptors: Test Reliability, Test Validity, Item Response Theory, Differences
Rajlic, Gordana; Kwon, Jae Yung; Roded, Keren; Hubley, Anita M. – Journal of Psychoeducational Assessment, 2019
In the current study, we present the development of the Global Self-Esteem (GSE) measure. The six-item GSE fulfills a need for a short unidimensional measure of global self-esteem conceptualized as "overall positive view of self." The construct is traditionally measured by the Rosenberg Self-Esteem Scale (RSE); however, several important…
Descriptors: Self Concept Measures, Self Esteem, Test Construction, Factor Structure
Wind, Stefanie A. – Language Testing, 2019
Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups.…
Descriptors: Nonparametric Statistics, Interrater Reliability, Differences, Writing Tests
Nieto, Ricardo; Casabianca, Jodi M. – Journal of Educational Measurement, 2019
Many large-scale assessments are designed to yield two or more scores for an individual by administering multiple sections measuring different but related skills. Multidimensional tests, or more specifically, simple structured tests, such as these rely on multiple multiple-choice and/or constructed responses sections of items to generate multiple…
Descriptors: Tests, Scoring, Responses, Test Items
Exploring the Reliability and Validity of the Learning Styles Questionnaire (LSQ) in an Arab Setting
Yousef, D. A. – Quality Assurance in Education: An International Perspective, 2019
Purpose: This study aims to examine the reliability and validity of the learning style construct conceptualized by Honey and Mumford (1986) in educational settings in the United Arab Emirates. Design/methodology/approach: Two independent samples from the UAE were used: one comprised 1,463 undergraduate students at the UAE University, and the other…
Descriptors: Foreign Countries, Cognitive Style, Questionnaires, Test Validity
Bentley, Andrew P. K.; Petcovic, Heather L.; Cassidy, David P. – Environmental Education Research, 2019
Individuals are exposed to misleading or outright false anthropogenic climate change (ACC) information. The goals of this study are to identify ACC dissenter messages, and to develop an instrument that quantifies the extent to which individuals agree with these messages. The instrument was developed using a sequential mixed methods design. A…
Descriptors: Climate, Likert Scales, Test Validity, Test Reliability
Caretta, Martina Angela; Pérez, María Alejandra – Field Methods, 2019
Transactional validity, a common approach in participatory research, is attained when preliminary analyses of research results are discussed with research participants and their feedback is incorporated in the analysis. Member checking is one way of achieving transactional validity, which has been heralded as a stronger version of validity reached…
Descriptors: Participatory Research, Validity, Reliability, Conflict
Maxwell, Mary; Gleason, Jim – International Journal of Mathematical Education in Science and Technology, 2019
Many large universities, community colleges and some smaller four-year colleges are turning to hybrid or online instruction for remedial and entry level mathematics courses, often assessed using online exams in a proctored computer lab environment. Faculty face the task of choosing questions from a publisher's text bank with very little, if any,…
Descriptors: Item Response Theory, Test Reliability, Item Banks, Algebra
Hoekstra, R.; Vugteveen, J.; Warrens, M. J.; Kruyen, P. M. – International Journal of Social Research Methodology, 2019
Cronbach's alpha is the most frequently used measure to investigate the reliability of measurement instruments. Despite its frequent use, many warn against misinterpretations of alpha. These claims about regular misunderstandings, however, are not based on empirical data. To understand how common such beliefs are, we conducted a survey study to test…
Descriptors: Statistical Analysis, Researchers, Beliefs, Knowledge Level

