Publication Date
| In 2026 | 3 |
| Since 2025 | 666 |
| Since 2022 (last 5 years) | 3167 |
| Since 2017 (last 10 years) | 7408 |
| Since 2007 (last 20 years) | 15046 |
Descriptor
| Test Reliability | 15036 |
| Test Validity | 10272 |
| Reliability | 9759 |
| Foreign Countries | 7141 |
| Test Construction | 4823 |
| Validity | 4191 |
| Measures (Individuals) | 3877 |
| Factor Analysis | 3825 |
| Psychometrics | 3525 |
| Interrater Reliability | 3124 |
| Correlation | 3039 |
Audience
| Researchers | 709 |
| Practitioners | 451 |
| Teachers | 208 |
| Administrators | 122 |
| Policymakers | 66 |
| Counselors | 42 |
| Students | 38 |
| Parents | 11 |
| Community | 7 |
| Support Staff | 6 |
| Media Staff | 5 |
Location
| Turkey | 1327 |
| Australia | 436 |
| Canada | 379 |
| China | 368 |
| United States | 271 |
| United Kingdom | 256 |
| Indonesia | 252 |
| Taiwan | 234 |
| Netherlands | 223 |
| Spain | 216 |
| California | 214 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 8 |
| Meets WWC Standards with or without Reservations | 9 |
| Does not meet standards | 6 |
Ordorika, Imanol; Lloyd, Marion – Journal of Education Policy, 2015
In just a decade, the international university rankings have become dominant measures of institutional performance for policy-makers worldwide. Bolstered by the façade of scientific neutrality, these classification systems have reinforced the hegemonic model of higher education--that of the elite, Anglo-Saxon research university--on a global…
Descriptors: Universities, Achievement Rating, Classification, Cultural Influences
Yang, Fuyi; Xu, Jianzhong – Journal of Psychoeducational Assessment, 2015
This study reports on the psychometric evaluation of the Chinese version of the Homework Management Scale (HMS). The HMS was designed to assess students' homework management strategies. Based on a randomized split of 884 high school students in China, we conducted exploratory factor analysis on Group 1 (n = 442) and confirmatory factor analysis on…
Descriptors: Foreign Countries, High School Students, Psychometrics, Measures (Individuals)
Anderson, Daniel; Irvin, Shawn; Alonzo, Julie; Tindal, Gerald A. – Educational Measurement: Issues and Practice, 2015
The alignment of test items to content standards is critical to the validity of decisions made from standards-based tests. Generally, alignment is determined based on judgments made by a panel of content experts with either ratings averaged or via a consensus reached through discussion. When the pool of items to be reviewed is large, or the…
Descriptors: Test Items, Alignment (Education), Standards, Online Systems
Severo, Milton; Gaio, A. Rita; Povo, Ana; Silva-Pereira, Fernanda; Ferreira, Maria Amélia – Anatomical Sciences Education, 2015
In theory the formula scoring methods increase the reliability of multiple-choice tests in comparison with number-right scoring. This study aimed to evaluate the impact of the formula scoring method in clinical anatomy multiple-choice examinations, and to compare it with that from the number-right scoring method, hoping to achieve an…
Descriptors: Anatomy, Multiple Choice Tests, Scoring, Decision Making
Monroe, Scott; Cai, Li – Grantee Submission, 2015
Student Growth Percentiles (SGP, Betebenner, 2009) are used to locate a student's current score in a conditional distribution based on the student's past scores. Currently, following Betebenner (2009), quantile regression is most often used operationally to estimate the SGPs. Alternatively, multidimensional item response theory (MIRT) may also be…
Descriptors: Item Response Theory, Reliability, Growth Models, Computation
Doraiswamy, Nithya; Porter, Kristen M.; Wilson, Grant; Paprzycki, Peter; Czerniak, Charlene M.; Tuttle, Nicole; Czajkowski, Kevin – Journal of School Leadership, 2016
This paper describes the development and validation of a science teacher leadership instrument modeled on the seven domains of the Teacher Leader Model (TLM) Standards (The Teacher Leadership Exploratory Consortium, 2011). Instrument development was part of National Science Foundation--funded Mathematics and Science Partnership (MSP) program that…
Descriptors: Test Construction, Test Validity, Teacher Leadership, Teacher Behavior
Menold, Natalja; Tausch, Anja – Sociological Methods & Research, 2016
Effects of rating scale forms on cross-sectional reliability and measurement equivalence were investigated. A randomized experimental design was implemented, varying category labels and number of categories. The participants were 800 students at two German universities. In contrast to previous research, reliability assessment method was used,…
Descriptors: Rating Scales, Test Reliability, Measurement, Classification
Shabani, Karim – Cogent Education, 2016
Dynamic assessment (DA) research, still in its infancy, takes its roots from Vygotsky's concept of zone of proximal development (ZPD) to account for learner's developmental process. Breaking away from a static, incomplete and, thus, unethical assessment of learner's abilities, DA came to the fore to better crystallize learner's levels of abilities…
Descriptors: Sociocultural Patterns, Psychometrics, Second Language Learning, Ethics
Stipancic, Kaila L.; Tjaden, Kris; Wilding, Gregory – Journal of Speech, Language, and Hearing Research, 2016
Purpose: This study obtained judgments of sentence intelligibility using orthographic transcription for comparison with previously reported intelligibility judgments obtained using a visual analog scale (VAS) for individuals with Parkinson's disease and multiple sclerosis and healthy controls (K. Tjaden, J. E. Sussman, & G. E. Wilding, 2014).…
Descriptors: Diseases, Neurological Impairments, Sentences, Measures (Individuals)
Derrick, Deirdre J. – TESOL Quarterly: A Journal for Teachers of English to Speakers of Other Languages and of Standard English as a Second Dialect, 2016
Second language (L2) researchers often have to develop or change the instruments they use to measure numerous constructs (Norris & Ortega, 2012). Given the prevalence of researcher-developed and -adapted data collection instruments, and given the profound effect instrumentation can have on results, thorough reporting of instrumentation is…
Descriptors: Second Language Learning, Language Research, Research Methodology, Interrater Reliability
Thaneerananon, Taveep; Triampo, Wannapong; Nokkaew, Artorn – International Journal of Instruction, 2016
Nowadays, one of the biggest challenges of education in Thailand is the development and promotion of the students' thinking skills. The main purposes of this research were to develop an analytical thinking test for 6th grade students and evaluate the students' analytical thinking. The sample was composed of 3,567 6th grade students in 2014…
Descriptors: Test Construction, Thinking Skills, Opinions, Cognitive Tests
Climie, Emma; Henley, Laura – British Journal of Special Education, 2016
School-based practitioners are often called upon to provide assessment and recommendations for struggling students. These assessments often open doors to specialised services or interventions and provide opportunities for students to build competencies in areas of need. However, these assessments often fail to highlight the abilities of these…
Descriptors: Student Evaluation, Alternative Assessment, Relevance (Education), Models
Basha, Ertan; Kaya, Mehmet – Universal Journal of Educational Research, 2016
The purpose of this study is to examine the validity and reliability of the Albanian version of the Depression, Anxiety and Stress Scale (DASS), which was developed by Lovibond and Lovibond (1995). The sample of this study consisted of 555 subjects who were living in Kosovo. The results of confirmatory factor analysis indicated 42 items loaded on…
Descriptors: Foreign Countries, Depression (Psychology), Anxiety, Stress Variables
Jin, Ying; Eason, Hershel – Journal of Educational Issues, 2016
The effects of mean ability difference (MAD) and short tests on the performance of various DIF methods have been studied extensively in previous simulation studies. Their effects, however, have not been studied under multilevel data structure. MAD was frequently observed in large-scale cross-country comparison studies where the primary sampling…
Descriptors: Test Bias, Simulation, Hierarchical Linear Modeling, Comparative Analysis
Gillem, Angela R.; Bartoli, Eleonora; Bertsch, Kristin N.; McCarthy, Maureen A.; Constant, Kerra; Marrero-Meisky, Sheila; Robbins, Steven J.; Bellamy, Scarlett – Journal of Multicultural Counseling and Development, 2016
The Multicultural Counseling and Psychotherapy Test (MCPT), a measure of multicultural counseling competence (MCC), was validated in 2 phases. In Phase 1, the authors administered 451 test items derived from multicultural guidelines in counseling and psychology to 32 multicultural experts and 30 nonexperts. In Phase 2, the authors administered the…
Descriptors: Counseling Techniques, Cultural Relevance, Counselor Qualifications, Expertise

