von Davier, Matthias; Bezirhan, Ummugul – Educational and Psychological Measurement, 2023
Viable methods for the identification of item misfit or Differential Item Functioning (DIF) are central to scale construction and sound measurement. Many approaches rely on the derivation of a limiting distribution under the assumption that a certain model fits the data perfectly. Typical DIF assumptions such as the monotonicity and population…
Descriptors: Robustness (Statistics), Test Items, Item Analysis, Goodness of Fit
Antino, Mirko; Alvarado, Jesús M.; Asún, Rodrigo A.; Bliese, Paul – Sociological Methods & Research, 2020
The need to determine the correct dimensionality of theoretical constructs and generate valid measurement instruments when underlying items are categorical has generated a significant volume of research in the social sciences. This article presents two studies contrasting different categorical exploratory techniques. The first study compares…
Descriptors: Nonparametric Statistics, Factor Analysis, Item Analysis, Robustness (Statistics)
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
Jinjin Huang – ProQuest LLC, 2020
Measurement invariance is crucial for an effective and valid measure of a construct. Invariance holds when the latent trait varies consistently across subgroups; in other words, the mean differences among subgroups are only due to true latent ability differences. Differential item functioning (DIF) occurs when measurement invariance is violated.…
Descriptors: Robustness (Statistics), Item Response Theory, Test Items, Item Analysis
Greenhow, Martin – Teaching Mathematics and Its Applications, 2015
This article outlines some key issues for writing effective computer-aided assessment (CAA) questions in subjects with substantial mathematical or statistical content, especially the importance of control of random parameters and the encoding of wrong methods of solution (mal-rules) commonly used by students. The pros and cons of using CAA and…
Descriptors: Mathematics Instruction, Computer Assisted Testing, Educational Principles, Educational Practices
Leuty, Melanie E. – Measurement and Evaluation in Counseling and Development, 2013
Test-retest data on Super's Work Values Inventory-Revised for a group of predominantly White (N = 995) women (mean age = 23.5 years, SD = 8.07) and men (mean age = 21.5 years, SD = 5.80) showed stability in mean-level scores over a period of 1 year for the sample as a whole. However, low raw score and rank order stability coefficients…
Descriptors: Robustness (Statistics), Scores, Individual Differences, Item Analysis
Duncan, Greg J.; Engel, Mimi; Claessens, Amy; Dowsett, Chantelle J. – Developmental Psychology, 2014
Replications and robustness checks are key elements of the scientific method and a staple in many disciplines. However, leading journals in developmental psychology rarely include explicit replications of prior research conducted by different investigators, and few require authors to establish in their articles or online appendices that their key…
Descriptors: Replication (Evaluation), Robustness (Statistics), Developmental Psychology, Educational Research
Iacobucci, Dawn – Journal of Marketing Education, 2013
This research investigates the reliability and validity of three major publications' rankings of MBA programs. Each set of rankings showed reasonable consistency over time, both at the level of the overall rankings and for most of the facets from which the rankings are derived. Each set of rankings also showed some levels of convergent and…
Descriptors: Psychometrics, Business Administration Education, Reliability, Validity
Deane, Thomas; Nomme, Kathy; Jeffery, Erica; Pollock, Carol; Birol, Gülnur – CBE - Life Sciences Education, 2016
We followed established best practices in concept inventory design and developed a 12-item inventory to assess student ability in statistical reasoning in biology (Statistical Reasoning in Biology Concept Inventory [SRBCI]). It is important to assess student thinking in this conceptual area, because it is a fundamental requirement of being…
Descriptors: Foreign Countries, Measures (Individuals), Test Construction, Statistics
Kortemeyer, Gerd – Physical Review Special Topics - Physics Education Research, 2014
Item response theory (IRT) becomes an increasingly important tool when analyzing "big data" gathered from online educational venues. However, the mechanism was originally developed in traditional exam settings, and several of its assumptions are infringed upon when deployed in the online realm. For a large-enrollment physics course for…
Descriptors: Item Response Theory, Online Courses, Electronic Learning, Homework
Rantanen, Pekka – Assessment & Evaluation in Higher Education, 2013
A multilevel analysis approach was used to analyse students' evaluation of teaching (SET). The low value of inter-rater reliability stresses that any solid conclusions on teaching cannot be made on the basis of single feedbacks. To assess a teacher's general teaching effectiveness, one needs to evaluate four randomly chosen course implementations.…
Descriptors: Test Reliability, Feedback (Response), Generalizability Theory, Student Evaluation of Teacher Performance
Menil, Violeta C.; Ye, Ruili – MathAMATYC Educator, 2012
This study serves as a teaching aid for teachers of introductory statistics. The aim of this study was limited to determining various sample sizes when estimating population proportion. Tables on sample sizes were generated using a C++ program, which depends on population size, degree of precision or error level, and confidence…
Descriptors: Sample Size, Probability, Statistics, Sampling
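The abstract above is truncated, so the formula behind the authors' C++ program is not shown here. As an illustration only, the sketch below uses a common textbook approach (Cochran's formula with a finite population correction) to show how a sample size for estimating a proportion can depend on population size, error level, and confidence level. The function name, parameter names, and the default p = 0.5 are assumptions and are not taken from the article.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(population_size, margin_of_error,
                               confidence=0.95, p=0.5):
    """Hypothetical helper: sample size needed to estimate a proportion.

    Assumes Cochran's formula with a finite population correction.
    This is a textbook method, not necessarily the one used in the article.
    p = 0.5 is the most conservative guess for the true proportion.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    # Finite population correction shrinks n0 for small populations.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)


if __name__ == "__main__":
    # One row of a sample-size table: 95% confidence, 5% error level.
    for size in (500, 1_000, 10_000, 1_000_000):
        print(size, sample_size_for_proportion(size, 0.05))
```

Under these assumptions the required sample size levels off as the population grows, which is why such tables are usually indexed by population size only up to some large value.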
Dornheim, Liane; Ramnath, R.; Gomez, C.; von Harscher, H.; Pellegrini, A. – Online Submission, 2011
This study examined psychometric properties of the MCCI (Millon College Counseling Inventory) (T. Millon, Strack, C. Millon, & Grossman, 2006), as applied to students from ethnically and culturally diverse backgrounds. The sample (N = 209, Mean age = 23.81, 74% identified as ethnic minority) was derived from students presented for counseling…
Descriptors: Psychometrics, Item Analysis, Replication (Evaluation), Ethnic Diversity
Goldhaber, Dan; Chaplin, Duncan – Center for Education Data & Research, 2012
In a provocative and influential paper, Jesse Rothstein (2010) finds that standard value added models (VAMs) suggest implausible future teacher effects on past student achievement, a finding that obviously cannot be viewed as causal. This is the basis of a falsification test (the Rothstein falsification test) that appears to indicate bias in VAM…
Descriptors: School Effectiveness, Teacher Effectiveness, Achievement Gains, Statistical Bias