Daria Gerasimova – Journal of Educational Measurement, 2024
I propose two practical advances to the argument-based approach to validity: developing a living document and incorporating preregistration. First, I present a potential structure for the living document that includes an up-to-date summary of the validity argument. As the validation process may span across multiple studies, the living document…
Descriptors: Validity, Documentation, Methods, Research Reports
Sooyong Lee; Suhwa Han; Seung W. Choi – Journal of Educational Measurement, 2024
Research has shown that multiple-indicator multiple-cause (MIMIC) models can result in inflated Type I error rates in detecting differential item functioning (DIF) when the assumption of equal latent variance is violated. This study explains how the violation of the equal variance assumption adversely impacts the detection of nonuniform DIF and…
Descriptors: Factor Analysis, Bayesian Statistics, Test Bias, Item Response Theory
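The assumption violation described above is easy to see in simulation. Below is a minimal sketch, not the authors' code: it swaps the MIMIC model for a simpler logistic-regression DIF screen with a rest-score matching variable, gives two groups identical item parameters (so no true DIF exists) but unequal latent variances, and counts how many items get flagged anyway. All parameter values are arbitrary assumptions.

```python
# Hedged simulation sketch: no true DIF, but unequal latent variances.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, J = 2000, 20
a = rng.uniform(0.8, 2.0, J)            # discriminations, shared by groups
b = rng.normal(0.0, 1.0, J)             # difficulties, shared by groups

theta = np.concatenate([rng.normal(0, 1.0, n),    # reference group: SD = 1.0
                        rng.normal(0, 1.5, n)])   # focal group: SD = 1.5
group = np.repeat([0.0, 1.0], n)

prob = 1 / (1 + np.exp(-a * (theta[:, None] - b)))   # 2PL probabilities
resp = (rng.uniform(size=prob.shape) < prob).astype(int)

flags = 0
for j in range(J):
    rest = resp.sum(axis=1) - resp[:, j]             # rest score as ability proxy
    z = (rest - rest.mean()) / rest.std()
    X = sm.add_constant(np.column_stack([z, group, z * group]))
    fit = sm.Logit(resp[:, j], X).fit(disp=0)
    if fit.pvalues[3] < 0.05:                        # interaction term is the
        flags += 1                                   # "nonuniform DIF" signal
print(f"items falsely flagged: {flags}/{J}")
```

At a 5% level, roughly one flag in twenty would be expected by chance; if the violation matters as the study argues, the flag count should noticeably exceed that nominal rate.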
Ferrara, Steve; Qunbar, Saed – Journal of Educational Measurement, 2022
In this article, we argue that automated scoring engines should be transparent and construct relevant--that is, as much as is currently feasible. Many current automated scoring engines cannot achieve high degrees of scoring accuracy without allowing in some features that may not be easily explained and understood and may not be obviously and…
Descriptors: Artificial Intelligence, Scoring, Essays, Automation
Binici, Salih; Cuhadar, Ismail – Journal of Educational Measurement, 2022
Validity of performance standards is a key element for the defensibility of standard setting results, and validating performance standards requires collecting multiple pieces of evidence at every step during the standard setting process. This study employs a statistical procedure, latent class analysis, to set performance standards and compares…
Descriptors: Validity, Performance, Standards, Multivariate Analysis
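As a rough illustration of the classification idea (not the study's procedure, which applies latent class analysis to item responses), one can fit a two-component mixture to total scores and take the cut where posterior membership in the upper class first exceeds 0.5. The score distributions below are simulated placeholders.

```python
# Hedged sketch: a model-based cut score from a two-class mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
scores = np.concatenate([rng.normal(45, 8, 600),     # "basic" class
                         rng.normal(70, 7, 400)])    # "proficient" class

gm = GaussianMixture(n_components=2, random_state=0).fit(scores.reshape(-1, 1))
grid = np.linspace(scores.min(), scores.max(), 1000).reshape(-1, 1)
post = gm.predict_proba(grid)                        # posterior class probabilities
upper = gm.means_.argmax()                           # index of the upper class
cut = grid[np.argmax(post[:, upper] > 0.5), 0]       # first grid point favoring it
print(f"model-based cut score: {cut:.1f}")
```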
Shermis, Mark D. – Journal of Educational Measurement, 2022
One of the challenges of discussing validity arguments for machine scoring of essays centers on the absence of a commonly held definition and theory of good writing. At best, the algorithms attempt to measure select attributes of writing and calibrate them against human ratings with the goal of accurate prediction of scores for new essays…
Descriptors: Scoring, Essays, Validity, Writing Evaluation
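The calibrate-to-human-ratings loop the abstract above describes can be sketched with a deliberately shallow feature set; the toy essays, ratings, model choice, and the 1-4 scale below are all assumptions, not any operational engine.

```python
# Hedged sketch: surface features regressed onto human ratings, evaluated
# with quadratic weighted kappa (a standard human-machine agreement index).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import cohen_kappa_score

essays = ["short vague answer", "a developed argument with evidence",
          "rambling off topic text", "clear thesis strong support detail"] * 25
human = np.array([1, 4, 1, 3] * 25)                  # toy human ratings

X = TfidfVectorizer().fit_transform(essays)          # construct-poor features
machine = Ridge(alpha=1.0).fit(X, human).predict(X)
machine = np.clip(np.rint(machine), 1, 4).astype(int)

print("QWK:", round(cohen_kappa_score(human, machine, weights="quadratic"), 2))
```

High agreement on such features would say nothing about whether they capture good writing, which is exactly the validity gap the article raises.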
Kaiwen Man; Joni M. Lakin – Journal of Educational Measurement, 2024
Eye-tracking procedures generate copious process data that could be valuable in establishing the response processes component of modern validity theory. However, there is a lack of tools for assessing and visualizing response processes using process data such as eye-tracking fixation sequences, especially those suitable for young children. This…
Descriptors: Problem Solving, Spatial Ability, Task Analysis, Network Analysis
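One simple way to make fixation sequences network-analyzable, sketched under the assumption that fixations have already been coded to areas of interest (AOIs); the paper's actual pipeline and visualizations are richer than this.

```python
# Hedged sketch: turn an AOI-coded fixation sequence into a weighted
# transition network and inspect where attention flows.
import networkx as nx

# hypothetical fixation sequence for one child on one spatial item
fixations = ["stem", "figure", "stem", "optionA", "figure",
             "optionB", "figure", "optionB", "stem"]

G = nx.DiGraph()
for src, dst in zip(fixations, fixations[1:]):
    if src != dst:                                   # skip refixations
        old = G.get_edge_data(src, dst, {"weight": 0})["weight"]
        G.add_edge(src, dst, weight=old + 1)

for src, dst, d in G.edges(data=True):
    print(f"{src:8s} -> {dst:8s} x{d['weight']}")
hub = max(G.nodes, key=lambda v: G.in_degree(v, weight="weight"))
print("most revisited AOI:", hub)
```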
A. Corinne Huggins-Manley; Brandon M. Booth; Sidney K. D'Mello – Journal of Educational Measurement, 2022
The field of educational measurement places validity and fairness as central concepts of assessment quality. Prior research has proposed embedding fairness arguments within argument-based validity processes, particularly when fairness is conceived as comparability in assessment properties across groups. However, we argue that a more flexible…
Descriptors: Educational Assessment, Persuasive Discourse, Validity, Artificial Intelligence
Dorsey, David W.; Michaels, Hillary R. – Journal of Educational Measurement, 2022
Advances in technology have dramatically expanded our ability to create rich, complex, and effective assessments across a range of uses. Artificial Intelligence (AI)-enabled assessments represent one such area of advancement--one that has captured our collective interest and imagination. Scientists and practitioners within the domains…
Descriptors: Validity, Ethics, Artificial Intelligence, Evaluation Methods
Lane, Suzanne – Journal of Educational Measurement, 2019
Rater-mediated assessments require the evaluation of the accuracy and consistency of the inferences made by the raters to ensure the validity of score interpretations and uses. Modeling rater response processes allows for a better understanding of how raters map their representations of the examinee performance to their representation of the…
Descriptors: Responses, Accuracy, Validity, Interrater Reliability
van Laar, Saskia; Braeken, Johan – Journal of Educational Measurement, 2022
The low-stakes character of international large-scale educational assessments implies that a participating student might at times provide unrelated answers, as if not even reading the items and instead choosing response options at random throughout. Depending on the severity of this invalid response behavior, interpretations of the assessment…
Descriptors: Achievement Tests, Elementary Secondary Education, International Assessment, Foreign Countries
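One standard screen for this kind of responding is a person-fit index. Below is a minimal sketch of the standardized log-likelihood statistic l_z under a Rasch model, with item difficulties and the examinee's ability assumed rather than estimated; strongly negative values signal response patterns the model finds implausible.

```python
# Hedged sketch: l_z person-fit for an attentive vs. a random responder.
import numpy as np

rng = np.random.default_rng(3)
b = rng.normal(0, 1, 30)                             # assumed item difficulties
theta = 0.4                                          # assumed examinee ability

P = 1 / (1 + np.exp(-(theta - b)))                   # Rasch success probabilities
attentive = (rng.uniform(size=30) < P).astype(int)   # model-consistent answers
random_u = rng.integers(0, 2, 30)                    # coin-flip answers

def lz(u, P):
    Q = 1 - P
    l = np.sum(u * np.log(P) + (1 - u) * np.log(Q))  # pattern log-likelihood
    e = np.sum(P * np.log(P) + Q * np.log(Q))        # its expectation
    v = np.sum(P * Q * np.log(P / Q) ** 2)           # and its variance
    return (l - e) / np.sqrt(v)

print(f"l_z attentive: {lz(attentive, P):+.2f}")     # near 0
print(f"l_z random:    {lz(random_u, P):+.2f}")      # typically well below 0
```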
Langenfeld, Thomas; Thomas, Jay; Zhu, Rongchun; Morris, Carrie A. – Journal of Educational Measurement, 2020
An assessment of graphic literacy was developed by articulating and subsequently validating a skills-based cognitive model intended to substantiate the plausibility of score interpretations. Model validation involved use of multiple sources of evidence derived from large-scale field testing and cognitive labs studies. Data from large-scale field…
Descriptors: Evidence, Scores, Eye Movements, Psychometrics
Köhler, Carmen; Pohl, Steffi; Carstensen, Claus H. – Journal of Educational Measurement, 2017
Competence data from low-stakes educational large-scale assessment studies allow for evaluating relationships between competencies and other variables. The impact of item-level nonresponse has not been investigated with regard to statistics that determine the size of these relationships (e.g., correlations, regression coefficients). Classical…
Descriptors: Test Items, Cognitive Measurement, Testing Problems, Regression (Statistics)
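The core concern here, that the treatment of item nonresponse changes the statistics of interest, shows up even in a toy simulation where omission depends on ability; every mechanism and coefficient below is an assumption for illustration.

```python
# Hedged sketch: ability-dependent omission distorts a correlation
# differently depending on how the omits are handled.
import numpy as np

rng = np.random.default_rng(11)
n = 5000
ability = rng.normal(0, 1, n)
covariate = 0.6 * ability + rng.normal(0, 0.8, n)    # criterion, true link 0.6

score = ability + rng.normal(0, 0.5, n)              # observed test score
p_omit = 1 / (1 + np.exp(2 + 1.5 * ability))         # low ability omits more
omit = rng.uniform(size=n) < p_omit

scored_wrong = np.where(omit, score.min(), score)    # omits treated as wrong
keep = ~omit                                         # omits dropped listwise

print(f"r, full data:          {np.corrcoef(score, covariate)[0, 1]:.3f}")
print(f"r, omits scored wrong: {np.corrcoef(scored_wrong, covariate)[0, 1]:.3f}")
print(f"r, omits dropped:      {np.corrcoef(score[keep], covariate[keep])[0, 1]:.3f}")
```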
Ju, Unhee; Falk, Carl F. – Journal of Educational Measurement, 2019
We examined the feasibility and results of a multilevel multidimensional nominal response model (ML-MNRM) for measuring both substantive constructs and extreme response style (ERS) across countries. The ML-MNRM considers within-country clustering while allowing overall item slopes to vary across items and examination of whether certain items were…
Descriptors: Cross Cultural Studies, Self Efficacy, Item Response Theory, Item Analysis
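The ML-MNRM itself is too large for a sketch, but the extreme response style (ERS) signal it separates from the substantive trait is easy to simulate: two groups with the same trait distribution, one pushing answers to the endpoints of a 1-5 scale more often (all settings assumed).

```python
# Hedged sketch: same trait, different endpoint use across two groups.
import numpy as np

rng = np.random.default_rng(5)

def likert(n_resp, n_items, ers_prob):
    base = np.clip(np.rint(rng.normal(3.0, 0.8, (n_resp, n_items))), 1, 5)
    push = rng.uniform(size=base.shape) < ers_prob   # ERS: jump to an endpoint
    return np.where(push, np.where(base >= 3, 5, 1), base)

for name, ers in [("group A", 0.05), ("group B", 0.35)]:
    r = likert(1000, 8, ers)
    extreme = np.mean((r == 1) | (r == 5))           # simple ERS index
    print(f"{name}: mean response {r.mean():.2f}, extreme share {extreme:.2f}")
```

Comparing the two groups' means without modeling the style difference is the kind of cross-cultural confound the model is built to absorb.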
Kane, Michael T. – Journal of Educational Measurement, 2013
To validate an interpretation or use of test scores is to evaluate the plausibility of the claims based on the scores. An argument-based approach to validation suggests that the claims based on the test scores be outlined as an argument that specifies the inferences and supporting assumptions needed to get from test responses to score-based…
Descriptors: Test Interpretation, Validity, Scores, Test Use
Newton, Paul E. – Journal of Educational Measurement, 2013
Kane distinguishes between two kinds of argument: the interpretation/use argument and the validity argument. This commentary considers whether there really are two kinds of argument, two arguments, or just one. It concludes that there is just one argument: the validity argument.
Descriptors: Validity, Test Interpretation, Test Use