Daria Gerasimova – Journal of Educational Measurement, 2024
I propose two practical advances to the argument-based approach to validity: developing a living document and incorporating preregistration. First, I present a potential structure for the living document that includes an up-to-date summary of the validity argument. As the validation process may span across multiple studies, the living document…
Descriptors: Validity, Documentation, Methods, Research Reports
Sooyong Lee; Suhwa Han; Seung W. Choi – Journal of Educational Measurement, 2024
Research has shown that multiple-indicator multiple-cause (MIMIC) models can result in inflated Type I error rates in detecting differential item functioning (DIF) when the assumption of equal latent variance is violated. This study explains how the violation of the equal variance assumption adversely impacts the detection of nonuniform DIF and…
Descriptors: Factor Analysis, Bayesian Statistics, Test Bias, Item Response Theory
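The assumption violation described above is easy to see in simulation. Below is a minimal sketch, not the authors' code: it swaps the MIMIC model for a simpler logistic-regression DIF screen with a rest-score matching variable, gives two groups identical item parameters (so no true DIF exists) but unequal latent variances, and counts how many items get flagged anyway. All parameter values are arbitrary assumptions.

```python
# Hedged simulation sketch: no true DIF, but unequal latent variances.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, J = 2000, 20
a = rng.uniform(0.8, 2.0, J)            # discriminations, shared by groups
b = rng.normal(0.0, 1.0, J)             # difficulties, shared by groups

theta = np.concatenate([rng.normal(0, 1.0, n),    # reference group: SD = 1.0
                        rng.normal(0, 1.5, n)])   # focal group: SD = 1.5
group = np.repeat([0.0, 1.0], n)

prob = 1 / (1 + np.exp(-a * (theta[:, None] - b)))   # 2PL probabilities
resp = (rng.uniform(size=prob.shape) < prob).astype(int)

flags = 0
for j in range(J):
    rest = resp.sum(axis=1) - resp[:, j]             # rest score as ability proxy
    z = (rest - rest.mean()) / rest.std()
    X = sm.add_constant(np.column_stack([z, group, z * group]))
    fit = sm.Logit(resp[:, j], X).fit(disp=0)
    if fit.pvalues[3] < 0.05:                        # interaction term is the
        flags += 1                                   # "nonuniform DIF" signal
print(f"items falsely flagged: {flags}/{J}")
```

At a 5% level, roughly one flag in twenty would be expected by chance; if the violation matters as the study argues, the flag count should noticeably exceed that nominal rate.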
Ferrara, Steve; Qunbar, Saed – Journal of Educational Measurement, 2022
In this article, we argue that automated scoring engines should be transparent and construct relevant--that is, as much as is currently feasible. Many current automated scoring engines cannot achieve high degrees of scoring accuracy without allowing in some features that may not be easily explained and understood and may not be obviously and…
Descriptors: Artificial Intelligence, Scoring, Essays, Automation
Binici, Salih; Cuhadar, Ismail – Journal of Educational Measurement, 2022
Validity of performance standards is a key element for the defensibility of standard setting results, and validating performance standards requires collecting multiple pieces of evidence at every step during the standard setting process. This study employs a statistical procedure, latent class analysis, to set performance standards and compares…
Descriptors: Validity, Performance, Standards, Multivariate Analysis
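As a rough illustration of the classification idea (not the study's procedure, which applies latent class analysis to item responses), one can fit a two-component mixture to total scores and take the cut where posterior membership in the upper class first exceeds 0.5. The score distributions below are simulated placeholders.

```python
# Hedged sketch: a model-based cut score from a two-class mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
scores = np.concatenate([rng.normal(45, 8, 600),     # "basic" class
                         rng.normal(70, 7, 400)])    # "proficient" class

gm = GaussianMixture(n_components=2, random_state=0).fit(scores.reshape(-1, 1))
grid = np.linspace(scores.min(), scores.max(), 1000).reshape(-1, 1)
post = gm.predict_proba(grid)                        # posterior class probabilities
upper = gm.means_.argmax()                           # index of the upper class
cut = grid[np.argmax(post[:, upper] > 0.5), 0]       # first grid point favoring it
print(f"model-based cut score: {cut:.1f}")
```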
Shermis, Mark D. – Journal of Educational Measurement, 2022
One of the challenges of discussing validity arguments for machine scoring of essays centers on the absence of a commonly held definition and theory of good writing. At best, the algorithms attempt to measure select attributes of writing and calibrate them against human ratings with the goal of accurate prediction of scores for new essays…
Descriptors: Scoring, Essays, Validity, Writing Evaluation
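The calibrate-to-human-ratings loop the abstract above describes can be sketched with a deliberately shallow feature set; the toy essays, ratings, model choice, and the 1-4 scale below are all assumptions, not any operational engine.

```python
# Hedged sketch: surface features regressed onto human ratings, evaluated
# with quadratic weighted kappa (a standard human-machine agreement index).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import cohen_kappa_score

essays = ["short vague answer", "a developed argument with evidence",
          "rambling off topic text", "clear thesis strong support detail"] * 25
human = np.array([1, 4, 1, 3] * 25)                  # toy human ratings

X = TfidfVectorizer().fit_transform(essays)          # construct-poor features
machine = Ridge(alpha=1.0).fit(X, human).predict(X)
machine = np.clip(np.rint(machine), 1, 4).astype(int)

print("QWK:", round(cohen_kappa_score(human, machine, weights="quadratic"), 2))
```

High agreement on such features would say nothing about whether they capture good writing, which is exactly the validity gap the article raises.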
Kaiwen Man; Joni M. Lakin – Journal of Educational Measurement, 2024
Eye-tracking procedures generate copious process data that could be valuable in establishing the response processes component of modern validity theory. However, there is a lack of tools for assessing and visualizing response processes using process data such as eye-tracking fixation sequences, especially those suitable for young children. This…
Descriptors: Problem Solving, Spatial Ability, Task Analysis, Network Analysis
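One simple way to make fixation sequences network-analyzable, sketched under the assumption that fixations have already been coded to areas of interest (AOIs); the paper's actual pipeline and visualizations are richer than this.

```python
# Hedged sketch: turn an AOI-coded fixation sequence into a weighted
# transition network and inspect where attention flows.
import networkx as nx

# hypothetical fixation sequence for one child on one spatial item
fixations = ["stem", "figure", "stem", "optionA", "figure",
             "optionB", "figure", "optionB", "stem"]

G = nx.DiGraph()
for src, dst in zip(fixations, fixations[1:]):
    if src != dst:                                   # skip refixations
        old = G.get_edge_data(src, dst, {"weight": 0})["weight"]
        G.add_edge(src, dst, weight=old + 1)

for src, dst, d in G.edges(data=True):
    print(f"{src:8s} -> {dst:8s} x{d['weight']}")
hub = max(G.nodes, key=lambda v: G.in_degree(v, weight="weight"))
print("most revisited AOI:", hub)
```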
A. Corinne Huggins-Manley; Brandon M. Booth; Sidney K. D'Mello – Journal of Educational Measurement, 2022
The field of educational measurement places validity and fairness as central concepts of assessment quality. Prior research has proposed embedding fairness arguments within argument-based validity processes, particularly when fairness is conceived as comparability in assessment properties across groups. However, we argue that a more flexible…
Descriptors: Educational Assessment, Persuasive Discourse, Validity, Artificial Intelligence
Dorsey, David W.; Michaels, Hillary R. – Journal of Educational Measurement, 2022
Advances in technology have dramatically expanded our ability to create rich, complex, and effective assessments across a range of uses. Artificial Intelligence (AI)-enabled assessments represent one such area of advancement--one that has captured our collective interest and imagination. Scientists and practitioners within the domains…
Descriptors: Validity, Ethics, Artificial Intelligence, Evaluation Methods
Lane, Suzanne – Journal of Educational Measurement, 2019
Rater-mediated assessments require the evaluation of the accuracy and consistency of the inferences made by the raters to ensure the validity of score interpretations and uses. Modeling rater response processes allows for a better understanding of how raters map their representations of the examinee performance to their representation of the…
Descriptors: Responses, Accuracy, Validity, Interrater Reliability
van Laar, Saskia; Braeken, Johan – Journal of Educational Measurement, 2022
The low-stakes character of international large-scale educational assessments implies that a participating student might at times provide unrelated answers, as if not even reading the items and instead choosing response options at random throughout. Depending on the severity of this invalid response behavior, interpretations of the assessment…
Descriptors: Achievement Tests, Elementary Secondary Education, International Assessment, Foreign Countries
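One standard screen for this kind of responding is a person-fit index. Below is a minimal sketch of the standardized log-likelihood statistic l_z under a Rasch model, with item difficulties and the examinee's ability assumed rather than estimated; strongly negative values signal response patterns the model finds implausible.

```python
# Hedged sketch: l_z person-fit for an attentive vs. a random responder.
import numpy as np

rng = np.random.default_rng(3)
b = rng.normal(0, 1, 30)                             # assumed item difficulties
theta = 0.4                                          # assumed examinee ability

P = 1 / (1 + np.exp(-(theta - b)))                   # Rasch success probabilities
attentive = (rng.uniform(size=30) < P).astype(int)   # model-consistent answers
random_u = rng.integers(0, 2, 30)                    # coin-flip answers

def lz(u, P):
    Q = 1 - P
    l = np.sum(u * np.log(P) + (1 - u) * np.log(Q))  # pattern log-likelihood
    e = np.sum(P * np.log(P) + Q * np.log(Q))        # its expectation
    v = np.sum(P * Q * np.log(P / Q) ** 2)           # and its variance
    return (l - e) / np.sqrt(v)

print(f"l_z attentive: {lz(attentive, P):+.2f}")     # near 0
print(f"l_z random:    {lz(random_u, P):+.2f}")      # typically well below 0
```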
Langenfeld, Thomas; Thomas, Jay; Zhu, Rongchun; Morris, Carrie A. – Journal of Educational Measurement, 2020
An assessment of graphic literacy was developed by articulating and subsequently validating a skills-based cognitive model intended to substantiate the plausibility of score interpretations. Model validation involved use of multiple sources of evidence derived from large-scale field testing and cognitive labs studies. Data from large-scale field…
Descriptors: Evidence, Scores, Eye Movements, Psychometrics
Köhler, Carmen; Pohl, Steffi; Carstensen, Claus H. – Journal of Educational Measurement, 2017
Competence data from low-stakes educational large-scale assessment studies allow for evaluating relationships between competencies and other variables. The impact of item-level nonresponse has not been investigated with regard to statistics that determine the size of these relationships (e.g., correlations, regression coefficients). Classical…
Descriptors: Test Items, Cognitive Measurement, Testing Problems, Regression (Statistics)
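The core concern here, that the treatment of item nonresponse changes the statistics of interest, shows up even in a toy simulation where omission depends on ability; every mechanism and coefficient below is an assumption for illustration.

```python
# Hedged sketch: ability-dependent omission distorts a correlation
# differently depending on how the omits are handled.
import numpy as np

rng = np.random.default_rng(11)
n = 5000
ability = rng.normal(0, 1, n)
covariate = 0.6 * ability + rng.normal(0, 0.8, n)    # criterion, true link 0.6

score = ability + rng.normal(0, 0.5, n)              # observed test score
p_omit = 1 / (1 + np.exp(2 + 1.5 * ability))         # low ability omits more
omit = rng.uniform(size=n) < p_omit

scored_wrong = np.where(omit, score.min(), score)    # omits treated as wrong
keep = ~omit                                         # omits dropped listwise

print(f"r, full data:          {np.corrcoef(score, covariate)[0, 1]:.3f}")
print(f"r, omits scored wrong: {np.corrcoef(scored_wrong, covariate)[0, 1]:.3f}")
print(f"r, omits dropped:      {np.corrcoef(score[keep], covariate[keep])[0, 1]:.3f}")
```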
Ju, Unhee; Falk, Carl F. – Journal of Educational Measurement, 2019
We examined the feasibility and results of a multilevel multidimensional nominal response model (ML-MNRM) for measuring both substantive constructs and extreme response style (ERS) across countries. The ML-MNRM considers within-country clustering while allowing overall item slopes to vary across items and examination of whether certain items were…
Descriptors: Cross Cultural Studies, Self Efficacy, Item Response Theory, Item Analysis
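The ML-MNRM itself is too large for a sketch, but the extreme response style (ERS) signal it separates from the substantive trait is easy to simulate: two groups with the same trait distribution, one pushing answers to the endpoints of a 1-5 scale more often (all settings assumed).

```python
# Hedged sketch: same trait, different endpoint use across two groups.
import numpy as np

rng = np.random.default_rng(5)

def likert(n_resp, n_items, ers_prob):
    base = np.clip(np.rint(rng.normal(3.0, 0.8, (n_resp, n_items))), 1, 5)
    push = rng.uniform(size=base.shape) < ers_prob   # ERS: jump to an endpoint
    return np.where(push, np.where(base >= 3, 5, 1), base)

for name, ers in [("group A", 0.05), ("group B", 0.35)]:
    r = likert(1000, 8, ers)
    extreme = np.mean((r == 1) | (r == 5))           # simple ERS index
    print(f"{name}: mean response {r.mean():.2f}, extreme share {extreme:.2f}")
```

Comparing the two groups' means without modeling the style difference is the kind of cross-cultural confound the model is built to absorb.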
Kane, Michael T. – Journal of Educational Measurement, 2013
To validate an interpretation or use of test scores is to evaluate the plausibility of the claims based on the scores. An argument-based approach to validation suggests that the claims based on the test scores be outlined as an argument that specifies the inferences and supporting assumptions needed to get from test responses to score-based…
Descriptors: Test Interpretation, Validity, Scores, Test Use
Newton, Paul E. – Journal of Educational Measurement, 2013
Kane distinguishes between two kinds of argument: the interpretation/use argument and the validity argument. This commentary considers whether there really are two kinds of argument, two arguments, or just one. It concludes that there is just one argument: the validity argument.
Descriptors: Validity, Test Interpretation, Test Use