Wednesday, April 16, 2025 09:30AM

Ph.D. Proposal

 

Xiao Jing

(Advisor: Prof. Dimitri Mavris)


DEVELOPMENT OF AVIATIONBENCH: A COMPREHENSIVE BENCHMARK FRAMEWORK FOR NLP MODELS IN AVIATION SAFETY

Wednesday, April 16 

9:30 a.m.
Weber Space Science and Technology Building (SST II)
Collaborative Visualization Environment (CoVE)

Teams Meeting: https://teams.microsoft.com

 

Abstract
The field of aviation safety increasingly relies on advanced natural language processing (NLP) techniques to extract insights from the growing volume of incident and accident narratives. Despite significant advances in both data collection and modeling capabilities, a critical gap remains in the standardized evaluation of NLP systems for this safety-critical domain. Current NLP research on aviation safety is fragmented, with isolated studies targeting different datasets and tasks, making holistic progress difficult to measure. Additionally, critical aviation safety events are underrepresented in existing corpora, leading to class imbalance and poor model performance on rare but high-stakes scenarios.

In the current paradigm, aviation safety NLP applications are evaluated using inconsistent metrics and datasets, often focusing on common scenarios while neglecting rare but critical events. Technical challenges include the domain-specific terminology of aviation safety narratives, the complex causal relationships they describe, and the need for reliable performance on safety-critical edge cases. Moreover, annotations for complex tasks like causal reasoning are scarce due to high labeling costs and the need for domain expertise. This motivates the overall objective of the current work.

The objective of this thesis proposal is to develop a comprehensive benchmark framework that addresses these limitations and standardizes evaluation across multiple tasks and datasets. This framework will enable more consistent and meaningful comparison of NLP approaches for aviation safety, with special attention to performance on rare but critical events. Data imbalance and annotation scarcity are treated in depth, given the difficulty of obtaining representative datasets for all safety-critical scenarios.

The proposed methodology integrates three components: a benchmark framework for evaluating model performance across four core tasks, a knowledge-guided LLM generation system for creating high-quality synthetic data to address class imbalance, and a unified multi-task annotation framework for efficiently generating comprehensive labels. Together, these components support standardized evaluation, data augmentation, and efficient annotation for aviation safety NLP.
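To make the benchmark component more concrete, the sketch below shows one way a multi-task evaluation harness with per-class scoring could be organized in Python. It is a minimal illustration under assumed conventions: the task name, data fields, and the constant baseline are hypothetical placeholders rather than the proposed AviationBench design, and per-class F1 is shown only as one metric that keeps rare-event performance visible.

# Minimal multi-task evaluation sketch (illustrative; not the AviationBench implementation).
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]          # (narrative text, gold label)
Model = Callable[[str, str], str]  # (task name, narrative text) -> predicted label

def per_class_f1(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """One-vs-rest F1 for every label, so rare classes stay visible in the report."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    scores = {}
    for label in set(gold) | set(pred):
        precision = tp[label] / (tp[label] + fp[label]) if (tp[label] + fp[label]) else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if (tp[label] + fn[label]) else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if (precision + recall) else 0.0)
    return scores

def evaluate(model: Model, tasks: Dict[str, List[Example]]) -> Dict[str, Dict[str, float]]:
    """Run the model on every task split and report per-class F1 for each task."""
    report = {}
    for task_name, examples in tasks.items():
        gold = [label for _, label in examples]
        pred = [model(task_name, text) for text, _ in examples]
        report[task_name] = per_class_f1(gold, pred)
    return report

if __name__ == "__main__":
    # Toy narratives standing in for real incident reports (hypothetical labels).
    tasks = {
        "event_type": [
            ("bird strike during initial climb", "wildlife"),
            ("runway incursion by ground vehicle", "incursion"),
            ("smoke in cabin, crew declared emergency", "fire_smoke"),
        ],
    }
    # A trivial constant baseline; a real study would plug in a trained NLP model.
    baseline = lambda task, text: "wildlife"
    print(evaluate(baseline, tasks))

Reporting scores per label, rather than a single aggregate, is what lets such a harness expose weak performance on rare but high-stakes event categories.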

Committee
Prof. Dimitri Mavris – School of Aerospace Engineering (Advisor)
Dr. Mayank Bendarkar – School of Aerospace Engineering
Prof. Duen Horng (Polo) Chau – School of Computational Science and Engineering
Prof. Xiuwei Zhang – School of Computational Science and Engineering
Prof. Kuen-Da (Dalton) Lin – School of International Affairs