Summary
Mosaic applied our robust machine learning expertise across disciplines to develop novel, data-driven models for anomaly detection using reinforcement learning for a leading aerospace government agency.
Introduction
In the following case study, Mosaic Data Science applied our robust data analytics expertise across disciplines to develop novel, data-driven models to identify anomalous behavior for a leading aerospace government agency. Modern data mining techniques provide powerful tools for identifying expected and unexpected relationships within datasets. Particularly important is identifying and understanding sequences of events that lead to degraded or undesirable states. This sort of dynamic predictive modeling adds a layer of complexity to analytics but is a key area of research at the cutting edge of data science.
Anomaly detection is the task of detecting patterns in data that do not match expected behavior, and it is applied across a wide range of use cases. In many application areas, the need to identify anomalies arises from the possibility that data are unprotected, including data that are valuable, relevant, and essential.
Typically, anomalous data can be connected to some problem or rare event, such as bank fraud, medical problems, structural defects, or malfunctioning equipment. Anomaly detection can be used for cyber security and network intrusion detection, for spotting unusual activity in video (such as road crimes or robberies), for fault detection, for hyperspectral imaging, and more. Because anomalies are tied to real problems, picking out which data points are anomalous is valuable from a business perspective: it makes it possible to reduce costs, optimize efficiency, anticipate and mitigate risk, and keep downtime to a minimum.
Mosaic’s fundamental approach to address this topic consisted of blending different data mining and analytics methodologies to develop an anomaly detection solution for a customer in the aviation space. Mosaic identified anomalous states, whether they were of an expected structure or otherwise, with domain expert feedback to ensure that predictions about degraded conditions were operationally relevant.
Problem Statement
The continuing growth of the aviation system, both in operations and technology, necessitates the development of new monitoring technologies that leverage state-of-the-art data mining and analytics. Mosaic helped a major aerospace government organization develop data mining algorithms to identify anomalous conditions of operational significance in the National Airspace System (NAS). Acting as a true partner, Mosaic worked with internal stakeholders and subject matter experts to address their feedback and ensure the models were effective.
The government wanted to leverage this technology to improve the safety and efficiency of the NAS by analyzing operations, systems, data sources, and resources for anomalous activity. Data-driven models were at the core of this initiative, and Mosaic had significant expertise in this area of the aviation industry, e.g., relating temporal and spatial data at various scales, fusing data sources to relate like elements, and determining comprehensive flight keys. The team’s proven experience with this type of data, the methodology needed to clean and prepare it, and the problems that could arise provided a strong advantage in ensuring the successful completion of the project.
Mosaic applied our deep experience working with various machine learning approaches and significant domain knowledge to recommend and develop cutting-edge models for identifying anomalies in aviation data.
Reinforcement ML Techniques
In machine learning, the distinction between supervised and unsupervised learning lies in whether the input data are labeled, i.e., whether the models are told what a correct answer should resemble. The two overlap significantly, however, and various semi-supervised approaches sit between them. The team leveraged expertise in both areas to develop a strong suite of tools to help the customer identify anomalies in the National Airspace System. Each of these model-development efforts was designed to address and incorporate human feedback.
Both approaches were key to the successful development of the anomaly detection solution. The supervised approach was essential for identifying and making early predictions about known types of issues. The unsupervised approach was important to help identify emergent behavior, particularly as new systems, capabilities, and vehicles are always being introduced to the NAS.
Development of Supervised Machine Learning Algorithms
The supervised learning anomaly detection effort was split into several phases, outlined in the chart below (Figure 1).
The focus was on transforming the data and feeding it into models for offline learning, including multivariate spatio-temporal models and models covering Air Traffic Management (ATM) program characteristics and weather impacts. These models were trained using labels drawn from historical NAS anomaly reports, customer input, and the experience of Mosaic’s NAS experts. The work was then applied to generate a runtime anomaly generator.
Throughout the duration of the project, several enhancements were made to the supervised anomaly detection algorithms initially developed (see Figure 5).
The algorithms were enhanced in three ways:
- Upgrades to the offline learning mode
- Implementation of overnight update capability
- Support for NAS safety and efficiency metrics.
Text mining capabilities related to ATC communications and other data were added for offline learning. The overnight update enabled incremental learning of anomalies from the day and supported anomaly predictions based on the daily weather forecast.
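The overnight update idea can be illustrated with a small sketch. The case study does not describe how the incremental learning was implemented, so the example below swaps in a deliberately simple stand-in: a running mean and variance (Welford's algorithm) that folds in each day's observations without retraining on the full history. The class name `RunningBaseline`, the function `overnight_update`, and the sample values are all hypothetical.

```python
import math


class RunningBaseline:
    """Running mean and variance, updated one value at a time (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def std(self):
        # Sample standard deviation; zero until there are two observations.
        return math.sqrt(self._m2 / (self.count - 1)) if self.count > 1 else 0.0

    def z_score(self, value):
        spread = self.std()
        return (value - self.mean) / spread if spread > 0 else 0.0


def overnight_update(baseline, days_values):
    """Fold one day's observations into the baseline (hypothetical nightly job)."""
    for value in days_values:
        baseline.update(value)
    return baseline


baseline = RunningBaseline()
overnight_update(baseline, [10.0, 12.0, 11.0, 13.0])  # day 1 (made-up values)
overnight_update(baseline, [12.0, 11.0, 10.0, 13.0])  # day 2 (made-up values)
next_day_score = baseline.z_score(20.0)  # how unusual a new value looks
```

Because only the count, mean, and squared-deviation sum are stored, each nightly run costs time proportional to that day's data, which is one way to keep an update inside an overnight window.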
Development of Unsupervised Learning Algorithms
The unsupervised models were initially developed in several phases. Each level of the clustering model was developed iteratively using the cleaned and prepared data. Then, feedback was incorporated, using criteria such as cluster and proximity score, to identify and categorize anomalies.
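As a rough illustration of cluster-based scoring, the sketch below clusters a handful of points and flags those that sit far from their nearest cluster center. It assumes plain k-means and treats the distance to the nearest centroid as the "proximity score"; the case study does not say which clustering method or scoring rule was actually used, and the function names, data, and threshold are hypothetical.

```python
import math


def nearest_centroid(point, centroids):
    """Return (cluster index, Euclidean distance) for the closest centroid."""
    distances = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=distances.__getitem__)
    return idx, distances[idx]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        members = [[] for _ in range(k)]
        for p in points:
            idx, _ = nearest_centroid(p, centroids)
            members[idx].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else centroids[i]
            for i, group in enumerate(members)
        ]
    return centroids


def flag_anomalies(points, centroids, threshold):
    """Flag points whose proximity score (distance to nearest centroid) exceeds threshold."""
    flagged = []
    for p in points:
        cluster, score = nearest_centroid(p, centroids)
        if score > threshold:
            flagged.append((p, cluster, score))
    return flagged


# Made-up two-feature observations: two tight groups plus one stray point.
observations = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.9), (0.9, 1.1),
                (9.1, 8.8), (8.9, 9.2), (5.0, 0.0)]
centroids = kmeans(observations, k=2)
anomalies = flag_anomalies(observations, centroids, threshold=2.0)
```

In this toy data only the stray point `(5.0, 0.0)` exceeds the threshold. In practice the threshold would be tuned, for example against expert feedback on which flagged cases matter operationally.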
Several enhancements were made to the unsupervised algorithms throughout the project. Additional sophistication was added to the models, new data sources were explored, and enhancements were included to support the overnight update capability. A particular focus was placed on runtimes to ensure compatibility with the requirements of the overnight capability. Additional SME feedback was included to sharpen the models’ focus on the operational impact of anomalies.
Integration of Human-Reinforced Learning
Integrating human feedback into the learning process was key to Mosaic’s algorithm development. To accomplish this, we developed an approach for formatting the model output so that it described each detected anomaly along with some relevant data elements.
A subset of these identified “case studies” was presented to the customer’s subject matter experts to elicit feedback using discrete categories that could be ingested back into the models. This approach was employed early in the development cycle, as soon as the models and available resources could support it. Mosaic implemented the ability for the models to ingest this input, anticipating that it would improve their ability to differentiate between anomalies that were operationally significant and those that were not.
The ability of the models to ingest human feedback to improve learning expanded as the team received more feedback, which was then fed into the models to improve predictive accuracy in identifying anomalies. This portion of the project had two main areas of focus:
- Generation of reports that SMEs could review to provide feedback.
- Improvement of the ability of the models to use this feedback, particularly during the overnight update cycle.
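The two focus areas above can be sketched as a small feedback loop: render each detected anomaly as a short report, collect a discrete category from a reviewer, and convert the categorized cases into labeled examples for the next training run. Everything specific here is an assumption made for illustration, including the category names, the `AnomalyReport` fields, and the sample feature names; the case study does not list the actual categories or report format.

```python
from dataclasses import dataclass

# Hypothetical discrete feedback categories and the labels they map to.
FEEDBACK_TO_LABEL = {
    "operationally_significant": 1,
    "not_significant": 0,
    "unsure": None,  # carries no training signal
}


@dataclass
class AnomalyReport:
    anomaly_id: str
    score: float
    features: dict

    def render(self):
        """Format the anomaly and its data elements for SME review."""
        lines = [f"Anomaly {self.anomaly_id} (score {self.score:.2f})"]
        lines += [f"  {name}: {value}" for name, value in sorted(self.features.items())]
        return "\n".join(lines)


def ingest_feedback(reports, feedback):
    """Turn SME feedback into (features, label) examples for retraining.

    Reports with no feedback, or with a category that carries no label,
    are skipped rather than guessed at.
    """
    examples = []
    for report in reports:
        label = FEEDBACK_TO_LABEL.get(feedback.get(report.anomaly_id))
        if label is not None:
            examples.append((report.features, label))
    return examples


# Made-up reports and reviewer responses.
reports = [
    AnomalyReport("A1", 0.91, {"delay_min": 42, "reroutes": 3}),
    AnomalyReport("A2", 0.64, {"delay_min": 5, "reroutes": 0}),
    AnomalyReport("A3", 0.58, {"delay_min": 12, "reroutes": 1}),
]
feedback = {"A1": "operationally_significant", "A2": "not_significant", "A3": "unsure"}
examples = ingest_feedback(reports, feedback)
```

Keeping the categories discrete means the feedback can be ingested automatically, for instance during a nightly update, without anyone having to interpret free-text comments.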
Deployment and Results Monitoring
With the machine learning algorithms integrated into the two customer platforms, the supervised learning algorithms outperformed the unsupervised learning algorithms in learning, online updating, and incorporating real-time feedback on whether the observed system was experiencing an operationally significant incident.
If the incident was a precursor, the algorithm advised as to the probability and time horizon of reaching an operationally significant degraded state. Throughout the algorithm evaluation process, Mosaic worked with the customer to demonstrate new capabilities and improve performance.
Conclusion
Mosaic Data Science has supported the aviation community through research and development in many areas, including simulation tools, airport surface, traffic flow management, airspace capacity, human factors, data analysis, safety, and security. Mosaic’s significant emphasis on rigorous analysis and data warehousing, coupled with our advanced research on data mining concepts, made us an ideal partner for this project.
Reinforcement machine learning is a powerful technique that has proven to be highly effective in anomaly detection. To improve the aerospace customer’s NAS operations, Mosaic designed and deployed an Anomaly Detection System (ADS) to find patterns in their aviation dataset that do not conform to expected normal behavior. When labeled instances of normal and abnormal behavior were available, they were used to train a supervised or semi-supervised machine learning model. An unsupervised learning model was also tested for anomaly detection, but it performed very poorly compared to supervised or semi-supervised learning.
Reinforcement learning brings the full power of artificial intelligence to anomaly detection. In this project, human feedback played a large role in building the ADS, as it was incorporated into the model to determine positive rewards for correct predictions of anomaly and negative rewards for wrong predictions. Over a period of time, the model learned to predict anomalies with a high level of accuracy.
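The reward scheme described above can be sketched in a few lines. This is a minimal, bandit-style illustration and not the deployed ADS: it assumes each case has an anomaly score in [0, 1], coarsens that score into two states, and keeps a running value estimate for the actions "flag" and "ignore", rewarding +1 when the action agrees with the human feedback and -1 when it does not. The state buckets, learning rate, and sample feedback are all hypothetical.

```python
from collections import defaultdict

ACTIONS = ("flag", "ignore")


def reward(action, is_anomaly):
    """+1 when the action agrees with the human feedback, -1 otherwise."""
    correct = (action == "flag") == is_anomaly
    return 1.0 if correct else -1.0


def bucket(score):
    """Discretize a score in [0, 1] into a coarse state (hypothetical split)."""
    return "high" if score >= 0.5 else "low"


def train(feedback, learning_rate=0.1, epochs=50):
    """Learn a value estimate per (state, action) from human-labeled cases.

    Because the feedback reveals the correct answer, the reward for both
    actions is known for every case, so both estimates are updated.
    """
    q = defaultdict(float)
    for _ in range(epochs):
        for score, is_anomaly in feedback:
            state = bucket(score)
            for action in ACTIONS:
                r = reward(action, is_anomaly)
                q[(state, action)] += learning_rate * (r - q[(state, action)])
    return q


def policy(q, score):
    """Pick the action with the highest learned value for this score's state."""
    state = bucket(score)
    return max(ACTIONS, key=lambda a: q[(state, a)])


# Made-up (score, human says anomaly?) pairs; the feedback is deliberately noisy.
feedback = [(0.9, True), (0.8, True), (0.7, False),
            (0.2, False), (0.1, False), (0.3, True)]
q = train(feedback)
high_score_action = policy(q, 0.85)
low_score_action = policy(q, 0.15)
```

With this toy feedback, the learned policy flags high-scoring cases and ignores low-scoring ones, since those choices earn positive reward more often than not.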