An Adaptive Hash-Based Text Deduplication for ADS-B Data-Dependent Trajectory Clustering Problem

Published in The 2019 IEEE-RIVF International Conference on Computing and Communication Technology, 2019

Recommended citation: Tan Tran, Quan Duong, Duc-Thinh Pham, An Mai (2019). "An Adaptive Hash-Based Text Deduplication for ADS-B Data-Dependent Trajectory Clustering Problem." The 2019 IEEE-RIVF International Conference on Computing and Communication Technology. https://ieeexplore.ieee.org/document/8713722

The Automatic Dependent Surveillance-Broadcast (ADS-B) protocol is installed on aircraft as an alternative to secondary radar. This emerging technology broadcasts an aircraft’s status (location, velocity, etc.) within a specific area, producing a promising type of data that is very useful in air traffic management (ATM). However, few advanced studies in ATM research have yet approached this kind of data from a machine learning or data mining perspective. Locality Sensitive Hashing (LSH), meanwhile, is a data mining technique often used to find similar items in high-dimensional data, which makes it reasonably well suited to grouping similar flight paths in trajectory data. Motivated by these factors, this paper presents an adaptive LSH-based algorithm, of the kind used in near-duplicate document detection, for clustering the nearest trajectories by representing each trajectory as a bag of words, a representation popular in text mining. To illustrate the proposed method, an experiment was designed and carried out over thirty consecutive days using raw ADS-B data collected from FlightAware for Changi International Airport, Singapore. An evaluation based on the Silhouette score shows promising clustering performance.
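
The general idea in the abstract, treating a trajectory as a bag of words and using LSH to find near-duplicates, can be illustrated with a minimal Python sketch. This is not the paper's adaptive algorithm: it assumes a plain MinHash signature with banding, and the grid-cell tokenization, the cell size, the number of hash functions, and the number of bands are all hypothetical choices made only for illustration.

```python
import hashlib
import math
from collections import defaultdict


def tokenize(trajectory, cell_size=0.1):
    """Turn (lat, lon) points into a set of grid-cell 'words' (assumed scheme)."""
    return {
        f"{math.floor(lat / cell_size)}_{math.floor(lon / cell_size)}"
        for lat, lon in trajectory
    }


def jaccard(a, b):
    """Exact Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b)


def _hash(token, seed):
    # Seeded 64-bit hash built from MD5; one seed per simulated hash function.
    digest = hashlib.md5(f"{seed}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: the minimum hash value per seed over a non-empty set."""
    return tuple(min(_hash(t, seed) for t in tokens) for seed in range(num_hashes))


def lsh_candidate_pairs(signatures, bands=8):
    """Band the signatures; items sharing any identical band become candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(key)
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs


# Made-up trajectories: two identical paths and one far-away path.
trajectories = {
    "flight_a": [(1.35, 103.95), (1.45, 104.05), (1.55, 104.15)],
    "flight_b": [(1.35, 103.95), (1.45, 104.05), (1.55, 104.15)],
    "flight_c": [(10.05, 90.05), (10.55, 90.55), (11.05, 91.05)],
}
tokens = {name: tokenize(points) for name, points in trajectories.items()}
signatures = {name: minhash_signature(t) for name, t in tokens.items()}
candidates = lsh_candidate_pairs(signatures)
```

Candidate pairs returned by the banding step would normally be verified with the exact Jaccard similarity before being grouped into clusters; how clusters are then formed and scored (for example with the Silhouette score) is outside this sketch.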
