Anomaly detection in aggregated IP network traffic based on statistical inference over a first-order alpha-stable model
Anomaly detection in network traffic is an interesting field of research in data network management. A detection system can give a network manager useful information, letting them tell whether transmitted data contain unusual patterns. Anomaly detection usually relies on a set of features, intended to represent traffic data appropriately, which are then used to decide whether traffic should be classified as normal or anomalous.
One way to extract traffic features is to apply a statistical traffic model to sampled data. The model parameters then characterise the behaviour of the data, and greater robustness is attained, especially when few samples are available. Although there are anomaly detection works based on statistical models, these contributions do not account for certain properties that are inherent to traffic data and to which the traffic modelling literature gives great importance. These properties, High Variability and Long-Range Dependence, may contain useful information for the classification of sampled data.
Once traffic features have been extracted, a classification is carried out to decide whether the data are anomalous. However, authors often do not elaborate on the problem of establishing reference patterns on which to base the final decision. In addition, network traffic usually exhibits cyclostationary behaviour, so a reference pattern that is valid at a certain moment may be inadequate at another. Also, certain classifiers proposed in some research works are not closely linked to the feature extraction strategies they use, so they cannot take full advantage of the information extracted from traffic. It is also notable that some authors assume that immediate past traffic is normal, or leave the task of choosing reference patterns to an expert. Nevertheless, anomaly detection would benefit from reduced human intervention and from making no assumptions about recent past traffic.
This thesis proposes an anomaly detection method based on a statistical model that is sensitive to observed traffic properties, so that the extracted features represent real data more accurately and more information is provided to the classification subsystem. The proposed classifier is able to take advantage of the information provided by the aforementioned traffic properties. The problem of setting reference patterns without human intervention is also addressed. These patterns must be valid at any time and independent of any assumption about immediate past traffic.
The proposal is validated using real data collected from two routers at the University of Valladolid. These routers provide two different levels of aggregation, so that the conclusions drawn in validating the proposed method can be extrapolated to other networks wherever possible. The performance of the proposed method is also compared with other state-of-the-art results.