Introduction to Self-Organizing Networks

Given the complexity of today’s telecommunication networks, incorporating automation into network management has become a key element of new-generation networks, both to maintain a high quality of service and to reduce operational costs. The Self-Organizing Network (SON) framework addresses this complex optimization problem. SON is a recent standard whose main objective is to reduce operational expenditures (OPEX) and capital expenditures (CAPEX) by automating optimization and network maintenance tasks.

SON functionalities are classified into three groups:

  • Self-Configuration is the process of bringing a new network element into service with minimal human operator intervention. It can also be activated when the system changes, for example when a node fails or a change in service type is required.
  • Self-Optimization is a framework that tunes network parameters to achieve optimal or near-optimal performance. By continuously monitoring the system and analyzing the measurements received from the network, the self-optimization procedure ensures that the objectives are met and that overall network performance stays near the optimum and above a pre-defined threshold.
  • Self-Healing functions carry out the detection, diagnosis, compensation and recovery of network failures automatically. One of the fundamental use cases in Self-Healing is outage management, which comprises outage detection and outage compensation. An outage occurs when a network cannot carry traffic because of a failure. In this situation it is very important to locate the outage as soon as possible in order to minimize its effects on the network. When the detected problem persists and cannot be solved immediately, the outage compensation algorithm is executed: it tries to serve the affected users by reconnecting them to neighboring healthy units of the network until the fault is fixed.

Motivation for using Machine Learning in SONs

Because SON reduces the cost of network operations by diminishing human intervention, it calls for more intelligent techniques. The recent success of applications built with artificial intelligence (AI) techniques, such as machine translation, speech recognition and image classification, shows that machine learning, a sub-field of AI, provides the right tools to develop intelligent, scalable solutions in almost every industry that generates a sufficient amount of data. A telecom network is a major source of data: it measures various types of network indicators and stores them in databases. Machine learning algorithms are therefore well suited to increasing the degree of automation in different aspects of telecom operations.

This paper describes the potential of cutting-edge machine learning techniques for self-healing networks.

Machine Learning in Self-Healing Networks

Overview of Machine Learning

Machine learning is concerned with designing algorithms that learn from data and output analytic or non-analytic (black-box) functions or models that can make inferences or develop strategies based on observations.

There are three major fields in machine learning:

  • Supervised learning encompasses methods that learn patterns in data and are used to build models that classify objects or predict specific events. This requires labeled data.
  • Unsupervised learning algorithms are used to explore and identify hidden structures in data. These insights can be used to profile a given set of interest (e.g. a set of clients, products, etc.) and to identify outliers in the data. In this scenario the data does not have to be labeled.
  • Reinforcement learning develops frameworks that learn optimization strategies by observing interactions between an agent and its environment. The goal is to find the sequence of actions (what to do and when) that maximizes the expected reward.

Techniques from all three areas of machine learning can be used to build more efficient self-healing networks [1-3]. Below, the tasks of a self-healing network (outage detection, classification and recovery from failure) are discussed separately in the context of machine learning.

Outage Detection and Classification

A failure can be detected in two ways: with supervised or with unsupervised learning. If the data is labeled, that is, if the observations associated with an outage have already been identified by an analyst or operator, one can use cutting-edge algorithms (XGBoost [4], CatBoost [5], Random Forest [6], Deep Learning [7], etc.) as well as traditional ones, such as logistic regression, to predict an outage in advance. Such insight is very valuable because it allows preventive action to be taken, which reduces unplanned costs and maintains the level of service quality.
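As a rough illustration of the supervised case, the sketch below fits a logistic regression by plain gradient descent on synthetic labeled data. The two features (a call drop rate and a cell load), their distributions and the training settings are assumptions made up for this example; they do not come from a real network or from the cited works.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=1.0, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Predicted probability that observation x precedes an outage."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Synthetic labeled observations: [drop_rate, load], label 1 = outage followed.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    outage = rng.random() < 0.3
    drop_rate = rng.gauss(0.7 if outage else 0.2, 0.1)
    load = rng.gauss(0.8 if outage else 0.4, 0.15)
    X.append([drop_rate, load])
    y.append(1 if outage else 0)

w, b = fit_logistic(X, y)
p_healthy = predict_proba(w, b, [0.2, 0.4])  # typical healthy cell
p_failing = predict_proba(w, b, [0.7, 0.8])  # typical pre-outage cell
```

In practice one of the boosted-tree or deep-learning methods cited above would replace the hand-written model; the workflow of fitting on labeled history and scoring new observations stays the same.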

However, there are scenarios in which the observations related to outages are not known in advance, that is, they are not labeled. In such cases the problem has to be addressed with anomaly detection algorithms such as Isolation Forest [8], Local Outlier Factor [9, 10] or One-Class Support Vector Machine [10]. These algorithms analyze the distribution of the attributes of multidimensional data and classify each observation as anomalous or normal based on an anomaly score.
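To make the idea of an anomaly score concrete, here is a simplified, standard-library sketch in the spirit of Isolation Forest [8]: points that random splits isolate after few steps receive a score closer to 1. It is not the reference implementation, and the two-dimensional KPI data, the tree count and the subsample size are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(x, tree, depth=0):
    """Number of splits needed to isolate x, corrected for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)

def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, sample_size

def anomaly_score(x, forest):
    """Score in (0, 1); values closer to 1 indicate easier isolation."""
    trees, sample_size = forest
    mean_path = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(sample_size))

# Unlabeled synthetic KPI observations: [drop_rate, throughput].
rng = random.Random(1)
kpis = [[rng.gauss(0.2, 0.05), rng.gauss(0.4, 0.1)] for _ in range(300)]
forest = fit_forest(kpis)
score_normal = anomaly_score([0.2, 0.4], forest)   # typical observation
score_outage = anomaly_score([0.9, 0.05], forest)  # far from the bulk of the data
```

An observation would be flagged as anomalous when its score exceeds a threshold, which has to be chosen for the network at hand.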

Outage diagnosis implies outage classification. Assigning the correct type to a given failure is important because it determines which procedures are activated. The supervised learning algorithms mentioned above can be used to identify the class of a breakdown accurately.

Outage Recovery and AutoML as Optimization Problems

To compensate for the collapse of a given unit of the network, the system needs to start reconfiguring its parameters in an intelligent manner until it finds a configuration with which the network can serve its customers without disruption.

The anomaly detection and predictive models mentioned in the previous section yield very accurate results, but it is well known that the performance of such models degrades over time. One of the major reasons is a change in the probability distribution of the features used to build the models. In other words, the characteristics of the data source and of the data generation process change over time. For a self-healing network to be efficient, it is crucial that the failure detection models are always up to date and adapted to the current environment. Therefore, one has to monitor model performance continuously and retrain the model when necessary to find the optimal features, model and hyperparameters. AutoML is a technology that automates this process.
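One simple way to monitor such a distribution change is to compare the histogram of a feature at training time with its histogram on recent data. The sketch below uses the Population Stability Index (PSI), a drift measure that the text does not name and that is chosen here only for illustration; the synthetic feature values and the retraining threshold of 0.2 (a commonly quoted rule of thumb) are assumptions.

```python
import math
import random

def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index of one feature between two samples.

    Bins are equal-width over the range of the reference sample, which is
    assumed to contain at least two distinct values.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins

    def fractions(sample):
        counts = [0] * n_bins
        for v in sample:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp values outside the reference range
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(sample), eps) for c in counts]

    ref, cur = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

rng = random.Random(0)
training_sample = [rng.gauss(0.2, 0.05) for _ in range(1000)]  # feature at training time
recent_stable = [rng.gauss(0.2, 0.05) for _ in range(1000)]    # same distribution
recent_drifted = [rng.gauss(0.3, 0.05) for _ in range(1000)]   # mean has shifted

psi_stable = psi(training_sample, recent_stable)
psi_drifted = psi(training_sample, recent_drifted)

RETRAIN_THRESHOLD = 0.2  # assumed threshold
needs_retraining = psi_drifted > RETRAIN_THRESHOLD
```

A check like this could serve as the trigger for the automated retraining step; monitoring the model's prediction quality directly is an equally valid trigger when fresh labels are available.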

Both outage recovery and automated model development are optimization problems in the sense that, in each case, multiple categorical and continuous parameters with infinitely many combinations have to be tuned. One has to search the space of possible actions and learn, in a timely manner, the strategy that maximizes network performance.

Reinforcement learning [11], and machine learning in general, provides state-of-the-art search-and-learn algorithms that yield strong results with a reasonable amount of computational power. Hyperband [12] is a multi-armed-bandit-based algorithm that is lightweight and easy to implement. Optimization techniques are usually built to find optimum values of analytic, differentiable objective functions and are based on the idea of gradient descent. In real-life problems, however, the objective function that we actually want to optimize is most often very complex and/or not analytic, and thus not differentiable. Evolution strategies and genetic algorithms [13, 14] represent a family of very powerful derivative-free optimization methods. Another novel technique, inspired by genetic algorithms and transfer learning, is Population Based Training (PBT) [15]. It is also a derivative-free approach, and it jointly optimizes a model and its hyperparameters.
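The building block of Hyperband is successive halving: evaluate many configurations on a small budget, keep the best fraction, and repeat with a larger budget. The sketch below shows only this inner loop, not the full Hyperband bracket schedule. The `evaluate` function is a hypothetical stand-in for training and validating a model, and the single parameter `x` is an invented example.

```python
import math
import random

def evaluate(config, budget):
    """Hypothetical stand-in for training with `config` for `budget` iterations.

    The score grows with the budget and peaks at x = 0.3.
    """
    quality = 1.0 - (config["x"] - 0.3) ** 2
    return quality * (1.0 - math.exp(-budget / 20.0))

def successive_halving(configs, min_budget=1, eta=3):
    """Keep the top 1/eta of the configurations each round, multiplying the budget by eta."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]

rng = random.Random(0)
candidates = [{"x": rng.uniform(0.0, 1.0)} for _ in range(27)]
best = successive_halving(candidates)  # rounds keep 27 -> 9 -> 3 -> 1 configurations
```

Because the toy objective ranks configurations identically at every budget, early elimination is harmless here; with real learning curves that cross, it is not guaranteed to keep the eventual winner.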

The drawback of these algorithms is that, to save time and computational power, they discard poorly performing actions after a certain number of training iterations. Theory and practice show that there are scenarios in which one action performs worse than another after a few iterations but turns out to be better in the long run. Dropping an action because it is the worst performer after a pre-defined number of iterations is therefore not efficient. To overcome this problem, one can associate with each action a posterior probability distribution that is updated according to the reward the action generates. This makes it possible to keep track of each action’s performance and to discard an action only when its cumulative reward has converged and is lower than that of the competing actions.
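A standard way to maintain a posterior per action is Thompson sampling with Beta distributions over binary rewards, sketched below. The text does not name a specific method, so this choice is an assumption, as are the three actions and their true success probabilities, which exist only to simulate rewards.

```python
import random

def thompson_sampling(success_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling over a fixed set of actions.

    `success_probs` are the (normally unknown) true success rates, used here
    only to simulate the environment's reward.
    """
    rng = random.Random(seed)
    n = len(success_probs)
    alpha = [1.0] * n  # 1 + number of observed successes per action
    beta = [1.0] * n   # 1 + number of observed failures per action
    pulls = [0] * n
    for _ in range(n_rounds):
        # Draw one plausible success rate per action from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        action = max(range(n), key=lambda i: samples[i])
        reward = 1 if rng.random() < success_probs[action] else 0
        # Update only the posterior of the action that was tried.
        alpha[action] += reward
        beta[action] += 1 - reward
        pulls[action] += 1
    return alpha, beta, pulls

true_probs = [0.2, 0.5, 0.8]
alpha, beta, pulls = thompson_sampling(true_probs)
posterior_means = [a / (a + b) for a, b in zip(alpha, beta)]
```

No action is ever discarded outright: a weak action is simply sampled less and less often as its posterior concentrates below those of its competitors.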

If historical outage data is available that contains the characteristics of each failure, the relevant network parameters, a description of the recovery actions and whether those actions succeeded, it is possible to design yet another approach to the optimization problem. One can build a predictive model that, for a given outage, outputs the probability of success of every action. However, as noted earlier, the action space contains more options than the database used to develop the model. In other words, better solutions might be found by trying actions other than those observed in the data. An exploration element therefore has to be introduced into the algorithm. This is done by sampling the model parameters from a probability distribution centered on the values the model learned during training and using the sampled parameters to compute the probability of success of each action. Finally, it should be mentioned that models designed in the supervised learning fashion need to be used in combination with an anomaly detection system in case new types of breakdowns, not registered in the database, occur.
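The exploration step described above can be sketched as follows, assuming one logistic success model per recovery action and Gaussian noise around the learned weights. The action names, weights, outage features and noise scale `sigma` are all hypothetical values chosen for illustration.

```python
import math
import random
from collections import Counter

def success_probability(weights, features):
    """Logistic success model for one recovery action."""
    z = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def choose_action(learned_weights, features, rng, sigma=0.5):
    """Pick the action with the highest success probability under perturbed weights.

    Each weight is sampled from a Gaussian centered on its learned value;
    sigma = 0 reduces to the purely greedy choice.
    """
    best_action, best_prob = None, -1.0
    for action, weights in learned_weights.items():
        sampled = [rng.gauss(w, sigma) for w in weights]
        prob = success_probability(sampled, features)
        if prob > best_prob:
            best_action, best_prob = action, prob
    return best_action

# Hypothetical weights learned from historical outage records.
learned_weights = {
    "restart_cell": [0.5, -0.5],
    "tilt_neighbors": [0.8, 0.2],
    "boost_power": [-0.3, 0.4],
}
outage_features = [1.0, 0.5]  # hypothetical description of the current outage

rng = random.Random(0)
greedy = choose_action(learned_weights, outage_features, rng, sigma=0.0)
counts = Counter(choose_action(learned_weights, outage_features, rng)
                 for _ in range(1000))
```

With `sigma=0.0` the same action is always selected, whereas with noise the other actions are tried occasionally, so their outcomes can be recorded and fed back into the model.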


Building an efficient self-healing network is a complex problem because it requires a high degree of automation. Recent successes in machine learning, however, make it possible to build intelligent models and systems that find optimal or near-optimal solutions in a timely manner and can be updated automatically.


[1] Mohammed S. Hadi, Ahmed Q. Lawey, Taisir E. H. El-Gorashi, Jaafar M. H. Elmirghani. Big data analytics for wireless and wired network design: A survey. Computer Networks, Volume 132, 180-199 (2018)

[2] Jessica Moysen, Lorenza Giupponi. From 4G to 5G: Self-organized Network Management meets Machine Learning. arXiv:1707.09300 (2018)

[3] Valente Klaine, P., Imran, M. A., Onireti, O., Souza, R. D. A survey of machine learning techniques applied to self organizing cellular networks. IEEE Communications Surveys and Tutorials (2017)

[4] Tianqi Chen, Carlos Guestrin. XGBoost: A Scalable Tree Boosting System. arXiv:1603.02754 (2016)

[5] Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, Andrey Gulin. CatBoost: unbiased boosting with categorical features. arXiv:1706.09516 (2017)

[6] Leo Breiman. Random Forests. Machine Learning, 45: 5-32 (2001)

[7] Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning. MIT Press (2016)

[8] Fei Tony Liu, Kai Ming Ting, Zhi-Hua Zhou. Isolation Forest. The Eighth IEEE International Conference on Data Mining (2008)

[9] Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander. LOF: identifying density-based local outliers. ACM SIGMOD Record, 29(2):93–104 (2000)

[10] Zoha, A., Saeed, A., Imran, A., Imran, M. A., and Abu-Dayya, A. A learning-based approach for autonomous outage detection and coverage optimization. Transactions on Emerging Telecommunications Technologies, 27(3), pp. 439-450 (2016)

[11] Rouzbeh Razavi, Siegfried Klein, Holger Claussen. A Fuzzy reinforcement learning approach for self-optimization of coverage in LTE networks. Bell Labs Technical Journal, Volume: 15(3), pp. 153-175 (2010)

[12] Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, Ameet Talwalkar. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. arXiv:1603.06560 (2016)

[13] David E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, Inc. (1989)

[14] Jessica Moysen, Lorenza Giupponi, Josep Mangues-Bafalluy. A Machine Learning enabled network Planning tool. 27th Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (2016)

[15] Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, Koray Kavukcuoglu. Population Based Training of Neural Networks. arXiv:1711.09846 (2017)
