Systematic investigation of machine learning techniques for network intrusion detection
Introduction
Network security has become a critical research area owing to the growing interest and rapid advances in communication and internet technologies over the past ten years. It relies on tools such as firewalls, antivirus software, and intrusion detection systems (IDS) to safeguard a network and all of its connected assets in cyberspace. Among these, the network-based intrusion detection system (NIDS) provides the needed protection by continuously monitoring network traffic for malicious and suspicious activity.
Researchers have investigated machine learning (ML) and deep learning (DL) approaches to meet the requirements of an effective IDS. ML and DL both fall under the broad heading of artificial intelligence (AI), and their main goal is to extract meaningful information from large volumes of data. The tremendous growth in network traffic and the associated security risks have made it extremely difficult for NIDS to detect malicious intrusions effectively (Ahmad et al., 2021).
The study of DL approaches for NIDS is still in its early stages, and there is considerable room to explore these techniques for detecting network intruders effectively. To give a comprehensive overview of current trends in ML- and DL-based NIDS, this paper focuses on recent developments in these areas.
Figure 1: Intrusion detection system classification taxonomy
ML algorithms for NIDS
Decision tree
One of the fundamental supervised machine learning (ML) techniques, the decision tree (DT) applies a series of decisions (rules) to classify or predict the samples in a dataset. The model has the structure of a typical tree, with nodes, branches, and leaves: each internal node tests a feature, each branch corresponds to a decision, and each leaf gives an outcome or class label. CART, ID3, and C4.5 are the three most popular DT models. Several more advanced learning algorithms, including Random Forest (RF) and XGBoost, are built from multiple decision trees.
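The way a tree chooses the rule at a node can be illustrated with a minimal sketch. It assumes a CART-style criterion (Gini impurity with binary threshold splits); the function names, the two toy features, and the sample values are hypothetical and are not taken from any of the surveyed papers.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    A candidate rule is "feature <= threshold"; the rule with the lowest
    weighted impurity of the two child nodes wins.
    """
    best = (None, None, gini(labels))
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send samples to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, threshold, score)
    return best


# Hypothetical toy flows: [duration_seconds, failed_logins]
rows = [[2, 0], [3, 0], [40, 5], [35, 6], [1, 0], [50, 4]]
labels = ["normal", "normal", "attack", "attack", "normal", "attack"]

feature, threshold, score = best_split(rows, labels)
print(feature, threshold, score)
```

A full tree would apply `best_split` recursively to each child node until the nodes are pure or a depth limit is reached.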
K-Nearest Neighbour
One of the simplest supervised machine learning (ML) algorithms, KNN uses the concept of "feature similarity" to determine the class of a given data sample. It assigns a sample to a class based on its nearest neighbours, which it finds by computing the distance between the sample and the training samples. The parameter k of the KNN algorithm affects how well the model performs (Binbusayyis and Vaiyapuri, 2021).
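A minimal sketch of this idea follows, assuming Euclidean distance and a simple majority vote among the k nearest training samples. The function name `knn_predict` and the toy data, including one deliberately mislabelled point, are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, sample, k=3):
    """Classify `sample` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, sample), y) for x, y in zip(train_x, train_y)
    )
    nearest_labels = [y for _, y in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D feature vectors; the point [2, 2] is a mislabelled outlier.
train_x = [[1, 1], [1, 2], [2, 1], [2, 2], [8, 8], [9, 8], [8, 9]]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack", "attack"]

query = [2.2, 2.1]
print(knn_predict(train_x, train_y, query, k=1))  # follows the single closest point
print(knn_predict(train_x, train_y, query, k=3))  # outvoted by the two next neighbours
```

With k=1 the prediction follows the mislabelled outlier, while k=3 lets the surrounding normal samples outvote it, which is one way the choice of k changes model behaviour.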
Support vector machine
SVM is a supervised machine learning method based on the max-margin separating hyperplane in an n-dimensional feature space. It can be used to solve both linear and nonlinear problems, the latter through kernel functions.
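For the linearly separable case, the max-margin idea can be written as the standard hard-margin optimisation problem. The notation is an assumption introduced here for illustration: training samples are x_i with labels y_i in {-1, +1}, w is the normal vector of the hyperplane, b is its offset, and m is the number of samples.

```latex
\[
\min_{\mathbf{w},\, b} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1,
\qquad i = 1, \dots, m
\]
\[
\hat{y}(\mathbf{x}) = \operatorname{sign}\!\left( \mathbf{w}^{\top} \mathbf{x} + b \right),
\qquad
\text{margin width} = \frac{2}{\lVert \mathbf{w} \rVert}
\]
```

Minimising the norm of w is equivalent to maximising the margin width, which is why the resulting hyperplane is called the max-margin separator.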
Artificial neural network
An ANN consists of processing units called neurons (nodes) and the connections that link them. The nodes are arranged in an input layer, one or more hidden layers, and an output layer. The backpropagation algorithm is used for the ANN's learning process. The fundamental benefit of an ANN approach is its ability to perform nonlinear modelling by learning from large datasets.
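The layered structure and the backpropagation update can be sketched in a few lines. This is a minimal illustration under stated assumptions: one hidden layer, sigmoid activations, a squared-error loss, and per-sample gradient descent. The class name `TinyANN`, the learning rate, and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyANN:
    """Input layer -> one hidden layer -> single output neuron."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        hidden, out = self.forward(x)
        # Error term at the output for the loss 0.5 * (out - target)^2
        delta_out = (out - target) * out * (1.0 - out)
        # Propagate the error back to the hidden layer (uses pre-update weights)
        delta_hidden = [delta_out * w * h * (1.0 - h)
                        for w, h in zip(self.w2, hidden)]
        for j, h in enumerate(hidden):
            self.w2[j] -= lr * delta_out * h
        self.b2 -= lr * delta_out
        for j, row in enumerate(self.w1):
            for i, xi in enumerate(x):
                row[i] -= lr * delta_hidden[j] * xi
            self.b1[j] -= lr * delta_hidden[j]


def mean_squared_error(net, xs, ys):
    return sum((net.forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical normalised features; label 1 marks an attack, 0 normal traffic.
xs = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [0.8, 1.0]]
ys = [0, 0, 1, 1]

net = TinyANN(n_in=2, n_hidden=2)
loss_before = mean_squared_error(net, xs, ys)
for _ in range(500):
    for x, y in zip(xs, ys):
        net.train_step(x, y)
loss_after = mean_squared_error(net, xs, ys)
print(loss_before, loss_after)
```

Practical NIDS models would use a numerical library and far more data; the sketch only shows how the error at the output is propagated back to adjust the weights of earlier layers.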
Ensemble methods
The fundamental tenet of ensemble methods is that learning should be done collectively in order to benefit from several classifiers, since every classifier has its advantages and disadvantages. Some classifiers may be effective at spotting a particular kind of attack but perform poorly against other attack types. In an ensemble approach, many weak classifiers are trained and their outputs are combined, typically through a voting technique, to form a stronger classifier (Salih et al., 2021).
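A minimal sketch of the voting step is given below, assuming hard majority voting over already-trained classifiers. The three rule-based "classifiers", their thresholds, and the feature names are hypothetical stand-ins for trained models.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the label predicted by most classifiers for one sample."""
    votes = [clf(sample) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical weak classifiers, each sensitive to a different symptom.
def login_rule(sample):
    return "attack" if sample["failed_logins"] > 3 else "normal"


def port_rule(sample):
    return "attack" if sample["distinct_ports"] > 20 else "normal"


def combined_rule(sample):
    suspicious = sample["failed_logins"] > 3 or sample["bytes_per_second"] > 10000
    return "attack" if suspicious else "normal"


ensemble = [login_rule, port_rule, combined_rule]

brute_force = {"failed_logins": 6, "distinct_ports": 1, "bytes_per_second": 200}
benign = {"failed_logins": 0, "distinct_ports": 2, "bytes_per_second": 500}

print(majority_vote(ensemble, brute_force))
print(majority_vote(ensemble, benign))
```

Here `port_rule` alone misses the brute-force sample, but the other two votes outweigh it, which illustrates how combining classifiers can cover individual weaknesses.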
Research challenges
Unavailability of a systematic dataset
The current study highlighted the absence of an up-to-date dataset that reflects novel attacks on contemporary networks. Systematically building such a dataset, with sufficient examples of practically all attack types, is one of the research challenges for IDS. The dataset should be updated regularly to reflect the most recent intrusion instances and made publicly available to aid the research community.
Lower detection accuracy due to imbalanced dataset
Another important finding of the current study is that most of the proposed IDS approaches show lower detection accuracy for some attack types than the model's overall detection accuracy. This follows from dataset imbalance: attack types with few training instances are detected less accurately than those with many.
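The gap between overall and per-class performance can be made concrete with a small calculation. The sketch below uses a hypothetical imbalanced test set and hypothetical predictions; the class names and counts are illustrative only and do not come from any surveyed study.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples that were classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """Detection rate for each true class, computed separately."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical test set: 90 normal flows, 8 DoS flows, 2 U2R flows.
y_true = ["normal"] * 90 + ["dos"] * 8 + ["u2r"] * 2
# Hypothetical predictions: one DoS flow and both U2R flows are missed.
y_pred = ["normal"] * 90 + ["dos"] * 7 + ["normal"] + ["normal"] * 2

accuracy = overall_accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
print(accuracy)
print(recall)
```

In this example the overall accuracy is 97%, yet no U2R sample is detected, so reporting per-class figures alongside the overall figure gives a more honest picture on imbalanced data.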
Low performance in real-world environment
The effectiveness of IDS in real-world settings is another research challenge, since most of the proposed approaches are tested and validated in a laboratory setting using publicly available datasets (Imrana et al., 2021).
Resources consumed by complex models
Most of the IDS approaches proposed by researchers (approximately 80% of the methods used were DL-based or combined DL-ML methods) rely on highly complex models that demand considerable processing time and computing resources. This can place additional overhead on the processing unit and ultimately degrade IDS performance.
Lightweight IDS for IoT
An IDS can be used to secure both IoT networks and the sensor nodes connected to them. Sensor nodes in an IoT system gather a vast amount of vital data that is transmitted over the internet (Alzahrani and Alenazi, 2021).
Future trends
Efficient NIDS framework
The IDS framework should update the attack characteristics in its dataset frequently, and the model should be retrained on the updated definitions so that it can learn new features. In the long run, this will help the IDS model detect zero-day threats more accurately and reduce false alarms.
Solution to complex models
When only the essential features are selected, detection accuracy can be almost as high as when the full feature set is used. As a result, the model becomes less complex and requires less computing power at run time.
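One common way to pick essential features is a filter method that ranks each feature by how much it tells us about the class label. The sketch below assumes information gain over categorical features as the ranking criterion; this is one illustrative choice, not a method prescribed by the surveyed work, and the feature names and values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def select_top_features(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(columns,
                    key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical categorical features for six flows.
columns = {
    "protocol": ["tcp", "udp", "tcp", "udp", "tcp", "udp"],
    "flag": ["S0", "S0", "SF", "SF", "S0", "SF"],
    "land": [0, 0, 0, 0, 0, 0],  # constant, so it carries no information
}
labels = ["attack", "attack", "normal", "normal", "attack", "normal"]

selected = select_top_features(columns, labels, k=2)
print(selected)
```

The constant feature is ranked last and dropped, so a classifier trained on the selected subset has fewer inputs to process.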
Use of DL algorithms
Researchers can also experiment with a hybrid approach that uses DL for feature extraction and ML for classification. As a result, the proposed model will be simpler.
Efficient NIDS for cyber-physical systems
An effective and intelligent NIDS is needed to identify intrusions in networks that support unmanned aerial vehicles (UAVs). The use of AI in NIDS for UAV-enabled systems is a promising research area, but it needs further exploration and development.
Conclusions
To give new researchers access to the most recent information, trends, and advancements in the area, this paper offers a thorough analysis of network intrusion detection systems based on ML and DL methodologies. Relevant publications in the area of AI-based NIDS were selected using a systematic methodology. Future work in this area may focus on proposing an effective NIDS framework with less complex DL algorithms and detection mechanisms. Using this knowledge, we will in future develop a cutting-edge, portable, and effective machine learning-based NIDS that successfully identifies network intruders.
References
Ahmad, Z., Shahid Khan, A., Wai Shiang, C., Abdullah, J., & Ahmad, F. (2021). Network intrusion detection system: A systematic study of machine learning and deep learning approaches. Transactions on Emerging Telecommunications Technologies, 32(1).
Alzahrani, A. O., & Alenazi, M. J. F. (2021). Designing a network intrusion detection system based on machine learning for software defined networks. Future Internet, 13(5), 111.
Binbusayyis, A., & Vaiyapuri, T. (2021). Unsupervised deep learning approach for network intrusion detection combining convolutional autoencoder and one-class SVM. Applied Intelligence, 51(10), 7094–7108.
Imrana, Y., Xiang, Y., Ali, L., & Abdul-Rauf, Z. (2021). A bidirectional LSTM deep learning approach for intrusion detection. Expert Systems with Applications, 185, 115524.
Salih, A. A., Ameen, S. Y., Zeebaree, S. R. M., Sadeeq, M. A. M., Kak, S. F., Omar, N., Ibrahim, I. M., Yasin, H. M., Rashid, Z. N., & Ageed, Z. S. (2021). Deep learning approaches for intrusion detection. Asian Journal of Research in Computer Science, 50–64.