k-means clustering and its usecase in the security domain

NARMADA M
2 min readJul 19, 2021

--

Clustering

Cluster analysis, or clustering, is an unsupervised machine learning task. It involves automatically discovering natural grouping in data. Unlike supervised learning (like predictive modeling), clustering algorithms only interpret the input data and find natural groups or clusters in feature space.

K-Means

K-Means Clustering may be the most widely known clustering algorithm and involves assigning examples to clusters in an effort to minimize the variance within each cluster.

K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity. The results of the K-means clustering algorithm are:

  1. The centroids of the K clusters, which can be used to label new data
  2. Labels for the training data (each data point is assigned to a single cluster)

Rather than defining groups before looking at the data, clustering allows you to find and analyze the groups that have formed organically. The “Choosing K” section below describes how the number of groups can be determined.

Each centroid of a cluster is a collection of feature values which define the resulting groups. Examining the centroid feature weights can be used to qualitatively interpret what kind of group each cluster represents.

Considering the problem of network security under the background of big data, the clustering analysis algorithms can be utilized to improve the correctness of network intrusion detection models for security management. As a kind of iterative clustering analysis algorithm, K-means algorithm is not only simple but also efficient, so it is widely used. However, the traditional K-means algorithm cannot well solve the network security problem when facing big data due to its high complexity and limited processing ability. In this case, to optimize the traditional K-means algorithm based on the Spark platform and deploy the optimized clustering analysis algorithm in the distributed architecture, so as to improve the efficiency of clustering algorithm for network intrusion detection in big data environment.The traditional K-means algorithm, the efficiency of the optimized K-means algorithm using a Spark-based method is significantly improved in the running time.

— — — — — — — — — — — — — — — — — — — — — — — — — —

#worldrecordholder #training #internship #makingindiafutureready #summer #summertraining #python #machinelearning #docker #rightmentor #deepknowledge #linuxworld #vimaldaga #righteducation

--

--