
Title
Scalable Network Clustering via Local Sparsification
Speaker
Srinivasan Parthasarathy, The Ohio State University
Abstract
Many real world problems (biological, social, web) can be effectively modeled as networks or graphs where nodes represent entities of interest and edges mimic the interactions or relationships among them. The study of such complex relationship networks, recently referred to as "network science", can provide insight into their structure, properties and emergent behavior. Of particular interest here are rigorous methods for uncovering and understanding important network structures (modules, communties) and motifs at multiple topological and temporal scales. Achieving this objective is challenging due to the presence of noise (false or missing interactions), topological (scale-free) properties and scalability. Given the importance of the graph clustering problem, a number of solutions ranging from hierarchical methods to spectral methods have been designed and developed.
In this talk we will look at how to sparsify a graph i.e. how to reduce the edgeset while keeping the nodes intact, so as to enable faster graph clustering without sacrificing quality. The main idea behind our approach is to preferentially retain the edges that are likely to be part of the same cluster. We propose to rank edges using a simple similarity-based heuristic that we efficiently compute by comparing the minhash signatures of the nodes incident to the edge. For each node, we select the top few edges to be retained in the sparsified graph. Extensive empirical results on several real networks and using six state-of-the-art graph clustering and community discovery algorithms reveal that our proposed approach realizes excellent speedups (often in the range 10-50), with little or no deterioration in the quality of the resulting clusters. In fact, for at least two of the four clustering algorithms, our sparsification consistently enables higher clustering accuracies due to the elimination of noisy edges. We also discuss some directions for future study.
Srinivasan Parthasarathy is a professor of computer science at Ohio State. He received his PhD at the University of Rochester, and a B. Tech. from the University of Roorkee (now IIT Roorkee). His research interests are in data mining, systems (broadly defined) and graph and network algorithms as they relate to social, biological and web applications. He is a recipient of both the NSF and DOE career awards, multiple research awards from Microsoft, Google and IBM, and multiple best paper awards and nominations from conferences such as IEEE ICDM, SIAM DM, VLDB, SIGKDD, ACM-Bioinformatics and ISMB.