Ohio State is in the process of revising websites and program materials to accurately reflect compliance with the law. While this work occurs, language referencing protected class status or other activities prohibited by Ohio Senate Bill 1 may still appear in some places. However, all programs and activities are being administered in compliance with federal and state law.

Seminar: Srinivasan Parthasarathy

Statistics Seminar
October 4, 2012
All Day
209 W. Eighteenth Ave. (EA), Room 170

Title

Scalable Network Clustering via Local Sparsification

Speaker

Srinivasan Parthasarathy, The Ohio State University

Abstract

Many real world problems (biological, social, web) can be effectively modeled as networks or graphs where nodes represent entities of interest and edges mimic the interactions or relationships among them. The study of such complex relationship networks, recently referred to as "network science", can provide insight into their structure, properties and emergent behavior. Of particular interest here are rigorous methods for uncovering and understanding important network structures (modules, communties) and motifs at multiple topological and temporal scales. Achieving this objective is challenging due to the presence of noise (false or missing interactions), topological (scale-free) properties and scalability. Given the importance of the graph clustering problem, a number of solutions ranging from hierarchical methods to spectral methods have been designed and developed.

In this talk we will look at how to sparsify a graph i.e. how to reduce the edgeset while keeping the nodes intact, so as to enable faster graph clustering without sacrificing quality. The main idea behind our approach is to preferentially retain the edges that are likely to be part of the same cluster. We propose to rank edges using a simple similarity-based heuristic that we efficiently compute by comparing the minhash signatures of the nodes incident to the edge. For each node, we select the top few edges to be retained in the sparsified graph. Extensive empirical results on several real networks and using six state-of-the-art graph clustering and community discovery algorithms reveal that our proposed approach realizes excellent speedups (often in the range 10-50), with little or no deterioration in the quality of the resulting clusters. In fact, for at least two of the four clustering algorithms, our sparsification consistently enables higher clustering accuracies due to the elimination of noisy edges. We also discuss some directions for future study.

Srinivasan Parthasarathy is a professor of computer science at Ohio State. He received his PhD at the University of Rochester, and a B. Tech. from the University of Roorkee (now IIT Roorkee). His research interests are in data mining, systems (broadly defined) and graph and network algorithms as they relate to social, biological and web applications. He is a recipient of both the NSF and DOE career awards, multiple research awards from Microsoft, Google and IBM, and multiple best paper awards and nominations from conferences such as IEEE ICDM, SIAM DM, VLDB, SIGKDD, ACM-Bioinformatics and ISMB.