Context-aware Sampling of Large Networks via Graph Representation Learning

Zhiguang Zhou, Chen Shi, Xilong Shen, Lihong Cai, Haoxuan Wang, Yuhua Liu, Ying Zhao, Wei Chen

View presentation: 2020-10-30T14:15:00Z
Exemplar figure
A case study on the Webbase dataset (16k nodes, 26k edges) using our context-aware sampling method. (a) shows the original graph as a node-link diagram. (c) shows the scatterplot obtained through GRL (node2vec) and dimensionality reduction (t-SNE). (e) highlights a local structure of interest in (a); the circled nodes are significant. (g) shows an aggregated layout of (a), in which each supernode represents a community. Our sampling method is applied to (c), and the sampled scatterplot is shown in (d), with a contextual structure of interest highlighted by a red circle. (b) shows the corresponding sampled graph, which retains significant features such as the bridging nodes highlighted in (f) and the graph connections shown in (h).
Fast forward

Direct link to video on YouTube: https://youtu.be/ZvQ5-LcZV7w

Keywords

Graph sampling, Graph representation learning, Blue noise sampling, Evaluation

Abstract

Numerous sampling strategies have been proposed to simplify large-scale networks for highly readable visualizations. It is challenging to preserve contextual structures, formed by nodes and edges with tight relationships, in a sampled graph, because they are easily overlooked during sampling due to their irregular distribution and immunity to scale. In this paper, we propose a new graph sampling method oriented to the preservation of contextual structures. We first use a graph representation learning (GRL) model to transform nodes into vectors so that the contextual structures in a network can be effectively extracted and organized. Then, we propose a multi-objective blue noise sampling model that selects a subset of nodes in the vectorized space. The model preserves contextual structures while retaining relative data densities and relative cluster densities, as well as significant topology features such as bridging nodes and graph connections. We also design a visual interface that supports conducting context-aware sampling, comparing the results with those of various sampling strategies, and exploring large networks in depth. Case studies and quantitative comparisons of sampling results on real-world datasets demonstrate the effectiveness of our method in the abstraction and exploration of large networks.
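
To make the idea of sampling in the vectorized space more concrete, the sketch below applies a plain, single-objective blue noise (Poisson-disk style) selection to node embeddings. It is only an illustration under stated assumptions, not the multi-objective model proposed in the paper: the embeddings are assumed to be already computed (for example by node2vec) and projected to 2D, and the function names, the `radius` parameter, and the toy data are all hypothetical.

```python
import math
import random


def blue_noise_sample(points, radius, seed=0):
    """Greedy dart-throwing selection over existing points (assumed helper).

    points: dict mapping node id -> (x, y) coordinates in the embedded space.
    Returns node ids such that every kept pair is at least `radius` apart.
    """
    rng = random.Random(seed)
    order = sorted(points)  # fixed base order so the seed fully determines the result
    rng.shuffle(order)      # visit candidates in random order
    kept = []
    for node in order:
        p = points[node]
        # Accept the candidate only if no already-kept sample is closer than `radius`.
        if all(math.dist(p, points[k]) >= radius for k in kept):
            kept.append(node)
    return kept


def induced_edges(edges, kept):
    """Keep only the edges whose two endpoints both survived the sampling."""
    kept_set = set(kept)
    return [(u, v) for u, v in edges if u in kept_set and v in kept_set]


# Toy data (hypothetical): two tight clusters joined through a bridging node "f".
embedding = {
    "a": (0.0, 0.0), "b": (0.1, 0.0), "c": (0.0, 0.1),
    "d": (5.0, 5.0), "e": (5.1, 5.0),
    "f": (2.5, 2.5),
}
edges = [("a", "b"), ("a", "c"), ("c", "f"), ("f", "d"), ("d", "e")]

sample = blue_noise_sample(embedding, radius=1.0)
sampled_edges = induced_edges(edges, sample)
print(sample, sampled_edges)
```

With this radius, each tight cluster contributes one representative and the isolated bridging node is always kept. A plain minimum-distance rule like this one flattens density differences between clusters, which is one reason a method that aims to retain relative data and cluster densities needs additional objectives.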