Literature review 2

Scalable Graph Exploration and Visualization: Sensemaking Challenges and Opportunities

Robert Pienta, James Abello, Minsuk Kahng, Duen Horng Chau

^-Georgia Institute of Technology

{pientars, kahng, polo}@gatech.edu

Rutgers University

abello@dimacs.rutgers.edu

1. INTRODUCTION

In many domains, data sets can be often represented as graphs or networks, as in online social networks, network traffic, intelligence analysis, and online auctions. In this survey, they have used the terms graph and network interchangeably. Today, making sense of large graphs remains a fundamental challenge, with few tools that allow users to interactively explore, visualize, and understand million-scale graphs.

Graph sensemaking

Graph sensemaking is a complex and abstract task. It refers to the iterative process of understanding and making sense out of data in graph format, where a user gradually builds up a representation of the information space to achieve the goals of users. It is unlikely that a single visualization will be sufficient for all sensemaking tasks. In their research they have covered both global and local paradigms which are needed for graph sensemaking. They have constructed a graph sensemaking hierarchy and placed relevant works in it accordingly.

Scaling up Graph Sensemaking: Interactive Sensemaking & Scalable Algorithms

With the challenge of exploring a large quantity of data, analysts need a combination of tools to support their abilities and interests. The overall style of the users’ investigation may vary in several ways. Scalability has been a top priority and success measure in prior and ongoing work.

2. GRAPH EXPLORATION AND VISUALIZATION

i. Visualization and exploration techniques

Graph sampling

Graph Sampling Both stochastic and deterministic approaches that have been proposed to solve the above mentioned problem in Introduction. The stochastic approaches use random sampling techniques to capture a smaller representative graph. Sampling graphs to maintain their properties is a challenging task especially with graph which are scale-free or power-law.

Graph Filtering

Graph Filtering Deterministic sampling and filtering approaches have also been proposed to reduce graph size. Techniques like edge bundling can be used to decrease the amount of space edges take. Many networks have such incredible scale that large graph visualizations are not sufficient to perform all exploration tasks.

Graph Partitioning

A common approach to creating an overview graph is to use partitioning methods on the graph and visualize the partitions. Large graph partitioning is a computationally expensive step and in cases of scale-free and near scale free networks the partitions may be exceedingly poor.

Graph Clustering

Graph Clustering proposes a novel distance measure that combines both structural distance as well as node attribute similarity. Allowing users to generate and customize their own clusters makes exploration more flexible to changes in input datasets. Rather than relying on a force-directed layout, Schneiderman and Aris propose a static graph layout called semantic substrates. Those regions allow users to control edge visibility to provide comprehensibility of each link’s source and destination.

ii. Global and Local Views

Because only subgraphs are displayed at a time, these approaches tend to improve visual comprehension and they require less computation, but may decrease global insight. A major strength of local views is that algorithms often only need to run on subgraphs, potentially improving scalability.

Free and targeted discovery

In many cases, the destinations of such trails can be used directly as search results, or even to teleport the user directly to the desired page. Such ideas and techniques may improve the quality of graph exploration tools, yielding more immediately interesting suggestions to the user. They observed a tradeoff wherein users would prefer simple solutions at the cost of efficiency.

iii. Subgraph mining

Pattern matching

At its heart, graph pattern matching is a variation of the subgraph isomorphism problem, an task of determining if a given graph is a sub graph of another graph. Exact graph pattern matching is computationally expensive and hard to parallelize. One sensible approach is to search for approximate matchings while another is to leverage special domain attributed subgraphs. There are few recent systems that offer approximate subgraph matching, which all focus on large scale techniques.

This is essential in scenarios where the user already knows of an interesting pattern exactly or approximately and wants to find where or how often it occurs in a larger graph. Many of these systems did not focus on the visualization of the query and results, but rather on the algorithmic and data mining challenges.

Frequent Subgraph Mining & Network

Although motif detection isn’t purely subgraph isomorphism, motif detection approaches currently can detect motifs with dozens of nodes, for modestly sized graphs. Those sub graphs with a statically significant appearance rate over the random graphs are considered as motifs. Other approaches have improved on the scalability of this work.
Symmetry-breaking is a technique by which their algorithm eliminates repeated subgraph isomorphism tests, leading to exponential speedups over the earlier techniques.

Overview & Challenges
How do we choose an appropriate view given a particular graph?

Graph visualizations with multiple levels of detail can yield a user improvement in understanding their data. This can be seen in, where both a graph summary view and a low level multivariate flow chart give users a combination of views. In this approach both the structural behavior of the network as well as the multivariate node data are visualized. Matrix Zoom is a matrix view strongly coupled with a hierarchical data view.
NodeTrix uses a connected matrix-view to capture both the sparsity and dense communities often found in social networks.

3.GRAPH INTERACTION

Graph interaction basics

Query-selection usually uses a query language or filtering interface to let the user specify which nodes they want selected based on node or edge level attributes. Query-selection can be especially useful in cases with rich multivariate node and edge data. Structural & Topological Navigation Topological navigation uses the graphs structure to show and hide portions of the graph based on the connections between nodes.

Often this is used so that only a local area of interest is displayed. This is often achieved by only drawing direct neighbors or the egonet of a node. This neighborhood traversal technique can be very effective.

Structural & Topological Navigation

TreePlus uses a tree structure to aid in users’ exploration of hierarchically clustered graph data. By letting users selectively grow the hierarchy, TreePlus strikes a balance between detail and originality by offering excellent readability, layout stability.

Degree of Interest Navigation

These methods use a degree of interest function to hide parts of the graph that are not interesting to the user. A DoI function evaluates the importance of nodes based on an initial node or group of nodes and produces a ranking for related nodes. Neighborhood traversal can be expressed as a simple DoI function. While the DoI functions proposed originally in used a form of graph distance, other graph-attributes can be used to capture user interest.

4. CONCLUSION

Graph summarization whether through sampling, filtering, partitioning, clustering or a combination often requires computationally expensive operations many of which cannot be completed fast enough for a real time system. Rather than relying on precomputation, new methods for graph visualization should investigate faster solutions like iteration or approximation to yield faster summary information. With so many challenging research opportunities , graph sense making will continue to inspire new research and vivid visualizations in the years to come.

Search This Blog

Final project | PENTACODE

Literature review 2

Comments

Post a Comment

Popular posts from this blog

KNN Algorithm

Use amCharts to visualize Google Analytics data

What are the steps used in Machine Learning?