Graph Algorithms

Centrality Measures

Centrality measures in graph theory are useful for identifying important nodes or edges in a network based on their structural position, connectivity, or influence. These measures can help to identify critical nodes or paths that are susceptible to disruption, failure, or delays, and thus require closer attention, monitoring, or improvement.

Graphistry's Centrality Measures and Their Relevance:

betweenness_centrality (cuGraph/igraph): This measures the extent to which a node or edge lies on the shortest paths between other nodes in the network. Nodes or edges with high betweenness centrality are likely to be critical bottlenecks or connectors that control the flow of information, goods, or services.
edge_betweenness_centrality (cuGraph/igraph): This is similar to betweenness centrality but focuses on edges rather than nodes. It measures the extent to which an edge lies on the shortest paths between other pairs of nodes in the network.
katz_centrality (cuGraph): This measures the influence of a node based on the number and proximity of its neighbors, as well as the importance of those neighbors themselves. It assigns higher scores to nodes that are connected to other highly connected or influential nodes.
hits (cuGraph): This is a recursive algorithm that assigns two scores to each node, one for its authority (i.e., how well it is endorsed by other authoritative nodes) and one for its hubness (i.e., how well it endorses other authoritative nodes). Nodes with high authority are considered to be highly valued or trusted, while nodes with high hubness are considered to be highly informative or knowledgeable.
pagerank (cuGraph/igraph): This is a measure of the importance of a node based on the number and quality of its incoming links. It assigns higher scores to nodes that are linked to by other highly ranked or influential nodes.
closeness (igraph): This measures the degree to which a node is close to other nodes in the network. It assigns higher scores to nodes that have shorter average distances to other nodes.
eigenvector_centrality (igraph): This measures the influence of a node based on the influence of its neighbors. It assigns higher scores to nodes that are connected to other highly influential nodes.
harmonic_centrality (igraph): This measures the average distance from a node to all other nodes in the network, taking into account the reciprocal of the shortest path lengths. It assigns higher scores to nodes that are closer to other nodes and are reachable by many different paths.
hub_score (igraph): This measures the degree to which a node is connected to other nodes that are highly connected. It assigns higher scores to nodes that are linked to other hubs or nodes with high degrees.

Examples:

Computing PageRank with Graphistry:

One of the simplest ways to compute PageRank on a graph in Graphistry is by using the compute_igraph('pagerank') method:


    import graphistry, pandas as pd

    # Create a DataFrame to represent the edges of a graph.
    edges = pd.DataFrame({'s': ['a','b','c','d'], 'd': ['c','c','e','e']})
    g = graphistry.edges(edges, 's', 'd')

    # Compute PageRank for the graph.
    g2 = g.compute_igraph('pagerank')

    assert 'pagerank' in g2._nodes.columns

Convert from igraph, including all node/edge properties:


    import graphistry, pandas as pd

    # Create a dataframe to represent edges in a graph
    edges = pd.DataFrame({
        's': ['a', 'b', 'c', 'd'],  # source nodes
        'd': ['b', 'c', 'd', 'e'],  # destination nodes
        'v': [101, 102, 103, 104]   # values for each edge
    })
    
    # Create a graph using the edges dataframe
    g = graphistry.edges(edges, 's', 'd').materialize_nodes().get_degrees()
    
    # Assert that the 'degree' column exists in the nodes of the graph
    assert 'degree' in g._nodes.columns
    
    # Convert the graph to and from igraph format
    g2 = g.from_igraph(g.to_igraph())
    
    # Assert that the node columns are preserved after conversion
    assert len(g2._nodes.columns) == len(g._nodes.columns)

Enrich from igraph, but only load in 1 node attribute:


    import graphistry, pandas as pd

    # Create a dataframe to represent edges in a graph
    edges = pd.DataFrame({
        's': ['a', 'b', 'c', 'd'],  # source nodes
        'd': ['b', 'c', 'd', 'e'],  # destination nodes
        'v': [101, 102, 103, 104]   # values for each edge
    })
    
    # Create a graph using the edges dataframe
    g = graphistry.edges(edges, 's', 'd').materialize_nodes().get_degree()
    
    # Assert that the 'degree' column exists in the nodes of the graph
    assert 'degree' in g._nodes
    
    # Convert the graph to igraph format without including nodes
    ig = g.to_igraph(include_nodes=False)
    
    # Assert that the 'degree' attribute is not in the igraph vertices
    assert 'degree' not in ig.vs
    
    # Compute and add pagerank to igraph vertices
    ig.vs['pagerank'] = ig.pagerank()
    
    # Convert back to graphistry format with specific node attributes
    g2 = g.from_igraph(ig, load_edges=False, node_attributes=[g._node, 'pagerank'])
    
    # Assert that 'pagerank' and 'degree' are in the nodes of the converted graph
    assert 'pagerank' in g2._nodes
    assert 'degree' in g2._nodes

Using igraph (CPU):


    import igraph as ig

    # Create an Erdős–Rényi graph with 1000 nodes and 5000 edges
    g_igraph = ig.Graph.Erdos_Renyi(n=1000, m=5000)
    
    # Compute eigenvector centrality for the graph
    centrality_igraph = g_igraph.event()

Graphistry Visualization with Centrality Measures:


import graphistry, pandas as pd

# Sample edges data
edges = pd.DataFrame({
    's': ['a', 'b', 'c', 'd'],
    'd': ['b', 'c', 'd', 'e'],
    'v': [101, 102, 103, 104]
})

# Convert the edges data to a Graphistry plot
g = graphistry.edges(edges, 's', 'd').materialize_nodes().get_degrees()

# Compute centrality measures using igraph directly on a Graphistry object
g2 = g.compute_igraph('betweenness')
assert 'betweenness' in g2._nodes.columns

g2 = g.compute_igraph('pagerank')
assert 'pagerank' in g2._nodes.columns

# Visualize the graph with centrality measures
g2.plot()

Coding Encoding with Centrality:

    
    import graphistry, pandas as pd

    # Sample edges data
    edges = pd.DataFrame({
        's': ['a', 'b', 'c', 'd'],
        'd': ['c', 'c', 'e', 'e']
    })

    # Convert the edges data to a Graphistry plot
    g = graphistry.edges(edges, 's', 'd')
    
    # Compute pagerank centrality scores for the nodes using igraph
    g2 = g.compute_igraph('pagerank')
    
    # Verify pagerank centrality scores are computed
    assert 'pagerank' in g2._nodes.columns
    
    # Encode the node color based on the pagerank centrality scores using a hot/cold gradient
    g2.encode_point_color('pagerank', ["blue", "red"], as_continuous=True)
    
    # Visualize the graph
    plot = g2.plot()

cuGraph Examples:


g2 = g.compute_cugraph('pagerank')
assert 'pagerank' in g2._nodes.columns

g2 = g.compute_cugraph('katz_centrality', out_col='katz_centrality_renamed')
assert 'katz_centrality_renamed' in g2._nodes.columns

g2 = g.compute_cugraph('k_truss', params={'k': 2})
assert 'k_truss' in g2._nodes.columns