Supergraph

class rdigraphs.supergraph.supergraph.SuperGraph(snode=None, path=None, path2snodes=None, path2sedges=None, label='sg', keep_active=False)

Bases: object

Generic class defining a supergraph.

A supergraph is a set of supernodes and super-edges connecting them.

A supernode is itself a graph.

A superedge connecting supernodes M and N is a directed bipartite graph whose nodes are the nodes of M and N, and edges are directed links from nodes in M to nodes in N.

The class provides methods to create a supergraph and add nodes and edges using data taken from an SQL database.
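The definitions above can be pictured with a small, library-independent data model. This is only an illustrative sketch: the dictionary layout and the helper `is_valid_sedge` are hypothetical and do not reflect how SuperGraph stores its data internally.

```python
# Hypothetical plain-Python picture of a supergraph (not the rdigraphs API).
# Each supernode is itself a graph: a list of nodes plus weighted edges.
snodes = {
    "M": {"nodes": ["m1", "m2"], "edges": [("m1", "m2", 1.0)]},
    "N": {"nodes": ["n1", "n2", "n3"], "edges": []},
}

# Each superedge is a directed bipartite graph from one supernode to another.
sedges = {
    "M_2_N": {
        "source": "M",
        "target": "N",
        "edges": [("m1", "n1", 0.8), ("m2", "n3", 0.5)],
    },
}


def is_valid_sedge(sedge, snodes):
    """Check that every edge goes from a source-snode node to a target-snode node."""
    source_nodes = set(snodes[sedge["source"]]["nodes"])
    target_nodes = set(snodes[sedge["target"]]["nodes"])
    return all(s in source_nodes and t in target_nodes
               for s, t, _ in sedge["edges"])


print(is_valid_sedge(sedges["M_2_N"], snodes))
```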

__init__(snode=None, path=None, path2snodes=None, path2sedges=None, label='sg', keep_active=False)

Initializes an empty supergraph with an empty dictionary of supernodes and an empty dictionary of superedges.

Parameters:
  • snode (str or None, optional (default=None)) – If None, an empty supergraph is created. Otherwise, the given snode is added to the supergraph structure

  • path (str or None, optional (default=None)) – Path to the supergraph folder

  • path2snodes (str or None, optional (default=None)) – Path to the snodes folder. If None, snodes are located in the folder ‘snodes’ inside the input path

  • path2sedges (str or None, optional (default=None)) – Path to the sedges folder. If None, sedges are located in the folder ‘sedges’ inside the input path

  • label (str, optional (default=’sg’)) – Name of the supergraph

  • keep_active (bool, optional (default=False)) – If True, all supergraph components loaded from memory or generated by some method are preserved in memory. If False, some methods may deactivate snodes or sedges to free memory, which is useful for large graphs
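The default folder rule described for path2snodes and path2sedges can be sketched as follows. The helper `resolve_folders` is hypothetical and only mirrors the parameter descriptions; it is not taken from the library.

```python
from pathlib import Path


def resolve_folders(path, path2snodes=None, path2sedges=None):
    """Return (snodes folder, sedges folder), applying the documented defaults."""
    base = Path(path)
    # If no explicit folder is given, fall back to a subfolder of the base path.
    snodes_dir = Path(path2snodes) if path2snodes is not None else base / "snodes"
    sedges_dir = Path(path2sedges) if path2sedges is not None else base / "sedges"
    return snodes_dir, sedges_dir


print(resolve_folders("my_sg"))
print(resolve_folders("my_sg", path2sedges="elsewhere"))
```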

__weakref__

list of weak references to the object (if defined)

activate_all()

Activates all snodes and sedges in the supergraph

activate_sedge(label)

Loads sedge into the dictionary of active sedges.

Parameters:

label (str) – Name of the sedge

activate_snode(label)

Loads snode into the dictionary of active snodes

Parameters:

label (str) – Name of the snode

addSuperEdge(sedge, weight=1, attributes={})

Add a superedge (sedge) to the supergraph structure. The sedge is an object (graph) from class SEdge, which is an extension of class DataGraph.

Parameters:
  • sedge (object) – Superedge (an object of class SEdge)

  • weight (float, optional (default=1)) – Weight of the sedge (in the supergraph)

  • attributes (dict, optional (default={})) – A dictionary of attributes

addSuperNode(snode, attributes={})

Add a snode to the supergraph structure. The node is an object (graph) from class DataGraph.

Parameters:
  • snode (object) – snode (an instance of class DataGraph)

  • attributes (dict, optional (default={})) – Dictionary of snode attributes

add_snode_attributes(label, att, att_values)
Parameters:
  • label (str) – Name of the snode to which the new attribute is added

  • att (str or list) – Name of the attribute. If att_values is a pandas dataframe, ‘att’ contains the name of the column in att_values that will be used as the key for merging

  • att_values (list or pandas dataframe or dict) – Attribute values. If list: it contains the attribute values (if att is a list, it is a list of lists); the order of the values must correspond to the order of nodes in the snode. If pandas dataframe: it contains the new attribute values, and it is merged with the dataframe of nodes. If dict: the keys must refer to values in the reference column of the snode.

clean_up_metagraph()

Removes from the metagraph all nodes or edges that no longer exist in the supergraph database.

computeSimBiGraph(s_label, t_label, e_label=None, s_min=None, n_edges=None, n_gnodesS=None, n_gnodesT=None, similarity='He2', g=1, blocksize=25000, useGPU=False, tmp_folder=None, save_every=1e+300, verbose=True)

Calls the snode method to compute a similarity bipartite graph.

It assumes that the source and target snodes, with their corresponding feature matrices, already exist in the supergraph structure.

A new sedge connecting the source and target snodes will be created. If it already exists

Parameters:
  • s_label (str) – Name of the source snode

  • t_label (str) – Name of the target snode

  • e_label (str or None, optional (default=None)) – Name of the new sedge

  • s_min (float or None, optional (default=None)) – Similarity threshold. Edges link all data pairs with similarity higher than s_min. This forces a sparse graph.

  • n_edges (int or None, optional (default=None)) – Target number of edges. n_edges is an alternative to s_min. Only one of the two may be specified

  • n_gnodesS (int or None, optional (default=None)) – Number of nodes in the source subgraph. If None, all nodes are used. If n_gnodesS < no. of rows in self.Xs, a random subsample is taken.

  • n_gnodesT (int or None, optional (default=None)) – Number of nodes in the target subgraph. If None, all nodes are used. If n_gnodesT < no. of rows in self.Xt, a random subsample is taken.

  • similarity (str {‘He2’, ‘He2->JS’}, optional (default=’He2’)) – Similarity measure used to compute the affinity matrix. Available options are: (1) ‘He2’ (1 minus squared Hellinger distance (self implementation)); (2) ‘He2->JS’ (1 minus Jensen-Shannon (JS) divergence)

  • g (int or float, optional (default=1)) – Exponent for the affinity mapping

  • blocksize (int, optional (default=25_000)) – Size of each block for affinity computations

  • useGPU (bool, optional (default=False)) – If True, matrix operations are accelerated using GPU

  • tmp_folder (str or None, optional (default=None)) – Name of the folder to save temporary files

  • save_every (int or float, optional (default=1e+300)) – Maximum size of the growing lists. The output lists are constructed incrementally. To avoid memory overload, growing lists are saved every time they reach this size limit. The full lists are thus incrementally saved in files. The default value is extremely large, which de facto implies no temporary saving.

  • verbose (bool, optional (default=True)) – If False, block-by-block messaging is omitted
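As a rough illustration of a thresholded similarity bipartite graph, the sketch below computes 1 minus the squared Hellinger distance between every source row and every target row and keeps the pairs above s_min. It assumes that the rows are probability vectors and that the squared Hellinger distance includes the usual factor 1/2, so that the similarity reduces to the sum of sqrt(p_i q_i). The function names are hypothetical, and the blockwise computation, the exponent g and GPU support are not modeled.

```python
import math


def he2_similarity(p, q):
    """1 minus squared Hellinger distance between probability vectors p and q."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def sim_bigraph(Xs, Xt, s_min):
    """Return edges (source index, target index, similarity) above s_min."""
    edges = []
    for i, p in enumerate(Xs):
        for j, q in enumerate(Xt):
            s = he2_similarity(p, q)
            if s > s_min:
                edges.append((i, j, s))
    return edges


Xs = [[1.0, 0.0], [0.5, 0.5]]   # source feature matrix, one row per node
Xt = [[1.0, 0.0], [0.0, 1.0]]   # target feature matrix, one row per node
print(sim_bigraph(Xs, Xt, s_min=0.5))
```

A higher threshold yields a sparser graph, which is the role that s_min plays in the parameter list above.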

computeSimGraph(label, s_min=None, n_edges=None, n_gnodes=None, similarity='He2', g=1, blocksize=25000, useGPU=False, tmp_folder=None, save_every=1e+300, verbose=True)

Calls the snode method to compute a similarity graph

Parameters:
  • label (str) – Name of the snode where the similarity graph will be computed.

  • s_min (float or None, optional (default=None)) – Similarity threshold. Edges link all data pairs with similarity higher than s_min. This forces a sparse graph.

  • n_edges (int or None, optional (default=None)) – Target number of edges. n_edges is an alternative to s_min. Only one of the two may be specified

  • n_gnodes (int or None, optional (default=None)) – Number of nodes in the source subgraph. If None, all nodes are used. If n_gnodes < no. of rows in self.T, a random subsample is taken.

  • similarity (str, optional (default=’He2’)) – Similarity measure used to compute the affinity matrix. Available options are: (1) ‘JS’ (1 minus Jensen-Shannon (JS) divergence; too slow); (2) ‘l1’ (1 minus l1 distance); (3) ‘He’ (1 minus squared Hellinger’s distance (sklearn-based)); (4) ‘He2’ (same as ‘He’, but based on a proper implementation); (5) ‘Gauss’ (an exponential function of the squared l2 distance); (6) ‘l1->JS’ (same as ‘JS’, but the graph is computed after preselecting edges using l1 distances and a theoretical bound); (7) ‘He->JS’ (same as ‘JS’, but the graph is computed after preselecting edges using Hellinger’s distances and a theoretical bound); (8) ‘He2->JS’ (same as ‘He->JS’, but using a self implementation of ‘He’); (9) ‘cosine’ (cosine similarity)

  • g (int or float, optional (default=1)) – Exponent for the affinity mapping

  • blocksize (int, optional (default=25_000)) – Size of each block for affinity computations

  • useGPU (bool, optional (default=False)) – If True, matrix operations are accelerated using GPU

  • tmp_folder (str or None, optional (default=None)) – Name of the folder to save temporary files

  • save_every (int or float, optional (default=1e+300)) – Maximum size of the growing lists. The output lists are constructed incrementally. To avoid memory overload, growing lists are saved every time they reach this size limit. The full lists are thus incrementally saved in files. The default value is extremely large, which de facto implies no temporary saving.

  • verbose (bool, optional (default=True)) – (Only for he_neighbors_graph()). If False, block-by-block messaging is omitted
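The two alternative ways of sparsifying the graph, a similarity threshold (s_min) or a target number of edges (n_edges), can be sketched as below for the ‘JS’ option. The sketch assumes base-2 logarithms, so that the Jensen-Shannon divergence lies in [0, 1], and it assumes that both selection criteria cannot be given at once. All function names are hypothetical and the code is independent of the library.

```python
import math
from itertools import combinations


def js_similarity(p, q):
    """1 minus the Jensen-Shannon divergence (base 2) of probability vectors."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 1.0 - 0.5 * (kl(p, m) + kl(q, m))


def sim_graph(T, s_min=None, n_edges=None):
    """Edges (i, j, similarity) between rows of T, sparsified by one criterion."""
    if s_min is not None and n_edges is not None:
        raise ValueError("Specify s_min or n_edges, not both")
    pairs = [(i, j, js_similarity(T[i], T[j]))
             for i, j in combinations(range(len(T)), 2)]
    if s_min is not None:
        return [e for e in pairs if e[2] > s_min]
    if n_edges is not None:
        return sorted(pairs, key=lambda e: e[2], reverse=True)[:n_edges]
    return pairs


T = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # feature matrix, one row per node
print(sim_graph(T, n_edges=1))
```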

compute_ppr(s_label, t_label=None, th=0.9, inplace=False)

Calls the snode method to compute a similarity graph

Parameters:
  • label (str) – Name of the snode where the similarity graph will be computed.

  • th (float, optional (default=0.9)) – Threshold over the ppr to create a link

  • inplace (bool, optional (default=False)) – If True, the new graph overrides the original graph

cosine_sim(xlabel, ylabel)

Computes the cosine similarity between two supernodes, X and Y. The cosine similarity is defined as follows:

sim = trace(X’ Y) / (||X|| ||Y||)

where ||·|| is the Frobenius norm.

Parameters:
  • xlabel (str) – Name of one supernode

  • ylabel (str) – Name of the other supernode

Returns:

score – Value of the cosine similarity

Return type:

float
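The formula can be checked with a few lines of plain Python. Since trace(X’ Y) equals the sum of the elementwise products of X and Y, no explicit matrix product is needed. The sketch assumes that X and Y are matrices of the same shape given as lists of rows; how the library obtains these matrices from the supernodes is not covered here.

```python
import math


def cosine_sim(X, Y):
    """Cosine similarity trace(X'Y) / (||X|| ||Y||) with Frobenius norms."""
    # trace(X'Y) is the sum of elementwise products.
    inner = sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))
    norm_x = math.sqrt(sum(x * x for row in X for x in row))
    norm_y = math.sqrt(sum(y * y for row in Y for y in row))
    return inner / (norm_x * norm_y)


X = [[1.0, 1.0], [0.0, 0.0]]
Y = [[1.0, 0.0], [0.0, 0.0]]
print(cosine_sim(X, Y))
```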

deactivate()

Removes all snodes and sedges from the dictionaries of active snodes and sedges. This removes them from memory, but not from the supergraph structure. It is used to clean memory space

deactivate_sedge(label)

Removes the sedge from the dictionary of active sedges. Note that this does not remove the sedge from the supergraph. It is only removed from memory.

Parameters:

label (str) – Name of the sedge

deactivate_snode(label)

Removes the snode from the dictionary of active snodes. Note that this does not remove the snode from the supergraph. It is only removed from memory.

Parameters:

label (str) – Name of the snode
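The distinction between components that exist in the supergraph and components that are active (held in memory) can be sketched with a small cache. The class `ActiveCache` is hypothetical; a plain dictionary stands in for the data saved on disk.

```python
class ActiveCache:
    """Toy model of the active/inactive mechanism (not the rdigraphs class)."""

    def __init__(self, stored):
        self.stored = dict(stored)   # stands in for components saved on disk
        self.active = {}             # components currently held in memory

    def exists(self, label):
        return label in self.stored

    def is_active(self, label):
        return label in self.active

    def activate(self, label):
        # Load the component only if it is not already in memory.
        if label not in self.active:
            self.active[label] = self.stored[label]

    def deactivate(self, label):
        # Frees memory; the component still exists in the supergraph.
        self.active.pop(label, None)


cache = ActiveCache({"docs": ["d1", "d2"]})
cache.activate("docs")
cache.deactivate("docs")
print(cache.exists("docs"), cache.is_active("docs"))
```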

detectCommunities(label, alg='louvain', ncmax=None, comm_label='Comm')

Applies the selected community detection algorithm to a given snode

Parameters:
  • label (str) – Name of the snode

  • alg (str, optional (default=’louvain’)) – Community detection algorithm

  • ncmax (int or None, optional (default=None)) – Number of communities.

  • comm_label (str, optional (default=’Comm’)) – Label for the cluster indices in the output dataframe

disambiguate_node(node_name)

Disambiguate a given node (from any graph) based on the topological structure of the related snode and sedge in the supergraph

Parameters:
  • path (str) – Path to snode

  • node_name (str) – Name of the node

drop_sedge(label)

Removes a sedge from the supergraph

Parameters:

label (str) – Name of the sedge to be removed

drop_snode(label)

Removes a snode from the supergraph. Note that this does not remove the related sedges, so the resulting supergraph might be inconsistent. A future version of this method should also remove the related sedges.

Parameters:

label (str) – Name of the snode to be removed

duplicate_snode(xlabel, ylabel, out_path=None)

Creates a copy of a given snode with another name.

Parameters:
  • xlabel (str) – Name of the snode to be duplicated

  • ylabel (str) – Name of the new snode

  • out_path (str or None, optional (default=None)) – Output path of the duplicate

export_2_halo(e_label, s_att1, s_att2, t_att, t_att2=None)

Export sedge, with selected attributes, into a csv file, for visualization with Halo.

Parameters:
  • e_label (str) – Name of the sedge (bipartite graph) to export

  • s_att1 (str) – Name of the first attribute of the source node

  • s_att2 (str) – Name of the second attribute of the source node

  • t_att (str) – Name of the attribute of the target node

  • t_att2 (str or None, optional (default=None)) – Name of the second attribute of the target node. If None, t_att2 is taken equal to t_att

Returns:

label_map – Dictionary of correspondences label_in_graph : label_in_halo

Return type:

dict

filter_edges_from_sedge(label, th)

Removes edges below a given threshold from a given sedge

Parameters:
  • label (str) – Name of the sedge

  • th (int or float) – Threshold

filter_edges_from_snode(label, th)

Removes edges below a given threshold from a given snode

Parameters:
  • label (str) – Name of the snode

  • th (int or float) – Threshold

get_attributes(label, is_snode_name=True)

Returns the attributes of a given snode or sedge

Parameters:
  • label (str) – Name of the snode or sedge

  • is_snode_name (bool, optional (default=True)) – If True, label is a snode. If False, label is a sedge

Returns:

atts – List of attributes of the given snode or sedge

Return type:

list of str

Notes

If the snode is active, the attributes are not read from file, but from memory. Thus, if any other method has modified the attributes without updating the in-memory data, the attribute list might be out of date.

get_metadata(label, is_node_name=True)

Returns the metadata of a given snode or sedge

Parameters:
  • label (str) – Name of the snode or sedge

  • is_node_name (bool, optional (default=True)) – If True, label is a snode. If False, label is a sedge

Returns:

md – Metadata dictionary.

Return type:

dict

get_sedges()

Returns the labels of all sedges in the supergraph

get_snodes()

Returns the labels of all snodes in the supergraph

get_terminals(e_label)

Returns the name of the source and target snodes of a given sedge

Parameters:

e_label (str) – Name of the sedge

Returns:

  • s_label (str) – Name of the source snode

  • t_label (str) – Name of the target snode

graph_layout(snode_label, attribute, gravity=1)

Compute the layout of the given graph

Parameters:
  • snode_label (str) – Name of the snode

  • gravity (int, optional (default=1)) – Gravity parameter of the graph layout method (only for force atlas 2)

  • attribute (str) – Snode attribute used to color the graph

is_active_sedge(e_label)

Checks if a given sedge is active

Parameters:

e_label (str) – Name of the sedge

Returns:

b – True if sedge is active, False otherwise

Return type:

boolean

is_active_snode(label)

Checks if the snode given by label is active

Parameters:

label (str) – Name of the snode

Returns:

b – True if snode is active, False otherwise

Return type:

boolean

is_sedge(e_label)

Checks if the sedge given by e_label exists in the supergraph

Parameters:

e_label (str) – Name of the sedge

Returns:

b – True if sedge exists, False otherwise

Return type:

boolean

is_snode(label)

Checks if the snode given by label exists in the supergraph

Parameters:

label (str) – Name of the snode

Returns:

b – True if snode exists, False otherwise

Return type:

boolean

local_snode_analysis(label, parameter)

Compute local features of nodes in the given snode

Parameters:
  • label (str) – Name of the snode

  • parameter (str) – Name of the local feature

makeSuperNode(label, out_path=None, nodes=None, T=None, attributes={}, edge_class='undirected', save_T=False)

Make a new snode for the supergraph structure. The snode is created as an object (graph) from class DataGraph, with the input data in the args.

Parameters:
  • label (str) – Name of the supernode

  • out_path (str or None, optional (default=None)) – Output path

  • nodes (list or None, optional (default=None)) – List of nodes

  • T (array or None, optional (default=None)) – Feature matrix, one row per node

  • attributes (dict, optional (default={})) – Attributes of the supernode. Note that these are not attributes of the nodes, but of the whole supernode, that will be stored in the snode metagraph

  • save_T (bool, optional (default=False)) – If True, the feature matrix T is saved into an npz file.

remove_isolated_nodes(label)

Removes all isolated nodes in a given snode

Parameters:

label (str) – Name of the snode

remove_snode_attributes(label, att_names)
Parameters:
  • label (str) – Name of the snode from which the attributes are removed

  • att_names (str or list) – Name or names of the attributes to remove

save_supergraph()

Saves all active snodes and sedges, that is, all snodes and sedges that have been loaded into self.snodes and self.sedges

snode_from_atts(source, attrib, target=None, path_snode=None, path_sedge=None, e_label=None, att_size=True)

Generate a new snode and a new sedge from a given snode in the supergraph and one of its attributes.

The nodes of the new snode will consist of the attribute values of the snode.

Each node in the source snode will be connected to the node in the target snode containing its attribute value.

Parameters:
  • source (str) – Name of the source snode in the supergraph

  • attrib (str) – The attribute in snode containing the target nodes

  • target (str or None, optional (default=None)) – Name of the target snode

  • path_snode (str or None, optional (default=None)) – Output path to save the target snode

  • path_sedge (str or None, optional (default=None)) – Output path to save the sedge

  • e_label (str or None, optional (default=None)) – Name of the new sedge

  • att_size (bool, optional (default=True)) – If True, adds an attribute to the target snode containing the size of each node, measured by the number of neighbors in the sedge
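The construction described for snode_from_atts can be sketched in plain Python: the distinct attribute values become the nodes of the target snode, each source node is linked to its attribute value, and the size of a target node is its number of neighbors in the sedge. The helper name and the returned tuple are hypothetical and only illustrate the idea.

```python
from collections import Counter


def snode_from_attribute(nodes, att_values):
    """Build target nodes, bipartite edges and node sizes from one attribute."""
    # One edge per source node, pointing to the node of its attribute value.
    edges = list(zip(nodes, att_values))
    # Size of each target node: number of source nodes linked to it.
    sizes = Counter(att_values)
    target_nodes = sorted(sizes)
    return target_nodes, edges, dict(sizes)


nodes = ["paper1", "paper2", "paper3"]
journals = ["J-A", "J-B", "J-A"]   # attribute value of each source node
print(snode_from_attribute(nodes, journals))
```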

snode_from_edges(source, edges, target=None, path_snode=None, path_sedge=None, e_label=None)

Generate a new snode and a new sedge from a given snode in the supergraph and a list of edges to the new snode.

This method is similar to snode_from_atts. The difference is that snode_from_atts takes the edges from an snode attribute, while snode_from_edges takes the edges as an input argument.

Parameters:
  • source (str) – Name of the source snode in the supergraph

  • edges (list) – List of edges

  • target (str or None) – Name of the target snode. If None, the default name A_{source} is used, where {source} is the source name

  • path_snode (str or None) – Output path to save the target snode. If None, a default path is used

  • path_sedge (str or None) – Output path to save the sedge. If None, a default path is used

  • e_label (str or None) – Name of the sedge connecting the source and target snodes. If None, a default name {source}_2_{target} is used

snode_from_eqs(source, target=None, path_snode=None, path_sedge=None, e_label=None)

Generate a new snode and a new sedge from a given snode in the supergraph.

The nodes of the new snode will consist of the equivalence classes of the snode.

An equivalence class is the set of all nodes fully connected by links with unit weight

All nodes from the same equivalence class at the source snode will be connected to the same equivalent-class node in the target snode

Parameters:
  • source (str) – Name of the source snode in the supergraph

  • target (str or None) – Name of the target snode. If None, the default name eq_{source} is used, where {source} is the source name

  • path_snode (str or None) – Output path to save the target snode. If None, a default path is used

  • path_sedge (str or None) – Output path to save the sedge. If None, a default path is used

  • e_label (str or None) – Name of the sedge connecting the source and target snodes. If None, a default name {source}_2_{target} is used
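The notion of equivalence class used by snode_from_eqs can be illustrated with a union-find sketch. It assumes that the classes are the connected components of the subgraph formed by the unit-weight links, which coincides with the definition above whenever the unit-weight relation is transitive. The function name is hypothetical and the code does not use the library.

```python
def equivalence_classes(nodes, edges):
    """Group nodes connected through links of unit weight (union-find)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the class representative (path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for s, t, w in edges:
        if w == 1:
            parent[find(s)] = find(t)

    classes = {}
    for n in nodes:
        classes.setdefault(find(n), []).append(n)
    return sorted(classes.values())


nodes = ["a", "b", "c", "d"]
edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 0.5)]
print(equivalence_classes(nodes, edges))
```

Each resulting class would become one node of the target snode, and every member of the class would be linked to it through the new sedge.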

sub_snode(xlabel, ynodes, ylabel=None, sampleT=True, save_T=True)

Subsample snode X using a given subset of nodes.

The list of nodes may contain nodes that are not in X. These nodes will be included in the new graph, with no edges.

Parameters:
  • xlabel (str) – Name of the snode to be sampled

  • ynodes (int or list) – If list, list of nodes of the output subgraph. If int, number of nodes to sample. The list of nodes is taken at random without replacement from the graph nodes

  • ylabel (str or None, optional (default=None)) – Name of the new snode. If None, the sampled snode replaces the original one

  • sampleT (bool, optional (default=True)) – If True, the feature matrix is also sampled, if it exists.

  • save_T (bool, optional (default=True)) – If True, the feature matrix T is saved into an npz file.
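The two modes of ynodes (an explicit list or a number of nodes to draw at random without replacement) can be sketched as follows. The helper `sub_nodes` and its `seed` argument are hypothetical; the sketch only handles nodes and edges, not the feature matrix.

```python
import random


def sub_nodes(nodes, edges, ynodes, seed=None):
    """Select a subset of nodes and keep only the edges among them."""
    if isinstance(ynodes, int):
        # Random sample without replacement from the graph nodes.
        selected = random.Random(seed).sample(nodes, ynodes)
    else:
        # Explicit list; nodes not in the graph are kept, with no edges.
        selected = list(ynodes)
    keep = set(selected)
    sub_edges = [(s, t, w) for s, t, w in edges if s in keep and t in keep]
    return selected, sub_edges


nodes = ["a", "b", "c"]
edges = [("a", "b", 1.0), ("b", "c", 1.0)]
print(sub_nodes(nodes, edges, ["a", "b", "z"]))
```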

sub_snode_by_novalue(xlabel, att, value, ylabel=None, sampleT=False)

Subsample snode by removing all nodes that take a given value of the given attribute

Parameters:
  • xlabel (str) – Name of the snode to be sampled

  • att (str) – Name of the attribute to select nodes by value

  • value (int or str, optional) – Value of the attribute. Only nodes NOT taking this value will be selected

  • ylabel (str or None, optional (default=None)) – Name of the new snode. If None, the sampled snode replaces the original one

  • sampleT (bool, optional (default=False)) – If True, the feature matrix is also sampled, if it exists.

sub_snode_by_threshold(xlabel, att, th, bound='lower', ylabel=None)

Subsample snode by removing all nodes whose value of a given attribute is below or above a given threshold

Parameters:
  • xlabel (str) – Name of the snode to be sampled

  • att (str) – Name of the attribute to select nodes by value

  • th (int or float) – Threshold on the value of the attribute

  • bound (str {‘lower’, ‘upper’}, optional (default=’lower’)) – States if the threshold is a lower (default) or an upper bound. If ‘lower’, all nodes with attribute less than the bound are removed. If ‘upper’, all nodes with attribute greater than the bound are removed

  • ylabel (str or None, optional (default=None)) – Name of the new snode. If None, the sampled snode replaces the original one

  • sampleT (bool, optional (default=False)) – If True, the feature matrix is also sampled, if it exists.
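The effect of the bound argument can be sketched with a dictionary that maps each node to its attribute value. The helper is hypothetical, and it assumes that nodes whose attribute equals the threshold are kept in both modes.

```python
def select_by_threshold(att_values, th, bound="lower"):
    """Return the nodes that survive the threshold on their attribute value."""
    if bound == "lower":
        # th is a lower bound: nodes below it are removed.
        return [n for n, v in att_values.items() if v >= th]
    if bound == "upper":
        # th is an upper bound: nodes above it are removed.
        return [n for n, v in att_values.items() if v <= th]
    raise ValueError("bound must be 'lower' or 'upper'")


citations = {"a": 1, "b": 5, "c": 10}
print(select_by_threshold(citations, 5))
print(select_by_threshold(citations, 5, bound="upper"))
```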

sub_snode_by_value(xlabel, att, value, ylabel=None)

Subsample snode by the value of a single attribute

Parameters:
  • xlabel (str) – Name of the snode to be sampled

  • att (str) – Name of the attribute to select nodes by value

  • value (int or str or list, optional) – Value of the attribute. Only nodes taking this value will be selected. If value is a list, all nodes taking a value in the list are selected.

  • ylabel (str or None, optional (default=None)) – Name of the new snode. If None, the sampled snode replaces the original one

Notes

Note that the feature matrix, if it exists, is not sampled.

transduce(xylabel, n=1, normalize=True, keep_active=None)

Given snode X and sedge X-Y, compute a graph for Y based on the connectivity between nodes in Y through edges from Y to X (stored in X-Y) and edges in X.

Parameters:
  • xylabel (str) – Name of the sedge (bipartite graph) X-Y. The names of snodes X and Y will be taken from the metadata of the bipartite graph

  • n (int, optional (default=1)) – Order for K. A positive integer. The affinity matrix is a normalized version of F·K^n·F’

  • normalize (bool, optional (default=True)) – If True the graph is normalized so that each node has similarity 1 to itself.

  • keep_active (bool or None, optional (default=None)) – If True, snodes and sedge are not deactivated before return. If False, X and X-Y are deactivated. Y remains active, otherwise changes would be lost. If None, the default value in self.keep_active is used

Notes

The new graph is stored in the edges of snode Y.
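The affinity matrix F·K^n·F’ can be sketched in plain Python. The documentation does not define F and K, so the sketch makes two assumptions: F is the biadjacency matrix of the sedge, with one row per node in Y and one column per node in X, and K is the affinity matrix of X. The normalization divides entry (i, j) by the square root of the product of diagonal entries i and j, so that each node has similarity 1 to itself; it requires nonzero diagonal entries. The function names are hypothetical.

```python
import math


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transduce(F, K, n=1, normalize=True):
    """Affinity matrix F K^n F' for the nodes in Y (n is a positive integer)."""
    Kn = K
    for _ in range(n - 1):
        Kn = matmul(Kn, K)
    Ft = [list(col) for col in zip(*F)]
    A = matmul(matmul(F, Kn), Ft)
    if normalize:
        d = [math.sqrt(A[i][i]) for i in range(len(A))]
        A = [[A[i][j] / (d[i] * d[j]) for j in range(len(A))]
             for i in range(len(A))]
    return A


F = [[1.0, 0.0], [0.0, 1.0]]   # each node in Y linked to one node in X
K = [[1.0, 0.5], [0.5, 1.0]]   # affinities among the nodes in X
print(transduce(F, K, n=2))
```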

transitive_graph(e_label, xmlabel, mylabel, path_sedge=None, keep_active=None)

Construct a new superedge XY connecting supernodes X and Y that are linked by an intermediate supernode M (through superedges XM and MY)

To do so, we replace connections x-m-y by connections x-y

Parameters:
  • e_label (str) – Label of the new superedge

  • xmlabel (str) – Label of superedge x-m

  • mylabel (str) – Label of superedge m-y

  • path_sedge (str or None, optional (default=None)) – Path to the new sedge

  • keep_active (bool or None, optional (default=None)) – If True, snodes and sedge are not deactivated before return. If False, XM and MY are deactivated. XY remains active, otherwise changes would be lost. If None, the default value in self.keep_active is used
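Replacing connections x-m-y by connections x-y can be sketched as a composition of two weighted edge lists. The weighting rule is an assumption: the weight of x-y is taken as the sum, over all intermediate nodes m, of the product of the x-m and m-y weights (the product of the two biadjacency matrices). The documentation does not state which rule the library applies, and the function name is hypothetical.

```python
from collections import defaultdict


def transitive_edges(xm_edges, my_edges):
    """Compose edges x-m and m-y into weighted edges x-y."""
    # Index the m-y edges by their intermediate node.
    by_m = defaultdict(list)
    for m, y, w in my_edges:
        by_m[m].append((y, w))

    xy = defaultdict(float)
    for x, m, w_xm in xm_edges:
        for y, w_my in by_m.get(m, []):
            xy[(x, y)] += w_xm * w_my   # accumulate over all paths x-m-y
    return dict(xy)


xm = [("x1", "m1", 1.0), ("x1", "m2", 0.5), ("x2", "m2", 1.0)]
my = [("m1", "y1", 1.0), ("m2", "y1", 0.5)]
print(transitive_edges(xm, my))
```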

Data graph view

rdigraphs.supergraph.dgview.plotAverageTopic(X, fpath=None)

Plots the sorted average of all sorted topic vectors

Parameters:
  • X (array) – Input matrix

  • fpath (str or None, optional (default=None)) – Path to save the figure

rdigraphs.supergraph.dgview.plotCXmatrix(M, fpath=None)

Plots matrix

Parameters:
  • M (array) – Input matrix

  • fpath (str or None, optional (default=None)) – Path to save the figure

rdigraphs.supergraph.dgview.plotClusterWeights(labels, fpath=None, n=None)

Barplot of the number of items per cluster.

Plots the main topic

Parameters:
  • labels (list) – Labels

  • fpath (str or None, optional (default=None)) – Path to save the figure

  • n (int or None, optional (default=None)) – If None, show all clusters. If integer, only the n largest clusters are shown

rdigraphs.supergraph.dgview.plotMainTopic(X, fpath=None)

Plots the main topic

Parameters:
  • X (array) – Input matrix

  • fpath (str or None, optional (default=None)) – Path to save the figure

rdigraphs.supergraph.dgview.plotSortedFeatures(X, fpath=None)

Plots the average of all sorted topic vectors

Parameters:
  • X (array) – Input matrix

  • fpath (str or None, optional (default=None)) – Path to save the figure

rdigraphs.supergraph.dgview.plot_cluster_analysis(scores, fpath)

Plots the values of one or more scores for clustering evaluation, as a function of the number of clusters.

Parameters:
  • scores (dict) – Scores

  • fpath (str) – Path to save the figure

rdigraphs.supergraph.dgview.printClusters(M)

Prints clusters

Parameters:

M (array) – Input matrix

rdigraphs.supergraph.dgview.printStats(df_nodes, label)

Logs some statistics about the size and content of the graph data

Parameters:
  • df_nodes (dataframe) – Dataframe

  • label (str) – Label

rdigraphs.supergraph.dgview.rankEdges(df_edges, df_nodes, fields, n)

Shows the n edges with highest weight.

Nodes are referred to using the specified field in df_nodes

Parameters:
  • df_edges (dataframe) – Edges

  • df_nodes (dataframe) – Nodes

  • fields (str) – Column containing the weights

  • n (int) – Number of edges
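The ranking itself amounts to a top-n selection by weight. The sketch below uses plain tuples and a dictionary of display names instead of the dataframes expected by rankEdges; the function name and the data layout are hypothetical.

```python
import heapq


def rank_edges(edges, names, n):
    """Return the n heaviest edges, with node ids replaced by display names."""
    top = heapq.nlargest(n, edges, key=lambda e: e[2])
    return [(names[s], names[t], w) for s, t, w in top]


edges = [(0, 1, 0.2), (1, 2, 0.9), (0, 2, 0.5)]   # (source, target, weight)
names = {0: "a", 1: "b", 2: "c"}                  # node id -> display field
print(rank_edges(edges, names, 2))
```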

Super edge

class rdigraphs.supergraph.sedge.SEdge(label='dg', path=None, label_source=None, label_target=None, load_data=True, edge_class='directed')

Bases: DataGraph

Generic class defining a super-edge: a bipartite datagraph.

It is inherited from DataGraph. This is because a bipartite graph is nothing but a particular type of graph.

The SEdge class distinguishes between source nodes and target nodes. Thus, the DataGraph class is extended with some attributes to label the type of each node.

However, the graph may be undirected (links from target nodes to source nodes are allowed)

__init__(label='dg', path=None, label_source=None, label_target=None, load_data=True, edge_class='directed')

Defines the superedge structure

If a superedge exists in the given path, it is loaded. Otherwise, a new empty superedge is created.

Parameters:
  • label (str or None, optional (default=’dg’)) – Name of the superedge

  • path (str or None, optional (default=None)) – Path to the folder that contains, or will contain, the graph data

  • label_source (str or None, optional (default=None)) – Generic name of the source nodes

  • label_target (str or None, optional (default=None)) – Generic name of the target nodes

  • load_data (bool, optional (default=True)) – If True (default) the graph data are loaded. If False, only the graph metadata are loaded

Variables:
  • n_source (int) – Number of source nodes

  • n_target (int) – Number of target nodes

  • label_source (str) – Generic label for the source nodes

  • label_target (str) – Generic label for the target nodes

Notes

These are the specific attributes of the SEdge class. See the parent class documentation to see more attributes.

The source and target nodes in the parent class attribute self.df_edges are located in columns ‘Source’ and ‘Target’, because these are the standard names for Gephi graphs. Thus, the names in label_source and label_target are not used in self.df_edges.

add_single_edge(source, target, weight=1, attributes={})

Add single edge

Parameters:
  • source (str) – Source node name

  • target (str) – Target node name

  • weight (float, optional (default=1)) – Edge weight

  • attributes (dict, optional (default={})) – Dictionary of attributes

add_single_node(node, attributes={})

Add single node

Parameters:
  • node (str) – Node name

  • attributes (dict, optional (default={})) – Dictionary of attributes

computeSimBiGraph(s_min=None, n_gnodesS=None, n_gnodesT=None, n_edges=None, similarity='He2', g=1, blocksize=25000, useGPU=False, verbose=True)

Computes a sparse similarity bipartite graph for the self graph structure. The self graph must contain a T-matrix, self.T

Parameters:
  • s_min (float or None, optional (default=None)) – Similarity threshold. Edges link all data pairs with similarity higher than s_min. This forces a sparse graph.

  • n_gnodesS (int or None, optional (default=None)) – Number of nodes in the source subgraph. If None, all nodes are used. If n_gnodesS < no. of rows in self.Xs, a random subsample is taken.

  • n_gnodesT (int or None, optional (default=None)) – Number of nodes in the target subgraph. If None, all nodes are used. If n_gnodesT < no. of rows in self.Xt, a random subsample is taken.

  • n_edges (int or None, optional (default=None)) – Target number of edges. n_edges is an alternative to s_min. Only one of the two may be specified (i.e., not None)

  • similarity (str {‘He2’, ‘He2->JS’}, optional (default=’He2’)) – Similarity measure used to compute the affinity matrix. Available options are: ‘He2’ (1 minus squared Hellinger distance (self implementation)); ‘He2->JS’ (1 minus Jensen-Shannon (JS) divergence)

  • g (float, optional (default=1)) – Exponent for the affinity mapping

  • blocksize (int, optional (default=25_000)) – Size of each block for the computation of affinity values. Large sizes might imply a large memory consumption.

  • useGPU (bool, optional (default=False)) – If True, matrix operations are accelerated using GPU

  • verbose (bool, optional (default=True)) – (Only for he_neighbors_graph()). If False, block-by-block messaging is omitted

disconnect_nodes(source, target, directed=False)

Disconnect nodes by removing edges

Parameters:
  • source (str) – Source node name

  • target (str) – Target node name

  • directed (bool, optional (default=False)) – True if only edge source->target should be removed
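The role of the directed flag can be sketched on a plain list of weighted edges. The helper is hypothetical and only mirrors the parameter description: with directed=True only the edge source->target is removed, otherwise the reverse edge is removed as well.

```python
def disconnect(edges, source, target, directed=False):
    """Return the edge list without the link(s) between source and target."""
    drop = {(source, target)}
    if not directed:
        # Undirected removal also drops the reverse edge.
        drop.add((target, source))
    return [(s, t, w) for s, t, w in edges if (s, t) not in drop]


edges = [("a", "b", 1.0), ("b", "a", 0.5), ("a", "c", 1.0)]
print(disconnect(edges, "a", "b"))
print(disconnect(edges, "a", "b", directed=True))
```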

drop_single_node(node)

Drop single node

Parameters:

node (str) – Node name

get_source_nodes()

Get list of source nodes

Return type:

list of source nodes

get_target_nodes()

Get list of target nodes

Return type:

list of target nodes

get_terminals()

Returns the names of the source and target snodes of this sedge

Returns:

  • s_label (str) – Name of the source snode

  • t_label (str) – Name of the target snode

save_feature_matrix()

Save feature matrices in self.Xs and self.Xt, if they exist.

set_edges(source_nodes, target_nodes, weights=None)

This method modifies set_edges from the parent class to test name collisions in source and target nodes.

Parameters:
  • source_nodes (list) – Source nodes

  • target_nodes (list) – Target nodes

  • weights (list or None, optional (default=None)) – Edge weights. If None, unit weights are assumed

set_nodes(nodes_orig=[], nodes_dest=[], Xs=None, Xt=None, save_T=False)

Loads a superedge with a given set of source and target nodes.

The new sets of nodes overwrite any existing ones.

Parameters:
  • nodes_orig (list, optional (default=[])) – Source nodes

  • nodes_dest (list, optional (default=[])) – Target nodes

  • Xs (array or None, optional (default=None)) – Source feature matrix: one row per source node, one column per feature

  • Xt (array or None, optional (default=None)) – Target feature matrix: one row per target node, one column per feature

  • save_T (bool, optional (default=False)) – If True, feature matrices are saved into npz files.

update_metadata()

Updates metadata dictionary with the self variables directly computed from df_nodes and df_edges