Supergraph
- class rdigraphs.supergraph.supergraph.SuperGraph(snode=None, path=None, path2snodes=None, path2sedges=None, label='sg', keep_active=False)
Bases:
object
Generic class defining a supergraph.
A supergraph is a set of supernodes and super-edges connecting them.
A supernode is itself a graph.
A superedge connecting supernodes M and N is a directed bipartite graph whose nodes are the nodes of M and N, and edges are directed links from nodes in M to nodes in N.
The class provides methods to create a supergraph and add nodes and edges using data taken from an SQL database.
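The definitions above can be pictured with a minimal, library-free sketch. The dictionary layout and the helper `is_valid_sedge` are hypothetical and only illustrate the concepts; they are not the internal storage format of SuperGraph.

```python
# Hypothetical illustration of the definitions above (not the rdigraphs API).
# Supernodes: each one is itself a graph (a node list plus internal edges).
snodes = {
    "authors": {"nodes": ["a1", "a2"], "edges": [("a1", "a2", 1.0)]},
    "papers": {"nodes": ["p1", "p2", "p3"], "edges": []},
}

# Superedges: directed bipartite graphs whose links go from nodes of the
# source supernode to nodes of the target supernode.
sedges = {
    "authors_2_papers": {
        "source": "authors",
        "target": "papers",
        "edges": [("a1", "p1", 1.0), ("a1", "p2", 1.0), ("a2", "p3", 1.0)],
    },
}


def is_valid_sedge(sedge, snodes):
    """Check that every link goes from a source-snode node to a target-snode node."""
    src = set(snodes[sedge["source"]]["nodes"])
    tgt = set(snodes[sedge["target"]]["nodes"])
    return all(s in src and t in tgt for s, t, _ in sedge["edges"])


print(is_valid_sedge(sedges["authors_2_papers"], snodes))
```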
- __init__(snode=None, path=None, path2snodes=None, path2sedges=None, label='sg', keep_active=False)
Initializes an empty supergraph with an empty dictionary of supernodes and an empty dictionary of superedges.
- Parameters:
snode (str or None, optional (default=None)) – If None, an empty supergraph is created. Otherwise, the given snode is added to the supergraph structure
path (str or None, optional (default=None)) – Path to the supergraph folder
path2snodes (str or None, optional (default=None)) – Path to the snodes folder. If None, snodes will be located in folder ‘snodes’ from the input path
path2sedges (str or None, optional (default=None)) – Path to the sedges folder. If None, sedges will be located in folder ‘sedges’ from the input path
label (str, optional (default=’sg’)) – Name of the supergraph
keep_active (bool, optional (default=False)) – If True, all supergraph components loaded from memory or generated by some method are preserved in memory. If False, some methods may deactivate snodes or sedges to free some memory. This is useful for large graphs
- activate_all()
Activates all snodes and sedges in the supergraph
- activate_sedge(label)
Loads sedge into the dictionary of active sedges.
- Parameters:
label (str) – Name of the sedge
- activate_snode(label)
Loads snode into the dictionary of active snodes
- Parameters:
label (str) – Name of the snode
- addSuperEdge(sedge, weight=1, attributes={})
Add a superedge (sedge) to the supergraph structure. The sedge is an object (graph) from class SEdge, which is an extension of class DataGraph.
- Parameters:
sedge (object) – Superedge (an object of class SEdge)
weight (float, optional (default=1)) – Weight of the sedge (in the supergraph)
attributes (dict, optional (default={})) – A dictionary of attributes
- addSuperNode(snode, attributes={})
Add a snode to the supergraph structure. The node is an object (graph) from class DataGraph.
- Parameters:
snode (object) – snode (an instance of class DataGraph)
attributes (dict, optional (default={})) – Dictionary of snode attributes
- add_snode_attributes(label, att, att_values)
- Parameters:
label (str) – Name of the snode that receives the new attribute
att (str or list) – Name of the attribute. If att_values is a pandas dataframe, att contains the name of the column in att_values that will be used as the key for merging
att_values (list or pandas dataframe or dict) – If list, it contains the attribute values; if att is a list, it is a list of lists. The order of the values must correspond to the order of nodes in the snode. If pandas dataframe, it contains the new attribute values and will be merged with the dataframe of nodes, using the column named in att as the key. If dict, the keys must refer to values in the reference column of the snode.
- clean_up_metagraph()
Removes from the metagraph all nodes or edges that no longer exist in the supergraph database.
- computeSimBiGraph(s_label, t_label, e_label=None, s_min=None, n_edges=None, n_gnodesS=None, n_gnodesT=None, similarity='He2', g=1, blocksize=25000, useGPU=False, tmp_folder=None, save_every=1e+300, verbose=True)
Calls the snode method to compute a similarity bipartite graph.
It assumes that the source and target snodes, with their corresponding feature matrices, already exist in the supergraph structure.
A new sedge connecting the source and target snodes will be created. If it already exists
- Parameters:
s_label (str) – Name of the source snode
t_label (str) – Name of the target snode
e_label (str or None, optional (default=None)) – Name of the new s_edge
s_min (float or None, optional (default=None)) – Similarity threshold. Edges link all data pairs with similarity higher than s_min. This forces a sparse graph.
n_edges (int or None, optional (default=None)) – Target number of edges. n_edges is an alternative to s_min. Only one of the two must be specified
n_gnodesS (int or None, optional (default=None)) – Number of nodes in the source subgraph. If None, all nodes are used. If n_gnodesS < no. of rows in self.Xs, a random subsample is taken.
n_gnodesT (int or None, optional (default=None)) – Number of nodes in the target subgraph. If None, all nodes are used. If n_gnodesT < no. of rows in self.Xt, a random subsample is taken.
similarity (str {‘He2’, ‘He2->JS’}, optional (default=’He2’)) – Similarity measure used to compute the affinity matrix. Available options are: (1) ‘He2’ (1 minus squared Hellinger distance, own implementation); (2) ‘He2->JS’ (1 minus Jensen-Shannon (JS) divergence)
g (int or float, optional (default=1)) – Exponent for the affinity mapping
blocksize (int, optional (default=25_000)) – Size of each block for affinity computations
useGPU (bool, optional (default=False)) – If True, matrix operations are accelerated using GPU
tmp_folder (str or None, optional (default=None)) – Name of the folder to save temporary files
save_every (int or float, optional (default=1e300)) – Maximum size of the growing lists. The output lists are constructed incrementally. To avoid memory overload, growing lists are saved every time they reach this size limit. The full lists are thus incrementally saved in files. The default value is extremely large, which de facto implies no temporary saving.
verbose (bool, optional (default=True)) – If False, block-by-block messaging is omitted
- computeSimGraph(label, s_min=None, n_edges=None, n_gnodes=None, similarity='He2', g=1, blocksize=25000, useGPU=False, tmp_folder=None, save_every=1e+300, verbose=True)
Calls the snode method to compute a similarity graph
- Parameters:
label (str) – Name of the snode where the similarity graph will be computed.
s_min (float or None, optional (default=None)) – Similarity threshold. Edges link all data pairs with similarity higher than s_min. This forces a sparse graph.
n_edges (int or None, optional (default=None)) – Target number of edges. n_edges is an alternative to s_min. Only one of the two must be specified
n_gnodes (int or None, optional (default=None)) – Number of nodes in the subgraph. If None, all nodes are used. If n_gnodes < no. of rows in self.T, a random subsample is taken.
similarity (str, optional (default=’He2’)) – Similarity measure used to compute the affinity matrix. Available options are: (1) ‘JS’ (1 minus Jensen-Shannon (JS) divergence; too slow); (2) ‘l1’ (1 minus l1 distance); (3) ‘He’ (1 minus squared Hellinger’s distance, sklearn-based); (4) ‘He2’ (same as ‘He’, but based on an own implementation); (5) ‘Gauss’ (an exponential function of the squared l2 distance); (6) ‘l1->JS’ (same as ‘JS’, but the graph is computed after preselecting edges using l1 distances and a theoretical bound); (7) ‘He->JS’ (same as ‘JS’, but the graph is computed after preselecting edges using Hellinger’s distances and a theoretical bound); (8) ‘He2->JS’ (same as ‘He->JS’, but using an own implementation of ‘He’); (9) ‘cosine’ (cosine similarity)
g (int or float, optional (default=1)) – Exponent for the affinity mapping
blocksize (int, optional (default=25_000)) – Size of each block for affinity computations
useGPU (bool, optional (default=False)) – If True, matrix operations are accelerated using GPU
tmp_folder (str or None, optional (default=None)) – Name of the folder to save temporary files
save_every (int or float, optional (default=1e300)) – Maximum size of the growing lists. The output lists are constructed incrementally. To avoid memory overload, growing lists are saved every time they reach this size limit. The full lists are thus incrementally saved in files. The default value is extremely large, which de facto implies no temporary saving.
verbose (bool, optional (default=True)) – (Only for he_neighbors_graph()). If False, block-by-block messaging is omitted
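As a rough illustration of the ‘He2’ option combined with the s_min threshold, the sketch below builds a thresholded similarity graph from a small feature matrix. It assumes that each row is a probability vector and that the squared Hellinger distance uses the 1/2 normalization, so that 1 minus the squared distance equals the sum of sqrt(p_i q_i). The function names are hypothetical, and the blockwise and GPU logic of the library is not reproduced.

```python
import math


def hellinger_affinity(p, q):
    """1 - H^2(p, q) for probability vectors, with H^2 = 0.5 * sum((sqrt(p) - sqrt(q))^2)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def sim_graph(T, s_min):
    """Return (i, j, similarity) for all row pairs with similarity higher than s_min."""
    edges = []
    for i in range(len(T)):
        for j in range(i + 1, len(T)):
            s = hellinger_affinity(T[i], T[j])
            if s > s_min:
                edges.append((i, j, s))
    return edges


# Three nodes described by topic-like distributions (toy data)
T = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]]
print(sim_graph(T, s_min=0.9))
```

Raising s_min keeps fewer pairs, which is how the threshold forces a sparse graph.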
- compute_ppr(s_label, t_label=None, th=0.9, inplace=False)
Calls the snode method to compute a similarity graph
- Parameters:
s_label (str) – Name of the snode where the similarity graph will be computed.
th (float, optional (default=0.9)) – Threshold over the ppr to create a link
inplace (bool, optional (default=False)) – If True, the new graph overrides the original graph
- cosine_sim(xlabel, ylabel)
Computes the cosine similarity between two supernodes, X and Y. The cosine similarity is defined as follows:
sim = trace(X’ Y) / (||X|| ||Y||)
where ||·|| is the Frobenius norm.
- Parameters:
xlabel (str) – Name of one supernode
ylabel (str) – Name of the other supernode
- Returns:
score – Value of the cosine similarity
- Return type:
float
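The formula above can be sketched in plain Python. This is a minimal illustration, assuming X and Y are equally shaped matrices given as lists of rows; which matrix the library associates with each supernode is not stated here, and the function name is hypothetical.

```python
import math


def frobenius_cosine(X, Y):
    """Compute trace(X'Y) / (||X||_F * ||Y||_F) for equally shaped matrices."""
    # trace(X'Y) equals the sum of the elementwise products of X and Y
    inner = sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))
    norm_x = math.sqrt(sum(x * x for row in X for x in row))
    norm_y = math.sqrt(sum(y * y for row in Y for y in row))
    return inner / (norm_x * norm_y)


X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[0.0, 1.0], [1.0, 0.0]]
print(frobenius_cosine(X, X), frobenius_cosine(X, Y))
```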
- deactivate()
Removes all snodes and sedges from the dictionaries of active snodes and sedges. This removes them from memory, but not from the supergraph structure. It is used to clean memory space
- deactivate_sedge(label)
Removes the sedge from the dictionary of active sedges. Note that this does not remove the sedge from the supergraph; it is only removed from memory.
- Parameters:
label (str) – Name of the sedge
- deactivate_snode(label)
Removes the snode from the dictionary of active snodes. Note that this does not remove the snode from the supergraph; it is only removed from memory.
- Parameters:
label (str) – Name of the snode
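The difference between a component that exists in the supergraph and one that is active in memory can be sketched as a small cache. The class below is a hypothetical stand-in, not the SuperGraph implementation: `_stored` plays the role of the data saved on disk, and `snodes` is the dictionary of active snodes.

```python
class TinySuperGraph:
    """Hypothetical stand-in: a registry of stored snodes plus an in-memory cache."""

    def __init__(self, stored):
        self._stored = stored  # stands in for snode data saved on disk
        self.snodes = {}       # active (in-memory) snodes

    def is_snode(self, label):
        return label in self._stored

    def is_active_snode(self, label):
        return label in self.snodes

    def activate_snode(self, label):
        # Load the snode only if it is not already in memory
        if label not in self.snodes:
            self.snodes[label] = dict(self._stored[label])

    def deactivate_snode(self, label):
        # Free memory; the snode remains part of the supergraph
        self.snodes.pop(label, None)


sg = TinySuperGraph({"papers": {"nodes": ["p1", "p2"]}})
sg.activate_snode("papers")
sg.deactivate_snode("papers")
print(sg.is_snode("papers"), sg.is_active_snode("papers"))
```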
- detectCommunities(label, alg='louvain', ncmax=None, comm_label='Comm')
Applies the selected community detection algorithm to a given snode
- Parameters:
label (str) – Name of the snode
alg (str, optional (default=’louvain’)) – Community detection algorithm
ncmax (int or None, optional (default=None)) – Number of communities.
comm_label (str, optional (default=’Comm’)) – Label for the cluster indices in the output dataframe
- disambiguate_node(node_name)
Disambiguate a given node (from any graph) based on the topological structure of the related snode and sedge in the supergraph
- Parameters:
path (str) – Path to snode
node_name (str) – Name of the node
- drop_sedge(label)
Removes a sedge from the supergraph
- Parameters:
label (str) – Name of the sedge to be removed
- drop_snode(label)
Removes snode from the supergraph. Note that this does not remove the related sedges, and the resulting supergraph might be inconsistent. A future version of this method should also remove the related sedges.
- Parameters:
label (str) – Name of the snode to be removed
- duplicate_snode(xlabel, ylabel, out_path=None)
Creates a copy of a given snode with another name.
- Parameters:
xlabel (str) – Name of the snode to be duplicated
ylabel (str) – Name of the new snode
out_path (str or None, optional (default=None)) – Output path of the duplicate
- export_2_halo(e_label, s_att1, s_att2, t_att, t_att2=None)
Export sedge, with selected attributes, into a csv file, for visualization with Halo.
- Parameters:
e_label (str) – Name of the sedge (bipartite graph) to export
s_att1 (str) – Name of the first attribute of the source node
s_att2 (str) – Name of the second attribute of the source node
t_att (str) – Name of the attribute of the target node
t_att2 (str or None, optional (default=None)) – Name of the second attribute of the target node. If None, t_att2 is taken equal to t_att
- Returns:
label_map – Dictionary of correspondences label_in_graph : label_in_halo
- Return type:
dict
- filter_edges_from_sedge(label, th)
Removes edges below a given threshold from a given sedge
- Parameters:
label (str) – Name of the sedge
th (int or float) – Threshold
- filter_edges_from_snode(label, th)
Removes edges below a given threshold from a given snode
- Parameters:
label (str) – Name of the snode
th (int or float) – Threshold
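The two edge-filtering methods above share the same idea, which can be sketched on a plain edge list. This is a minimal sketch assuming edges are (source, target, weight) tuples and that edges with weight exactly equal to the threshold are kept; the function name is hypothetical.

```python
def filter_edges(edges, th):
    """Keep only the edges whose weight is not below the threshold th."""
    return [(s, t, w) for s, t, w in edges if w >= th]


edges = [("a", "b", 0.9), ("a", "c", 0.2), ("b", "c", 0.5)]
print(filter_edges(edges, th=0.5))
```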
- get_attributes(label, is_snode_name=True)
Returns the attributes of a given snode or sedge
- Parameters:
label (str) – Name of the snode or sedge
is_snode_name (bool, optional (default=True)) – If True, label is a snode. If False, label is a sedge
- Returns:
atts – List of attributes of the given snode or sedge
- Return type:
list of str
Notes
If the snode is active, the attributes are not read from file, but from memory. Thus, if any other method has modified the attributes without updating the in-memory data, the attribute list might be out of date.
- get_metadata(label, is_node_name=True)
Returns the metadata of a given snode or sedge
- Parameters:
label (str) – Name of the snode or sedge
is_node_name (bool, optional (default=True)) – If True, label is a snode. If False, label is a sedge
- Returns:
md – Metadata dictionary.
- Return type:
dict
- get_sedges()
Returns the labels of all sedges in the supergraph
- get_snodes()
Returns the labels of all snodes in the supergraph
- get_terminals(e_label)
Returns the name of the source and target snodes of a given sedge
- Parameters:
e_label (str) – Name of the sedge
- Returns:
s_label (str) – Name of the source snode
t_label (str) – Name of the target snode
- graph_layout(snode_label, attribute, gravity=1)
Compute the layout of the given graph
- Parameters:
snode_label (str) – Name of the snode
gravity (int, optional (default=1)) – Gravity parameter of the graph layout method (only for force atlas 2)
attribute (str) – Snode attribute used to color the graph
- is_active_sedge(e_label)
Checks if a given sedge is active
- Parameters:
e_label (str) – Name of the sedge
- Returns:
b – True if sedge is active, False otherwise
- Return type:
boolean
- is_active_snode(label)
Checks if the snode given by label is active
- Parameters:
label (str) – Name of the snode
- Returns:
b – True if snode is active, False otherwise
- Return type:
boolean
- is_sedge(e_label)
Checks if the sedge given by e_label exists in the supergraph
- Parameters:
e_label (str) – Name of the sedge
- Returns:
b – True if sedge exists, False otherwise
- Return type:
boolean
- is_snode(label)
Checks if the snode given by label exists in the supergraph
- Parameters:
label (str) – Name of the snode
- Returns:
b – True if snode exists, False otherwise
- Return type:
boolean
- local_snode_analysis(label, parameter)
Compute local features of nodes in the given snode
- Parameters:
label (str) – Name of the snode
parameter (str) – Name of the local feature
- makeSuperNode(label, out_path=None, nodes=None, T=None, attributes={}, edge_class='undirected', save_T=False)
Make a new snode for the supergraph structure. The snode is created as an object (graph) from class DataGraph, with the input data in the args.
- Parameters:
label (str) – Name of the supernode
out_path (str or None, optional (default=None)) – Output path
nodes (list or None, optional (default=None)) – List of nodes
T (array or None, optional (default=None)) – Feature matrix, one row per node
attributes (dict, optional (default={})) – Attributes of the supernode. Note that these are not attributes of the nodes, but of the whole supernode, that will be stored in the snode metagraph
save_T (bool, optional (default=False)) – If True, the feature matrix T is saved into an npz file.
- remove_isolated_nodes(label)
Removes all isolated nodes in a given snode
- Parameters:
label (str) – Name of the snode
- remove_snode_attributes(label, att_names)
- Parameters:
label (str) – Name of the snode from which the attributes are removed
att_names (str or list) – Name or names of the attributes to remove
- save_supergraph()
Saves all active snodes and sedges, that is, all snodes and sedges that have been loaded into self.snodes and self.sedges
- snode_from_atts(source, attrib, target=None, path_snode=None, path_sedge=None, e_label=None, att_size=True)
Generate a new snode and a new sedge from a given snode in the supergraph and one of its attributes.
The nodes of the new snode will consist of the attribute values of the snode.
Each node in the source snode will be connected to the node in the target snode containing its attribute value.
- Parameters:
source (str) – Name of the source snode in the supergraph
attrib (str) – The attribute in snode containing the target nodes
target (str or None, optional (default=None)) – Name of the target snode
path_snode (str or None, optional (default=None)) – Output path to save the target snode
path_sedge (str or None, optional (default=None)) – Output path to save the sedge
e_label (str or None, optional (default=None)) – Name of the new sedge
att_size (bool, optional (default=True)) – If True, adds an attribute to the target snode containing the size of each node, measured by its number of neighbors in the sedge
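The construction described for snode_from_atts can be sketched as follows. This is a simplified illustration, assuming the source snode is given as a dictionary mapping each node to its attribute value; the function name and return layout are hypothetical.

```python
from collections import Counter


def snode_from_attribute(node_att):
    """Build target nodes, bipartite edges and node sizes from attribute values.

    node_att maps each source node to its attribute value.
    """
    # The nodes of the new snode are the distinct attribute values
    target_nodes = sorted(set(node_att.values()))
    # Each source node is linked to the target node holding its attribute value
    edges = [(node, value) for node, value in node_att.items()]
    # Size of a target node: number of neighbors in the new sedge
    size = dict(Counter(value for _, value in edges))
    return target_nodes, edges, size


papers_year = {"p1": "2020", "p2": "2021", "p3": "2020"}
print(snode_from_attribute(papers_year))
```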
- snode_from_edges(source, edges, target=None, path_snode=None, path_sedge=None, e_label=None)
Generate a new snode and a new sedge from a given snode in the supergraph and a list of edges to the new snode.
This method is similar to snode_from_atts. The difference is that snode_from_atts takes the edges from an snode attribute, while snode_from_edges takes the edges as an input argument.
- Parameters:
source (str) – Name of the source snode in the supergraph
edges (list) – List of edges
target (str or None) – Name of the target snode. If None, the default name A_{source} is used, where {source} is the source name
path_snode (str or None) – Output path to save the target snode. If None, a default path is used
path_sedge (str or None) – Output path to save the sedge. If None, a default path is used
e_label (str or None) – Name of the sedge connecting the source and target snodes. If None, a default name {source}_2_{target} is used
- snode_from_eqs(source, target=None, path_snode=None, path_sedge=None, e_label=None)
Generate a new snode and a new sedge from a given snode in the supergraph.
The nodes of the new snode will consist of the equivalence classes of the snode.
An equivalence class is the set of all nodes fully connected by links with unit weight.
All nodes from the same equivalence class in the source snode will be connected to the same equivalence-class node in the target snode.
- Parameters:
source (str) – Name of the source snode in the supergraph
target (str or None) – Name of the target snode. If None, the default name eq_{source} is used, where {source} is the source name
path_snode (str or None) – Output path to save the target snode. If None, a default path is used
path_sedge (str or None) – Output path to save the sedge. If None, a default path is used
e_label (str or None) – Name of the sedge connecting the source and target snodes. If None, a default name {source}_2_{target} is used
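One way to picture the equivalence classes is to group nodes joined by unit-weight links. The sketch below uses a union-find structure and assumes that unit-weight links are transitive, so that each class coincides with a connected component of the unit-weight subgraph; this is an assumption of the example, not a description of the library code.

```python
def equivalence_classes(nodes, edges):
    """Group nodes connected through links with unit weight."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the class representative (with path halving)
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for s, t, w in edges:
        if w == 1:
            parent[find(s)] = find(t)

    classes = {}
    for n in nodes:
        classes.setdefault(find(n), []).append(n)
    return sorted(classes.values())


nodes = ["a", "b", "c", "d"]
edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 0.5)]
print(equivalence_classes(nodes, edges))
```

Each resulting class would become one node of the target snode, linked to all of its members in the source snode.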
- sub_snode(xlabel, ynodes, ylabel=None, sampleT=True, save_T=True)
Subsample snode X using a given subset of nodes.
The list of nodes may contain nodes that are not in X. These nodes will be included in the new graph, with no edges.
- Parameters:
xlabel (str) – Name of the snode to be sampled
ynodes (int or list) – If list, list of nodes of the output subgraph. If int, number of nodes to sample. The list of nodes is taken at random without replacement from the graph nodes
ylabel (str or None, optional (default=None)) – Name of the new snode. If None, the sampled snode replaces the original one
sampleT (bool, optional (default=True)) – If True, the feature matrix is also sampled, if it exists.
save_T (bool, optional (default=True)) – If True, the feature matrix T is saved into an npz file.
- sub_snode_by_novalue(xlabel, att, value, ylabel=None, sampleT=False)
Subsample snode by removing all nodes that take a given value of the given attribute
- Parameters:
xlabel (str) – Name of the snode to be sampled
att (str) – Name of the attribute to select nodes by value
value (int or str) – Value of the attribute. Only nodes NOT taking this value will be selected
ylabel (str or None, optional (default=None)) – Name of the new snode. If None, the sampled snode replaces the original one
sampleT (bool, optional (default=False)) – If True, the feature matrix is also sampled, if it exists.
- sub_snode_by_threshold(xlabel, att, th, bound='lower', ylabel=None)
Subsample snode by removing all nodes whose value of a given attribute is below or above a given threshold
- Parameters:
xlabel (str) – Name of the snode to be sampled
att (str) – Name of the attribute to select nodes by value
th (int or float) – Threshold on the attribute value. Nodes on the excluded side of the threshold (see bound) are removed
bound (str {‘lower’, ‘upper’}, optional (default=’lower’)) – States if the threshold is a lower (default) or an upper bound. If ‘lower’, all nodes with attribute less than the bound are removed
ylabel (str or None, optional (default=None)) – Name of the new snode. If None, the sampled snode replaces the original one
sampleT (bool, optional (default=False)) – If True, the feature matrix is also sampled, if it exists.
- sub_snode_by_value(xlabel, att, value, ylabel=None)
Subsample snode by the value of a single attribute
- Parameters:
xlabel (str) – Name of the snode to be sampled
att (str) – Name of the attribute to select nodes by value
value (int or str or list) – Value of the attribute. Only nodes taking this value will be selected. If value is a list, all nodes taking a value in the list are selected.
ylabel (str or None, optional (default=None)) – Name of the new snode. If None, the sampled snode replaces the original one
Notes
Note that the feature matrix, if it exists, is not sampled.
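Selection by attribute value can be sketched on plain Python structures. The example assumes a node list, a dictionary of attribute values and an edge list of (source, target, weight) tuples, and it assumes that only edges between selected nodes survive; the function name is hypothetical.

```python
def sub_by_value(nodes, att, edges, value):
    """Keep the nodes whose attribute equals value (or is in value, if a list)."""
    values = value if isinstance(value, list) else [value]
    keep = {n for n in nodes if att[n] in values}
    sub_nodes = [n for n in nodes if n in keep]
    # Assumption: an edge survives only if both of its end nodes are selected
    sub_edges = [(s, t, w) for s, t, w in edges if s in keep and t in keep]
    return sub_nodes, sub_edges


nodes = ["n1", "n2", "n3"]
att = {"n1": "red", "n2": "blue", "n3": "red"}
edges = [("n1", "n2", 1.0), ("n1", "n3", 0.4)]
print(sub_by_value(nodes, att, edges, "red"))
```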
- transduce(xylabel, n=1, normalize=True, keep_active=None)
Given snode X and sedge X-Y, compute a graph for Y based on the connectivity between nodes in Y through edges from Y to X (stored in X-Y) and edges in X.
- Parameters:
xylabel (str) – Name of the sedge (bipartite graph) X-Y. The names of snodes X and Y will be taken from the metadata of the bipartite graph
n (int, optional (default=1)) – Order for K. A positive integer. The affinity matrix is a normalized version of F·K^n·F’
normalize (bool, optional (default=True)) – If True the graph is normalized so that each node has similarity 1 to itself.
keep_active (bool or None, optional (default=None)) – If True, snodes and sedge are not deactivated before return. If False, X and X-Y are deactivated. Y remains active, otherwise changes would be lost. If None, the default value in self.keep_active is used
Notes
The new graph is stored in the edges of snode Y.
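The affinity F·K^n·F’ mentioned for parameter n can be sketched with small dense matrices. The example assumes that F has one row per node in Y and one column per node in X, that K is the affinity matrix of X, and that normalization divides entry (i, j) by sqrt(A_ii · A_jj) so that every node has similarity 1 to itself. These conventions, and the function names, are assumptions of the example; the diagonal entries must be positive for the normalization to be defined.

```python
import math


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transduce_affinity(F, K, n=1, normalize=True):
    """Compute F K^n F' and, optionally, normalize it to a unit diagonal."""
    Ft = [list(col) for col in zip(*F)]
    M = F
    for _ in range(n):
        M = matmul(M, K)
    A = matmul(M, Ft)
    if normalize:
        d = [math.sqrt(A[i][i]) for i in range(len(A))]
        A = [[A[i][j] / (d[i] * d[j]) for j in range(len(A))] for i in range(len(A))]
    return A


F = [[1, 1], [1, 0]]          # two Y nodes linked to two X nodes
K = [[1.0, 0.0], [0.0, 1.0]]  # X nodes only similar to themselves
print(transduce_affinity(F, K))
```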
- transitive_graph(e_label, xmlabel, mylabel, path_sedge=None, keep_active=None)
Construct a new superedge XY connecting supernodes X and Y that are linked by an intermediate supernode M (through superedges XM and MY).
To do so, we replace connections x-m-y by connections x-y
- Parameters:
e_label (str) – Label of the new superedge
xmlabel (str) – Label of superedge x-m
mylabel (str) – Label of superedge m-y
path_sedge (str or None, optional (default=None)) – Path to the new sedge
keep_active (bool or None, optional (default=None)) – If True, snodes and sedge are not deactivated before return. If False, XM and MY are deactivated. XY remains active, otherwise changes would be lost. If None, the default value in self.keep_active is used
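Replacing connections x-m-y by connections x-y amounts to composing two bipartite edge lists. In the sketch below, the weight of each new edge is the sum, over all intermediate nodes m, of the product of the two edge weights (the matrix-product convention). That weighting rule is an assumption of the example and may differ from the library's choice; the function name is hypothetical.

```python
from collections import defaultdict


def transitive_edges(xm_edges, my_edges):
    """Compose X-M and M-Y edges into X-Y edges, summing products of weights."""
    out_of_m = defaultdict(list)
    for m, y, w in my_edges:
        out_of_m[m].append((y, w))

    xy = defaultdict(float)
    for x, m, w1 in xm_edges:
        for y, w2 in out_of_m.get(m, []):
            xy[(x, y)] += w1 * w2
    return dict(xy)


xm = [("x1", "m1", 1.0), ("x1", "m2", 1.0), ("x2", "m2", 2.0)]
my = [("m1", "y1", 1.0), ("m2", "y1", 0.5)]
print(transitive_edges(xm, my))
```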
Data graph view
- rdigraphs.supergraph.dgview.plotAverageTopic(X, fpath=None)
Plots the sorted average of all sorted topic vectors
- Parameters:
X (array) – Input matrix
fpath (str or None, optional (default=None)) – Path to save the figure
- rdigraphs.supergraph.dgview.plotCXmatrix(M, fpath=None)
Plots matrix
- Parameters:
M (array) – Input matrix
fpath (str or None, optional (default=None)) – Path to save the figure
- rdigraphs.supergraph.dgview.plotClusterWeights(labels, fpath=None, n=None)
Barplot of the number of items per cluster.
- Parameters:
labels (list) – Labels
fpath (str or None, optional (default=None)) – Path to save the figure
n (int or None, optional (default=None)) – If None, all clusters are shown. If an integer, only the top n clusters are shown
- rdigraphs.supergraph.dgview.plotMainTopic(X, fpath=None)
Plots the main topic
- Parameters:
X (array) – Input matrix
fpath (str or None, optional (default=None)) – Path to save the figure
- rdigraphs.supergraph.dgview.plotSortedFeatures(X, fpath=None)
Plots the average of all sorted topic vectors
- Parameters:
X (array) – Input matrix
fpath (str or None, optional (default=None)) – Path to save the figure
- rdigraphs.supergraph.dgview.plot_cluster_analysis(scores, fpath)
Plots the values of one or more scores for clustering evaluation, as a function of the number of clusters.
- Parameters:
scores (dict) – Scores
fpath (str) – Path to save the figure
- rdigraphs.supergraph.dgview.printClusters(M)
Prints clusters
- Parameters:
M (array) – Input matrix
- rdigraphs.supergraph.dgview.printStats(df_nodes, label)
Logs some statistics about the size and content of the graph data
- Parameters:
df_nodes (dataframe) – Dataframe
label (str) – Label
- rdigraphs.supergraph.dgview.rankEdges(df_edges, df_nodes, fields, n)
Shows the n edges with highest weight.
Nodes are referred to using the specified field in df_nodes
- Parameters:
df_edges (dataframe) – Edges
df_nodes (dataframe) – Nodes
fields (str) – Column containing the weights
n (int) – Number of edges
Super edge
- class rdigraphs.supergraph.sedge.SEdge(label='dg', path=None, label_source=None, label_target=None, load_data=True, edge_class='directed')
Bases:
DataGraph
Generic class defining a super-edge: a bipartite datagraph.
It is inherited from DataGraph. This is because a bipartite graph is nothing but a particular type of graph.
The SEdge class distinguishes between source nodes and target nodes. Thus, the DataGraph class is extended with some attributes to label the type of each node.
However, the graph may be undirected (links from target nodes to source nodes are allowed)
- __init__(label='dg', path=None, label_source=None, label_target=None, load_data=True, edge_class='directed')
Defines the superedge structure
If a superedge exists in the given path, it is loaded. Otherwise, a new empty superedge is created.
- Parameters:
label (str or None, optional (default=’dg’)) – Name of the superedge
path (str or None, optional (default=None)) – Path to the folder that contains, or will contain, the graph data
label_source (str or None, optional (default=None)) – Generic name of the source nodes
label_target (str or None, optional (default=None)) – Generic name of the target nodes
load_data (bool, optional (default=True)) – If True (default) the graph data are loaded. If False, only the graph metadata are loaded
- Variables:
n_source (int) – Number of source nodes
n_target (int) – Number of target nodes
label_source (str) – Generic label for the source nodes
label_target (str) – Generic label for the target nodes
Notes
These are the specific attributes of the SEdge class. See the parent class documentation to see more attributes.
The source and target nodes in the parent class attribute self.df_edges are located in columns ‘Source’ and ‘Target’, because these are the standard names for Gephi graphs. Thus, the names in label_source and label_target are not used in self.df_edges.
- add_single_edge(source, target, weight=1, attributes={})
Add single edge
- Parameters:
source (str) – Source node name
target (str) – Target node name
weight (float, optional (default=1)) – Edge weight
attributes (dict, optional (default={})) – Dictionary of attributes
- add_single_node(node, attributes={})
Add single node
- Parameters:
node (str) – Node name
attributes (dict, optional (default={})) – Dictionary of attributes
- computeSimBiGraph(s_min=None, n_gnodesS=None, n_gnodesT=None, n_edges=None, similarity='He2', g=1, blocksize=25000, useGPU=False, verbose=True)
Computes a sparse similarity bipartite graph for the self graph structure. The self graph must contain a T-matrix, self.T
- Parameters:
s_min (float or None, optional (default=None)) – Similarity threshold. Edges link all data pairs with similarity higher than s_min. This forces a sparse graph.
n_gnodesS (int or None, optional (default=None)) – Number of nodes in the source subgraph. If None, all nodes are used. If n_gnodesS < no. of rows in self.Xs, a random subsample is taken.
n_gnodesT (int or None, optional (default=None)) – Number of nodes in the target subgraph. If None, all nodes are used. If n_gnodesT < no. of rows in self.Xt, a random subsample is taken.
n_edges (int or None, optional (default=None)) – Target number of edges. n_edges is an alternative to s_min. Only one of the two must be specified (i.e., not None)
similarity (str {‘He2’, ‘He2->JS’}, optional (default=’He2’)) – Similarity measure used to compute the affinity matrix. Available options are: ‘He2’ (1 minus squared Hellinger distance (self implementation)); ‘He2->JS’ (1 minus Jensen-Shannon (JS) divergence)
g (float, optional (default=1)) – Exponent for the affinity mapping
blocksize (int, optional (default=25_000)) – Size of each block for the computation of affinity values. Large sizes might imply a large memory consumption.
useGPU (bool, optional (default=False)) – If True, matrix operations are accelerated using GPU
verbose (bool, optional (default=True)) – (Only for he_neighbors_graph()). If False, block-by-block messaging is omitted
- disconnect_nodes(source, target, directed=False)
Disconnect nodes by removing edges
- Parameters:
source (str) – Source node name
target (str) – Target node name
directed (bool, optional (default=False)) – True if only edge source->target should be removed
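The effect of the directed flag can be sketched on a plain edge list. This is a minimal illustration assuming edges are (source, target, weight) tuples; the function name is hypothetical.

```python
def disconnect(edges, source, target, directed=False):
    """Drop source->target; when directed is False, drop target->source as well."""
    banned = {(source, target)}
    if not directed:
        banned.add((target, source))
    return [(s, t, w) for s, t, w in edges if (s, t) not in banned]


edges = [("a", "b", 1.0), ("b", "a", 1.0), ("a", "c", 1.0)]
print(disconnect(edges, "a", "b"))
print(disconnect(edges, "a", "b", directed=True))
```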
- drop_single_node(node)
Drop single node
- Parameters:
node (str) – Node name
- get_source_nodes()
Get list of source nodes
- Return type:
list of source nodes
- get_target_nodes()
Get list of target nodes
- Return type:
list of target nodes
- get_terminals()
Returns the names of the source and target snodes of the sedge
- Returns:
s_label (str) – Name of the source snode
t_label (str) – Name of the target snode
- save_feature_matrix()
Save feature matrices in self.Xs and self.Xt, if they exist.
- set_edges(source_nodes, target_nodes, weights=None)
This method modifies set_edges from the parent class to test name collisions in source and target nodes.
- Parameters:
source_nodes (list) – Source nodes
target_nodes (list) – Target nodes
weights (list or None, optional (default=None)) – Edge weights. If None, unit weights are assumed
- set_nodes(nodes_orig=[], nodes_dest=[], Xs=None, Xt=None, save_T=False)
Loads a superedge with a given set of source and target nodes.
The new sets of nodes overwrite any existing ones.
- Parameters:
nodes_orig (list, optional (default=[])) – Source nodes
nodes_dest (list, optional (default=[])) – Target nodes
Xs (array or None, optional (default=None)) – Source feature matrix: one row per source node, one column per feature
Xt (array or None, optional (default=None)) – Target feature matrix: one row per target node, one column per feature
save_T (bool, optional (default=False)) – If True, feature matrices are saved into npz files.
- update_metadata()
Updates metadata dictionary with the self variables directly computed from df_nodes and df_edges