torch_geometric.data.HeteroData
- class HeteroData(_mapping: Optional[Dict[str, Any]] = None, **kwargs)
Bases: BaseData, FeatureStore, GraphStore
A data object describing a heterogeneous graph, holding multiple node and/or edge types in disjoint storage objects. Storage objects can hold either node-level, link-level or graph-level attributes. In general, HeteroData tries to mimic the behavior of a regular nested Python dictionary. In addition, it provides useful functionality for analyzing graph structures, and provides basic PyTorch tensor functionalities.

    import torch
    from torch_geometric.data import HeteroData

    data = HeteroData()

    # Create two node types "paper" and "author" holding a feature matrix:
    data['paper'].x = torch.randn(num_papers, num_paper_features)
    data['author'].x = torch.randn(num_authors, num_authors_features)

    # Create an edge type "(author, writes, paper)" and build the
    # graph connectivity:
    data['author', 'writes', 'paper'].edge_index = ...  # [2, num_edges]

    data['paper'].num_nodes
    >>> 23

    data['author', 'writes', 'paper'].num_edges
    >>> 52

    # PyTorch tensor functionality:
    data = data.pin_memory()
    data = data.to('cuda:0', non_blocking=True)

Note that there exist multiple ways to create heterogeneous graph data, e.g.:
To initialize a node of type "paper" holding a node feature matrix x_paper named x:

    from torch_geometric.data import HeteroData

    # (1) Assign attributes after initialization,
    data = HeteroData()
    data['paper'].x = x_paper

    # or (2) pass them as keyword arguments during initialization,
    data = HeteroData(paper={ 'x': x_paper })

    # or (3) pass them as dictionaries during initialization,
    data = HeteroData({'paper': { 'x': x_paper }})

To initialize an edge from source node type "author" to destination node type "paper" with relation type "writes" holding a graph connectivity matrix edge_index_author_paper named edge_index:

    # (1) Assign attributes after initialization,
    data = HeteroData()
    data['author', 'writes', 'paper'].edge_index = edge_index_author_paper

    # or (2) pass them as keyword arguments during initialization,
    data = HeteroData(author__writes__paper={
        'edge_index': edge_index_author_paper
    })

    # or (3) pass them as dictionaries during initialization,
    data = HeteroData({
        ('author', 'writes', 'paper'):
        { 'edge_index': edge_index_author_paper }
    })

- classmethod from_dict(mapping: Dict[str, Any]) -> HeteroData
Creates a HeteroData object from a Python dictionary.
- node_items() -> List[Tuple[str, NodeStorage]]
Returns a list of node type and node storage pairs.
- edge_items() -> List[Tuple[Tuple[str, str, str], EdgeStorage]]
Returns a list of edge type and edge storage pairs.
- to_namedtuple() -> NamedTuple
Returns a NamedTuple of stored key/value pairs.
- set_value_dict(key: str, value_dict: Dict[str, Any]) -> HeteroData
Sets the values in the dictionary value_dict to the attribute with name key for all node/edge types present in the dictionary.

    data = HeteroData()
    data.set_value_dict('x', {
        'paper': torch.randn(4, 16),
        'author': torch.randn(8, 32),
    })
    print(data['paper'].x)

- update(data: HeteroData) -> HeteroData
Updates the data object with the elements from another data object.
- __cat_dim__(key: str, value: Any, store: Optional[Union[NodeStorage, EdgeStorage]] = None, *args, **kwargs) -> Any
Returns the dimension along which the value value of the attribute key will get concatenated when creating mini-batches using torch_geometric.loader.DataLoader.
Note
This method is for internal use only, and should only be overridden in case the mini-batch creation process is corrupted for a specific attribute.
- __inc__(key: str, value: Any, store: Optional[Union[NodeStorage, EdgeStorage]] = None, *args, **kwargs) -> Any
Returns the incremental count to cumulatively increase the value value of the attribute key when creating mini-batches using torch_geometric.loader.DataLoader.
Note
This method is for internal use only, and should only be overridden in case the mini-batch creation process is corrupted for a specific attribute.
- property num_node_features: Dict[str, int]
Returns the number of features per node type in the graph.
- property num_features: Dict[str, int]
Returns the number of features per node type in the graph. Alias for num_node_features.
- property num_edge_features: Dict[Tuple[str, str, str], int]
Returns the number of features per edge type in the graph.
- metadata() -> Tuple[List[str], List[Tuple[str, str, str]]]
Returns the heterogeneous meta-data, i.e. its node and edge types.

    data = HeteroData()
    data['paper'].x = ...
    data['author'].x = ...
    data['author', 'writes', 'paper'].edge_index = ...

    print(data.metadata())
    >>> (['paper', 'author'], [('author', 'writes', 'paper')])

- collect(key: str) -> Dict[Union[str, Tuple[str, str, str]], Any]
Collects the attribute key from all node and edge types.

    data = HeteroData()
    data['paper'].x = ...
    data['author'].x = ...

    print(data.collect('x'))
    >>> { 'paper': ..., 'author': ...}

Note
This is equivalent to writing data.x_dict.
- get_node_store(key: str) -> NodeStorage
Gets the NodeStorage object of a particular node type key. If the storage is not present yet, will create a new torch_geometric.data.storage.NodeStorage object for the given node type.

    data = HeteroData()
    node_storage = data.get_node_store('paper')

- get_edge_store(src: str, rel: str, dst: str) -> EdgeStorage
Gets the EdgeStorage object of a particular edge type given by the tuple (src, rel, dst). If the storage is not present yet, will create a new torch_geometric.data.storage.EdgeStorage object for the given edge type.

    data = HeteroData()
    edge_storage = data.get_edge_store('author', 'writes', 'paper')

- rename(name: str, new_name: str) -> HeteroData
Renames the node type name to new_name in-place.
- subgraph(subset_dict: Dict[str, Tensor]) -> HeteroData
Returns the induced subgraph containing the node types and corresponding nodes in subset_dict.
If a node type is not a key in subset_dict then all nodes of that type remain in the graph.

    data = HeteroData()
    data['paper'].x = ...
    data['author'].x = ...
    data['conference'].x = ...
    data['paper', 'cites', 'paper'].edge_index = ...
    data['author', 'paper'].edge_index = ...
    data['paper', 'conference'].edge_index = ...
    print(data)
    >>> HeteroData(
        paper={ x=[10, 16] },
        author={ x=[5, 32] },
        conference={ x=[5, 8] },
        (paper, cites, paper)={ edge_index=[2, 50] },
        (author, to, paper)={ edge_index=[2, 30] },
        (paper, to, conference)={ edge_index=[2, 25] }
    )

    subset_dict = {
        'paper': torch.tensor([3, 4, 5, 6]),
        'author': torch.tensor([0, 2]),
    }

    print(data.subgraph(subset_dict))
    >>> HeteroData(
        paper={ x=[4, 16] },
        author={ x=[2, 32] },
        conference={ x=[5, 8] },
        (paper, cites, paper)={ edge_index=[2, 24] },
        (author, to, paper)={ edge_index=[2, 5] },
        (paper, to, conference)={ edge_index=[2, 10] }
    )

- Parameters
subset_dict (Dict[str, LongTensor or BoolTensor]) – A dictionary holding the nodes to keep for each node type.
- edge_subgraph(subset_dict: Dict[Tuple[str, str, str], Tensor]) -> HeteroData
Returns the induced subgraph given by the edge indices in subset_dict for certain edge types. Will currently preserve all the nodes in the graph, even if they are isolated after subgraph computation.
- node_type_subgraph(node_types: List[str]) -> HeteroData
Returns the subgraph induced by the given node_types, i.e. the returned HeteroData object only contains the node types which are included in node_types, and only contains the edge types where both end points are included in node_types.
- edge_type_subgraph(edge_types: List[Tuple[str, str, str]]) -> HeteroData
Returns the subgraph induced by the given edge_types, i.e. the returned HeteroData object only contains the edge types which are included in edge_types, and only contains the node types that appear as end points of the edge types in edge_types.
- to_homogeneous(node_attrs: Optional[List[str]] = None, edge_attrs: Optional[List[str]] = None, add_node_type: bool = True, add_edge_type: bool = True, dummy_values: bool = True) -> Data
Converts a HeteroData object to a homogeneous Data object. By default, all features with the same feature dimensionality across different types will be merged into a single representation, unless otherwise specified via the node_attrs and edge_attrs arguments. Furthermore, attributes named node_type and edge_type will be added to the returned Data object, denoting node-level and edge-level vectors holding the node and edge type as integers, respectively.
- Parameters
node_attrs (List[str], optional) – The node features to combine across all node types. These node features need to be of the same feature dimensionality. If set to None, will automatically determine which node features to combine. (default: None)
edge_attrs (List[str], optional) – The edge features to combine across all edge types. These edge features need to be of the same feature dimensionality. If set to None, will automatically determine which edge features to combine. (default: None)
add_node_type (bool, optional) – If set to False, will not add the node-level vector node_type to the returned Data object. (default: True)
add_edge_type (bool, optional) – If set to False, will not add the edge-level vector edge_type to the returned Data object. (default: True)
dummy_values (bool, optional) – If set to True, will fill attributes of remaining types with dummy values. Dummy values are NaN for floating point attributes, and -1 for integers. (default: True)
- get_all_tensor_attrs() -> List[TensorAttr]
Obtains all tensor attributes stored in this FeatureStore.
- get_all_edge_attrs() -> List[EdgeAttr]
Obtains all edge attributes stored in the GraphStore.
- apply(func: Callable, *args: List[str])
Applies the function func, either to all attributes or only the ones given in *args.
- apply_(func: Callable, *args: List[str])
Applies the in-place function func, either to all attributes or only the ones given in *args.
- clone(*args: List[str])
Performs cloning of tensors, either for all attributes or only the ones given in *args.
- coalesce()
Sorts and removes duplicated entries from edge indices edge_index.
- contiguous(*args: List[str])
Ensures a contiguous memory layout, either for all attributes or only the ones given in *args.
- coo(edge_types: Optional[List[Any]] = None, store: bool = False) -> Tuple[Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Optional[Tensor]]]
Obtains the edge indices in the GraphStore in COO format.
- Parameters
edge_types (List[Any], optional) – The edge types of edge indices to obtain. If set to None, will return the edge indices of all existing edge types. (default: None)
store (bool, optional) – Whether to store converted edge indices in the GraphStore. (default: False)
- cpu(*args: List[str])
Copies attributes to CPU memory, either for all attributes or only the ones given in *args.
- csc(edge_types: Optional[List[Any]] = None, store: bool = False) -> Tuple[Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Optional[Tensor]]]
Obtains the edge indices in the GraphStore in CSC format.
- Parameters
edge_types (List[Any], optional) – The edge types of edge indices to obtain. If set to None, will return the edge indices of all existing edge types. (default: None)
store (bool, optional) – Whether to store converted edge indices in the GraphStore. (default: False)
- csr(edge_types: Optional[List[Any]] = None, store: bool = False) -> Tuple[Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Optional[Tensor]]]
Obtains the edge indices in the GraphStore in CSR format.
- Parameters
edge_types (List[Any], optional) – The edge types of edge indices to obtain. If set to None, will return the edge indices of all existing edge types. (default: None)
store (bool, optional) – Whether to store converted edge indices in the GraphStore. (default: False)
- cuda(device: Optional[Union[int, str]] = None, *args: List[str], non_blocking: bool = False)
Copies attributes to CUDA memory, either for all attributes or only the ones given in *args.
- detach(*args: List[str])
Detaches attributes from the computation graph by creating a new tensor, either for all attributes or only the ones given in *args.
- detach_(*args: List[str])
Detaches attributes from the computation graph, either for all attributes or only the ones given in *args.
- generate_ids()
Generates and sets n_id and e_id attributes, assigning each node and edge a unique, continuously ascending ID.
- get_edge_index(*args, **kwargs) -> Tuple[Tensor, Tensor]
Synchronously obtains an edge_index tuple from the GraphStore.
- get_tensor(*args, convert_type: bool = False, **kwargs) -> Union[Tensor, ndarray]
Synchronously obtains a tensor from the FeatureStore.
- Parameters
**kwargs (TensorAttr) – Any relevant tensor attributes that correspond to the feature tensor. See the TensorAttr documentation for required and optional attributes.
convert_type (bool, optional) – Whether to convert the type of the output tensor to the type of the attribute index. (default: False)
- Raises
ValueError – If the input TensorAttr is not fully specified.
KeyError – If the tensor corresponding to the input TensorAttr was not found.
- get_tensor_size(*args, **kwargs) -> Optional[Tuple[int, ...]]
Obtains the size of a tensor given its TensorAttr, or None if the tensor does not exist.
- is_coalesced() -> bool
Returns True if edge indices edge_index are sorted and do not contain duplicate entries.
- property is_cuda: bool
Returns True if any torch.Tensor attribute is stored on the GPU, False otherwise.
- multi_get_tensor(attrs: List[TensorAttr], convert_type: bool = False) -> List[Union[Tensor, ndarray]]
Synchronously obtains a list of tensors from the FeatureStore for each tensor associated with the attributes in attrs.
Note
The default implementation simply iterates over all calls to get_tensor(). Implementor classes that can provide additional, more performant functionality are recommended to override this method.
- Parameters
attrs (List[TensorAttr]) – A list of input TensorAttr objects that identify the tensors to obtain.
convert_type (bool, optional) – Whether to convert the type of the output tensor to the type of the attribute index. (default: False)
- Raises
ValueError – If any input TensorAttr is not fully specified.
KeyError – If any of the tensors corresponding to the input TensorAttr was not found.
- property num_edges: int
Returns the number of edges in the graph. For undirected graphs, this will return the number of bi-directional edges, which is double the amount of unique edges.
- pin_memory(*args: List[str])
Copies attributes to pinned memory, either for all attributes or only the ones given in *args.
- put_edge_index(edge_index: Tuple[Tensor, Tensor], *args, **kwargs) -> bool
Synchronously adds an edge_index tuple to the GraphStore. Returns whether insertion was successful.
- Parameters
edge_index (Tuple[torch.Tensor, torch.Tensor]) – The edge_index tuple in a format specified in EdgeAttr.
**kwargs (EdgeAttr) – Any relevant edge attributes that correspond to the edge_index tuple. See the EdgeAttr documentation for required and optional attributes.
- put_tensor(tensor: Union[Tensor, ndarray], *args, **kwargs) -> bool
Synchronously adds a tensor to the FeatureStore. Returns whether insertion was successful.
- Parameters
tensor (torch.Tensor or np.ndarray) – The feature tensor to be added.
**kwargs (TensorAttr) – Any relevant tensor attributes that correspond to the feature tensor. See the TensorAttr documentation for required and optional attributes.
- Raises
ValueError – If the input TensorAttr is not fully specified.
- record_stream(stream: Stream, *args: List[str])
Ensures that the tensor memory is not reused for another tensor until all current work queued on stream has been completed, either for all attributes or only the ones given in *args.
- remove_edge_index(*args, **kwargs) -> bool
Synchronously deletes an edge_index tuple from the GraphStore. Returns whether deletion was successful.
- remove_tensor(*args, **kwargs) -> bool
Removes a tensor from the FeatureStore. Returns whether deletion was successful.
- Parameters
**kwargs (TensorAttr) – Any relevant tensor attributes that correspond to the feature tensor. See the TensorAttr documentation for required and optional attributes.
- Raises
ValueError – If the input TensorAttr is not fully specified.
- requires_grad_(*args: List[str], requires_grad: bool = True)
Tracks gradient computation, either for all attributes or only the ones given in *args.
- share_memory_(*args: List[str])
Moves attributes to shared memory, either for all attributes or only the ones given in *args.
- size(dim: Optional[int] = None) -> Optional[Union[Tuple[Optional[int], Optional[int]], int]]
Returns the size of the adjacency matrix induced by the graph.
- to(device: Union[int, str], *args: List[str], non_blocking: bool = False)
Performs tensor device conversion, either for all attributes or only the ones given in *args.
- update_tensor(tensor: Union[Tensor, ndarray], *args, **kwargs) -> bool
Updates a tensor in the FeatureStore with a new value. Returns whether the update was successful.
Note
Implementor classes can choose to define more efficient update methods; the default performs a removal and insertion.
- Parameters
tensor (torch.Tensor or np.ndarray) – The feature tensor to be updated.
**kwargs (TensorAttr) – Any relevant tensor attributes that correspond to the feature tensor. See the TensorAttr documentation for required and optional attributes.
- view(*args, **kwargs) -> AttrView
Returns a view of the FeatureStore given a not yet fully-specified TensorAttr.