Metagraph Feature Format

A metagraph feature file contains metagraph-based feature vectors for nodes and node pairs on a heterogeneous graph. The feature file is part of the input to the semantic proximity search program, which uses metagraph-based features to train a learning-to-rank model for proximity search. In particular, we use anchored metagraphs as features, first proposed in our TKDD19 paper. The anchored metagraph concept extends the metagraph concept introduced in our ICDE16 paper.

The file is a tab-delimited text file, as illustrated in the sample below.

5
0 162 0 0
1 140 3 4
2 3 4
3 152 3 4
4 126 2 2
5 251 2 4
10 -1 1:24.0 2:1.0 4:9.0
52 115 0:47.0 1:3.0 3:2.0 4:11.0
-1 92 2:39.0 4:12.0
...

The first line is a single integer n, the number of dimensions of the feature vectors, i.e., the number of anchored metagraphs.

The next n lines describe the anchored metagraphs, each line for one anchored metagraph. Each anchored metagraph is a feature, summarized by four integers separated by tab characters in the following form:

[FeatureID]<tab>[MetagraphID]<tab>[HeadColor]<tab>[TailColor],

where

  • [FeatureID] is the ID of a feature, i.e., an anchored metagraph. Note that an anchored metagraph is defined by the triple (metagraph, head color, tail color), given by the next three integers;

  • [MetagraphID] is the ID of the corresponding metagraph in the anchored metagraph;

  • [HeadColor] is the color of the head anchor in the metagraph;

  • [TailColor] is the color of the tail anchor in the metagraph.
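The header described above can be read with a short parser. The following is a minimal Python sketch, not the authors' reader: `parse_feature_header` and `AnchoredMetagraph` are hypothetical names, and the function assumes the file has already been split into lines. It checks that each of the n definition lines has exactly four integer fields and returns the index at which the feature vectors begin.

```python
from typing import Dict, List, NamedTuple, Tuple


class AnchoredMetagraph(NamedTuple):
    """One feature: a metagraph ID plus the colors of its head and tail anchors."""
    metagraph_id: int
    head_color: int
    tail_color: int


def parse_feature_header(lines: List[str]) -> Tuple[Dict[int, AnchoredMetagraph], int]:
    """Read the dimension count n and the n anchored-metagraph lines.

    Returns a FeatureID -> AnchoredMetagraph table and the index of the
    first feature-vector line in `lines`.
    """
    n = int(lines[0].strip())
    features: Dict[int, AnchoredMetagraph] = {}
    for line in lines[1:1 + n]:
        # The format is tab-delimited; split() with no argument also accepts
        # runs of spaces, which keeps the parser tolerant of copied samples.
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        feature_id, metagraph_id, head_color, tail_color = map(int, fields)
        if feature_id in features:
            raise ValueError(f"duplicate FeatureID {feature_id}")
        features[feature_id] = AnchoredMetagraph(metagraph_id, head_color, tail_color)
    if len(features) != n:
        raise ValueError(f"expected {n} feature lines, found {len(features)}")
    return features, 1 + n


# Usage on a two-feature header followed by one feature-vector line.
lines = ["2", "0\t162\t0\t0", "1\t140\t3\t4", "10\t-1\t1:24.0"]
features, start = parse_feature_header(lines)
print(features[1])  # AnchoredMetagraph(metagraph_id=140, head_color=3, tail_color=4)
print(start)  # 3
```

Raising on a malformed definition line, rather than skipping it, makes a truncated or mis-delimited header fail early instead of silently shifting the feature-vector lines.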

The remaining lines contain the feature vectors. Each line represents a feature vector of one node or node pair in the following form:

[HeadID]<tab>[TailID]<tab>[FeatureID]:[FeatureValue]<tab>[FeatureID]:[FeatureValue]...,

where

  • the first two fields, [HeadID] and [TailID], are the IDs of the head and tail nodes in the heterogeneous graph, i.e., this line is the feature vector of the node pair in which the two nodes take the head and tail roles, respectively, in each anchored metagraph. Either [HeadID] or [TailID] can be -1, in which case this is the feature vector of a single node with a head or tail role (e.g., a line starting with "10 -1" describes node 10 in the head role);

  • the subsequent fields contain the sparse representation of the feature vector as one or more [FeatureID]:[FeatureValue] entries. Here [FeatureID] is the ID of a feature (i.e., an anchored metagraph), and [FeatureValue] is a floating-point number giving the value of that feature (e.g., the number of instances of the anchored metagraph containing this node pair with matching roles).
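Each feature-vector line can be parsed in the same spirit. This minimal Python sketch uses the hypothetical helpers `parse_feature_vector` and `to_dense`. Returning `None` for a -1 node ID, and filling absent features with 0.0 when expanding to a dense vector of length n, are assumptions of this sketch rather than statements of the format.

```python
from typing import Dict, List, Optional, Tuple


def parse_feature_vector(line: str) -> Tuple[Optional[int], Optional[int], Dict[int, float]]:
    """Split one feature-vector line into (head, tail, sparse vector).

    A -1 node ID is returned as None, marking a single-node vector.
    """
    fields = line.split()  # tab-delimited per the format; also tolerates spaces
    if len(fields) < 2:
        raise ValueError(f"missing HeadID/TailID: {line!r}")
    head_id, tail_id = int(fields[0]), int(fields[1])
    vector: Dict[int, float] = {}
    for entry in fields[2:]:
        # Each entry has the form [FeatureID]:[FeatureValue].
        feature_id, value = entry.split(":", 1)
        vector[int(feature_id)] = float(value)
    head = None if head_id == -1 else head_id
    tail = None if tail_id == -1 else tail_id
    return head, tail, vector


def to_dense(vector: Dict[int, float], n: int) -> List[float]:
    """Expand a sparse vector to length n; absent features become 0.0 (assumed)."""
    dense = [0.0] * n
    for feature_id, value in vector.items():
        if not 0 <= feature_id < n:
            raise ValueError(f"FeatureID {feature_id} outside [0, {n})")
        dense[feature_id] = value
    return dense


# Usage on two lines from the sample above (n = 5 from its first line).
head, tail, vec = parse_feature_vector("52\t115\t0:47.0\t1:3.0\t3:2.0\t4:11.0")
print(head, tail, to_dense(vec, 5))  # 52 115 [47.0, 3.0, 0.0, 2.0, 11.0]
print(parse_feature_vector("10\t-1\t1:24.0\t2:1.0\t4:9.0"))  # (10, None, {1: 24.0, 2: 1.0, 4: 9.0})
```

Keeping the sparse dictionary as the primary representation mirrors the file itself; the dense expansion is only needed when a downstream model expects fixed-length input.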