Metagraph Database Format

A Metagraph Database file describes a list of processed metagraphs, as output by the SubMatch program. The definition and details of metagraphs can be found in our paper:

Y. Fang, W. Lin, V. W. Zheng, M. Wu, K. C.-C. Chang and X. Li. Semantic Proximity Search on Graphs with Metagraph-based Learning. In ICDE 2016, pp. 277--288.

It is a tab-delimited text file, as illustrated in the sample below (tab characters are shown as spaces here).

# 0 4 4
T 1 0 1 0
E 0 1 0 3 1 0 1 2 2 1 2 3 3 0 3 2
S 1
C 0 1 0 1
F 588395
# 1 5 6
T 1 0 1 0 0
E 0 1 0 3 0 4 1 0 1 2 2 1 2 3 2 4 3 0 3 2 4 0 4 2
S 1
C 0 1 0 1 1

Each metagraph consists of 6 lines.

Line 1: # <Metagraph-ID> <# Nodes> <# Edges>

Line 2: T <NodeType> <NodeType> ..., indicating the type of each node

Line 3: E <NodeIndex> <NodeIndex> ..., indicating the connectivity among nodes, where each pair of node indices (the 1st and 2nd indices form one pair, the 3rd and 4th form another pair, etc.) indicates an outgoing edge from the first index to the second. Indices correspond to the order of nodes listed in Line 2, starting from 0. Each undirected edge is represented by two pairs of indices, one pair in each direction.

Line 4: S <1 or 0>, indicating if the metagraph is symmetric (1) or not (0).

Line 5: C <NodeColor> <NodeColor> ..., indicating the "colors" of the nodes such that symmetric nodes have the same color and non-symmetric nodes have different colors. The order of nodes follows that in Line 2.

Line 6: F <Frequency>, indicating the number of instances matched. Note that a frequency of -1 indicates that the total number of instances on the entire input graph is unknown. For example, when a maxfreq parameter is specified in the SubMatch program, the program finishes after finding maxfreq instances, so the total number of instances remains unknown.
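
The six line types described above can be read with a small parser. The following Python code is only a minimal sketch and is not part of the SubMatch program: the names Metagraph, parse_metagraphs, undirected_edges, color_groups and total_known are hypothetical and chosen for illustration. It assumes that fields are separated by tabs or spaces, keeps node types and colors as strings, and leaves the frequency as None when a metagraph has no F line (as for the second metagraph in the sample above).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Metagraph:
    """One metagraph record (hypothetical container, for illustration only)."""
    graph_id: int
    num_nodes: int
    num_edges: int
    node_types: List[str] = field(default_factory=list)         # T line
    edges: List[Tuple[int, int]] = field(default_factory=list)  # E line, directed pairs
    symmetric: Optional[bool] = None                            # S line
    colors: List[str] = field(default_factory=list)             # C line
    frequency: Optional[int] = None                             # F line; None if absent

    def undirected_edges(self) -> List[Tuple[int, int]]:
        """Collapse the two directed pairs of each undirected edge into one."""
        return sorted({(min(u, v), max(u, v)) for u, v in self.edges})

    def color_groups(self) -> Dict[str, List[int]]:
        """Group node indices by color; nodes sharing a color are symmetric."""
        groups: Dict[str, List[int]] = {}
        for node, color in enumerate(self.colors):
            groups.setdefault(color, []).append(node)
        return groups

    def total_known(self) -> bool:
        """False when the F line is missing or holds -1 (total count unknown)."""
        return self.frequency is not None and self.frequency != -1


def parse_metagraphs(lines: Iterable[str]) -> List[Metagraph]:
    """Parse the lines of a metagraph database file into Metagraph records."""
    graphs: List[Metagraph] = []
    for raw in lines:
        fields = raw.split()  # splits on tabs as well as spaces
        if not fields:
            continue  # skip blank lines
        tag, values = fields[0], fields[1:]
        if tag == "#":
            graph_id, num_nodes, num_edges = (int(v) for v in values)
            graphs.append(Metagraph(graph_id, num_nodes, num_edges))
            continue
        if not graphs:
            raise ValueError(f"'{tag}' line found before any '#' line")
        current = graphs[-1]
        if tag == "T":
            current.node_types = values
        elif tag == "E":
            indices = [int(v) for v in values]
            if len(indices) % 2 != 0:
                raise ValueError("E line must hold an even number of indices")
            current.edges = list(zip(indices[0::2], indices[1::2]))
        elif tag == "S":
            current.symmetric = values[0] == "1"
        elif tag == "C":
            current.colors = values
        elif tag == "F":
            current.frequency = int(values[0])
        else:
            raise ValueError(f"unknown line tag: {tag}")
    return graphs


# Usage on the sample shown earlier (spaces stand in for tabs).
SAMPLE = """\
# 0 4 4
T 1 0 1 0
E 0 1 0 3 1 0 1 2 2 1 2 3 3 0 3 2
S 1
C 0 1 0 1
F 588395
# 1 5 6
T 1 0 1 0 0
E 0 1 0 3 0 4 1 0 1 2 2 1 2 3 2 4 3 0 3 2 4 0 4 2
S 1
C 0 1 0 1 1
"""

graphs = parse_metagraphs(SAMPLE.splitlines())
first = graphs[0]
print(first.undirected_edges())  # [(0, 1), (0, 3), (1, 2), (2, 3)]
print(first.color_groups())      # {'0': [0, 2], '1': [1, 3]}
```

A simple consistency check under these assumptions is that the number of undirected edges recovered from the E line equals the edge count on the # line, and that the T and C lines each hold as many entries as the node count.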