Overview

This tool aims to learn semantic proximity between nodes on the graph based on metagraphs.

Citation

Y. Fang, W. Lin, V. W. Zheng, M. Wu, K. C.-C. Chang and X. Li. Semantic Proximity Search on Graphs with Metagraph-based Learning. In ICDE 2016, pp. 277--288.

Code Download

Usage

Step 1: Match metagraph instances

    java -cp lib\* -DConfig=<String> prep.MineGraph

    -cp lib\*
        Java classpath, including the path to the main program and the Trove library JAR files.
    -DConfig=<String>
        The configuration file, a plain-text file that stores the configuration properties (see details below).

Step 2: Build feature index from matched metagraph instances

    java -cp lib\* -DConfig=<String> prep.BuildFeature

Step 3: Learn semantic proximity on the training set, and evaluate on the test set

    java -cp lib\* -DConfig=<String> -DClass=<String> -DSize=<Integer> exec.Learn

    -DClass=<String>
        One of the semantic classes to be learnt, as given in the groundtruth file (see details below).
    -DSize=<Integer>
        Number of training examples to be used. It must not exceed the total number of examples in the training splits (see details below).

Configuration properties

A sample configuration file, consisting of key-value pairs, is included in the download.

FILE_GRAPH=<String>
    Path to the labeled graph file.

FILE_GROUNDTRUTH_DB=<String>
    Path to the groundtruth file. Each line represents one query and its candidate nodes, in the following tab-delimited format:

        <Class> <Query> <Candidate>:<L> <Candidate>:<L> <Candidate>:<L> ...

    where <Class> is the desired semantic class, <Query> is the query node ID, <Candidate> is a candidate node ID, and <L> is either 1 or 0, indicating whether the preceding candidate node is a true answer for the query.

FILE_METAGRAPH_QUERY=<String>
    Path to the Metagraph Query file, which can be generated by the modified GRAMI. A pre-generated file is included for each dataset in the download.

FILE_METAGRAPH_DB=<String>
    Path to the Metagraph Database file.

FILE_FEATURE_DB=<String>
    Path to the feature index file. It is generated automatically by step 2 above.

LIB_SUBMATCH=<String>
    Path to the SubMatch program. Note that the main program uses an earlier version of SubMatch (included in the download), which differs from our separately released version; the latest released version will not work correctly with the main program.

DIR_OUT=<String>
    Directory for result output.

DIR_SPLITS=<String>
    Directory for the training and testing data splits. The splits are stored in files named train_<Class>_<n> and test_<Class>_<n> for training and testing data respectively, where <Class> is the semantic class and <n> is the split number. Training and testing queries in the same split do not overlap. In the training data, each line is a triplet q, x, y such that, for query q, node x should be ranked before node y. In the testing data, each line is a list of tab-delimited node IDs, where the first ID is the query and the subsequent IDs are candidate nodes.

DIR_METAGRAPH_INSTANCE=<String>
    Directory for the metagraph instances generated by step 1 above.

NUM_SPLITS=<Integer>
    Total number of splits to use.

MAX_THREADS=<Integer>
    Maximum number of threads available.

CORE_TYPE=<Integer>
    Type of the core nodes, i.e., the type of nodes used as queries. In our datasets, it is 0, which denotes the user nodes.

MAX_VERTICE=<Integer>
    Only metagraphs with up to this many vertices are considered.

MAX_FREQ=<Integer>
    Only metagraphs with up to this many instances are considered.

MU=<Integer>
    A scaling parameter controlling the shape of the sigmoid distribution in the likelihood function.

GD_STEP=10
    Learning rate for gradient descent.

GD_EPSILON=1E-5
    Maximum relative difference before gradient descent stops.

GD_TRY=5
    Number of gradient descent trials to perform using different seeds.

Sample Data

Sample data for two input graphs and their corresponding metagraph queries are included; the same datasets are used in the paper cited above. They are derived from SNAP's Facebook data and Forward's LinkedIn data.

Disclaimer

We provide any code and/or data on an as-is basis. Use at your own risk.
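Example: reading the groundtruth format

The groundtruth line format (FILE_GROUNDTRUTH_DB) and the training triplet format (DIR_SPLITS) described under Configuration properties can be illustrated with a short Python sketch. It is not part of the released tool. The function names parse_groundtruth_line and to_triplets are hypothetical, and the sample class name and node IDs are made up. Pairing every true answer with every non-answer is only one plausible way to obtain (q, x, y) triplets; this page does not say how the provided splits were generated.

```python
from typing import List, Tuple

Candidate = Tuple[str, int]  # (node ID, label), label is 1 or 0


def parse_groundtruth_line(line: str) -> Tuple[str, str, List[Candidate]]:
    """Parse one tab-delimited groundtruth line.

    Expected layout: <Class> <Query> <Candidate>:<L> <Candidate>:<L> ...
    where <L> is 1 (true answer) or 0 (not a true answer).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError("expected at least <Class> and <Query>")
    semantic_class, query = fields[0], fields[1]
    candidates: List[Candidate] = []
    for item in fields[2:]:
        # Split on the last colon so the label is always the final part.
        node_id, _, label = item.rpartition(":")
        if not node_id or label not in ("0", "1"):
            raise ValueError(f"malformed candidate entry: {item!r}")
        candidates.append((node_id, int(label)))
    return semantic_class, query, candidates


def to_triplets(query: str, candidates: List[Candidate]) -> List[Tuple[str, str, str]]:
    """Build (q, x, y) triplets in which x should be ranked before y.

    ASSUMPTION: every true answer x is paired with every non-answer y.
    The tool's own split files may have been produced differently.
    """
    positives = [node for node, label in candidates if label == 1]
    negatives = [node for node, label in candidates if label == 0]
    return [(query, x, y) for x in positives for y in negatives]


# Hypothetical sample line: class "family", query node 101, three candidates.
sample = "family\t101\t202:1\t303:0\t404:1"
cls, q, cands = parse_groundtruth_line(sample)
triplets = to_triplets(q, cands)
print(cls, q, cands)
print(triplets)
```

Splitting each candidate entry on its last colon keeps the parser correct even if a node ID itself contains a colon, and malformed entries raise an error rather than being skipped silently.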