Compilation: ----------- - A simple 'make' should do in Unix-like environments. It creates the executables 'mlrmcl' and 'ncut' in the top-level directory. - 'make realclean' removes all libraries, executables and object files. General Usage and Options: ------------------------- - Executing a program without arguments will print its usage. - The options may be specified in any order (similar to how it works for general unix utilities). Usage and Options for mlrmcl: ---------------------------- - The only required argument for mlrmcl is the graph file. The format for this file is described below in the Format section. - The output file can be specified using the -o option. This is optional. The file that the output is written to is printed in the stdout output of the program. - The granularity of the output clustering can be controlled using the '-c' option. This option specifies how small the graph can get before the coarsening in MLR-MCL stops. For example, if mlrmcl is run with '-c 1000', the graph is coarsened until it has no more than 1000 vertices. The default value for this option is 1000. If c is the same as the number of vertices in the graph, then no coarsening will take place at all and this is the same as R-MCL. The smaller the value of this option, the fewer clusters are output by the program. Note that if the cluster structure is especially clear (such as for synthetic graphs), the program will simply output the same clustering regardless of the parameter value. - The balance (i.e. the variance in output cluster sizes) can be controlled using the '-b' option. The default value of 0.5 should be good enough in most cases. If you find that the output clustering is too balanced, you can try lower values for b (until 0), or if it is too imbalanced, you can try higher values for b such as 0.75 or 1. - The inflation parameter is specified using the '-i option' and can also be used to control the granularity of the clustering. (In the case when no coarsening is performed, i.e. for R-MCL, it is the only way to control the number of clusters.) Higher values of 'i' lead to more clusters, and the clustering also converges faster. The default is 2.0. Usage and Options for mergeClusters: ----------------------------------- - The mergeClusters program performs hierarchical agglomerative clustering. It may be used in situations where the user needs to exactly control the number of output clusters. First, one can run MLR-MCL, setting the options such that more clusters than required are output by the program. Subsequently one may run the mergeClusters program, specifying the number of merges to be the same as the number of additional clusters that were output by MLR-MCL. - The exact usage for mergeClusters can be seen by executing the program without any arguments. Input format: ------------ - The input format is the same as that for Metis and Graclus. A pdf document explaining this format is available at http://glaros.dtc.umn.edu/gkhome/fetch/sw/metis/manual.pdf This pdf is also present inside the Metis distribution. For convenience, we have also included a copy of this manual under the name metis.4.0.manual.pdf. Output format: ------------- - The output format is also the same as that for Metis and Graclus. Each line contains the cluster index to which the node of the corresponding line number has been assigned. (For example, if line 20 is '4', that means that the node 20 has been assigned to the cluster 4.) Examples: --------- - The 'examples' folder has two graphs: synthetic.graph, which is a synthetic graph of 1000 nodes, generated with 25 clusters and astro.graph, which is a collaboration network of astro physics researchers (see http://snap.stanford.edu). The following are some example usages (assuming the software is compiled, and we are in the example directory): - ../mlrmcl -o synthetic.graph.out synthetic.graph - ../mlrmcl -o astro.graph.out astro.graph - ../mlrmcl -c 500 -o astro.graph.c500.out astro.graph - ../mlrmcl -c 2000 -o astro.graph.c2000.out astro.graph - ../mlrmcl -c 2000 -b 0.25 -o astro.graph.c2000.b0.25.out astro.graph - ../mlrmcl -c 20000 -i 1.8 -b 0.25 -o astro.graph.c20000.i1.8.b0.25.out astro.graph (The last example above does not perform any coarsening, since the c value 20000 is more than the number of vertices in the graph, which is 17903.) - ../mergeClusters -e astro.graph.c500.out -n 10 -o astro.graph.c500.10merges.out astro.graph (If astro.graph.c500.out represents the clustering of astro.graph into x clusters, then astro.graph.c500.10.merges.out is a clustering of x-10 clusters.) References: ---------- - Venu Satuluri and Srinivasan Parthasarathy. "Scalable Graph Clustering using Stochastic Flows: Applications to Community Discovery." Proceedings of ACM SIGKDD 2009, Paris. - Venu Satuluri, Srinivasan Parthasarathy and Dugyu Ucar. "Markov Clustering of Protein Interaction Networks with Improved Balance and Scalablity". Proceedings of ACM BCB 2010, Niagara Falls. Acknowledgments: --------------- I am very grateful to the authors of Metis and Graclus for releasing the source of their softwares, as this has enabled me to implement my own software much faster than would have been possible otherwise.