Frequent Subgraph Mining (FSM) is an important research task on graph data. About 90% of searches worldwide are frequent searches. Producing search results efficiently on big data is challenging, because the data does not fit into memory and is subject to I/O performance limitations. The problem addressed here is to find a complete and feasible solution to the FSM problem by building a low-cost distributed system on Hadoop that improves memory usage, I/O performance, execution time, and result quality. We avoid the memory and I/O performance limitations of two existing methods, the Frequent Subgraph Mining Algorithm in Hadoop (FSM-H) and a Sampling-based method for top-k Frequent Subgraph Mining (FS3), by combining them with newer technologies to implement FS3-H. We experiment on a graph by partitioning it into k subgraphs using a multilevel graph partitioning (MGP) approach and the k-means clustering algorithm. The experiments run on a decentralized cluster of low-cost systems connected through Java Remote Method Invocation (RMI), using the Apache Spark framework, which runs on Hadoop and serves as an alternative to the iterative MapReduce of FSM-H. We also present an experimental analysis of graph mining in which the top-k frequent subgraphs of FS3 are stored and fetched in parallel, with user-defined support, in Hadoop and MySQL through a decentralized, distributed k-sub-queue manager. The proposed method avoids the limitations of the existing methods. It has applications in various disciplines, including cheminformatics, bioinformatics, the social sciences, and various aspects of real life.
Keywords: Graph Storage, Graph Mining, FS3-H, Hadoop.