Profiting from several recommendation algorithms using a scalable approach

2:00 pm - 2:25 pm 24 Thursday

Track

Pattern recognition

1 Introduction

The Internet has become the world's biggest market: thousands of companies reach every connected home, offering millions of products to customers worldwide, and recommender systems (RSs) are required to present products that may be of interest to users. Although different approaches have been applied to develop RSs in the last decade [1][2], in this paper we consider using several algorithms instead of a single one in order to produce better recommendations. Running all of the algorithms together requires high-performance computing platforms, and the system response must then be selected from among the responses provided by the individual RSs. We apply a Fuzzy Rule-Based System (FRBS) to make this decision. The parameters used by the FRBS are calculated from historical data on users and items. The main problem is the time required to run every RS each time a new recommendation must be generated. We therefore adopt a Big Data approach relying on the well-known Hadoop framework [3], which allows the task to be parallelized and thus shortens execution time.

2 Methodology

Our methodology relies on two processes, one offline and one online. The offline process, which is split and run as a number of Hadoop jobs, generates user profiles that the online process uses to decide which recommendation algorithm should generate the final recommendation. This offline process calculates several parameters to obtain the necessary knowledge about users and items. The number of items that each user has rated is the basis for computing the user profiles and the required parameters, which are later fed to the FRBS in charge of selecting the final output of the system from among those provided by each of the RSs. As for the FRBS, it enables us to obtain the best recommendation depending on the user profile.
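The two processes can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the map and reduce functions only mimic, in memory, the shape of a Hadoop job that counts rated items per user, and the fuzzy sets, thresholds, rules, and algorithm names ("popularity", "collaborative") are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# A rating record: (user_id, item_id, rating). The layout is an assumption.
Rating = Tuple[str, str, float]


def map_ratings(ratings: Iterable[Rating]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (user_id, 1) for every rated item."""
    for user_id, _item_id, _rating in ratings:
        yield user_id, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the emitted values per user (offline user profile)."""
    counts: Dict[str, int] = defaultdict(int)
    for user_id, n in pairs:
        counts[user_id] += n
    return dict(counts)


def low_activity(n: float, full: float = 5.0, zero: float = 20.0) -> float:
    """Membership in the fuzzy set 'few ratings'.

    Equals 1 up to `full` ratings and decreases linearly to 0 at `zero`.
    Both thresholds are hypothetical.
    """
    if n <= full:
        return 1.0
    if n >= zero:
        return 0.0
    return (zero - n) / (zero - full)


def high_activity(n: float) -> float:
    """Membership in the fuzzy set 'many ratings' (complement of 'few')."""
    return 1.0 - low_activity(n)


def select_recommender(n_ratings: int) -> str:
    """Online step: pick the algorithm whose rule fires most strongly.

    Hypothetical rules:
      IF activity is LOW  THEN use the popularity-based recommender
      IF activity is HIGH THEN use the collaborative recommender
    """
    strengths = {
        "popularity": low_activity(n_ratings),
        "collaborative": high_activity(n_ratings),
    }
    return max(strengths, key=strengths.get)
```

In this sketch the offline part would be `profiles = reduce_counts(map_ratings(ratings))`, and the online part would call `select_recommender(profiles[user_id])` when a recommendation is requested. A real FRBS would typically use more input parameters and a larger rule base.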

3 Results

To test the scalability of the proposal, we ran the recommender system on different numbers of computing nodes in a Hadoop-based cluster. As Figure 1 shows, the system scales with the number of nodes. Machine resources were monitored throughout the tests. CPU usage was evenly distributed across the machines, and memory usage was not high. The reading and writing phases of the Hadoop jobs were performed on local disks rather than remote disks, which led to lower network traffic. In contrast, HBase [4] scalability was not as expected: when several clients write to HBase, adding new nodes does not notably improve performance. However, Hadoop jobs that read data from HBase perform well and scale very well.
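A common way to quantify this kind of scalability is to compute speedup and parallel efficiency from the wall-clock time of the same job on clusters of different sizes. The helper below is only a sketch of those standard definitions; the function and parameter names are hypothetical, and no measurements from this work are reproduced here.

```python
def speedup(t_base: float, t_n: float) -> float:
    """Speedup of a run taking t_n seconds relative to a baseline of t_base."""
    if t_base <= 0 or t_n <= 0:
        raise ValueError("execution times must be positive")
    return t_base / t_n


def efficiency(t_base: float, t_n: float, base_nodes: int, n_nodes: int) -> float:
    """Parallel efficiency: speedup divided by the growth in node count.

    A value close to 1.0 means near-linear scaling between the two
    cluster sizes; lower values mean that added nodes are less well used.
    """
    if base_nodes <= 0 or n_nodes <= 0:
        raise ValueError("node counts must be positive")
    return speedup(t_base, t_n) * base_nodes / n_nodes
```

Applied to the write-heavy HBase phase and to the read-heavy Hadoop jobs separately, these two figures would make the contrast described above directly comparable.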

4 Conclusions

The solution presented in this paper shows the benefits of using Hadoop to distribute and parallelize a recommender system in which different recommendation algorithms are combined by an FRBS.

5 Acknowledgments

Spanish Ministry of Economy, Project UEX:EPHEMEC (TIN2014-56494-C4-2-P) and CDTI project SmartCity Platform; Gobex, FEDER GRU10029.

References

[1] Aksel, F., Birtürk, A. Enhancing Accuracy of Hybrid Recommender Systems through Adapting the Domain Trends. In Workshop on the Practical Use of Recommender Systems, Algorithms and Technologies (PRSAT 2010) (p. 11).

[2] M. Garcia-Valdez and A. Alanis. Fuzzy inference for Learning Object Recommendation. Fuzzy Systems (FUZZ), 2010.

[3] Hadoop, distributed, scalable, fault-tolerant framework for data processing. http://hadoop.apache.org/

[4] HBase, distributed and scalable database. http://hbase.apache.org/

[5] Mahout, machine learning library for Big Data solutions. http://mahout.apache.org/