Analyzing the Query Performance of Hive-QL with ORCfile on Hadoop Cluster
DOI:
https://doi.org/10.14456/rmutlengj.2017.12
Keywords:
Big data, Hadoop, Hive, Performance Analysis
Abstract
Weather forecast data is one of the most important datasets in Big Data. Hive was the first relational database to run on a Hadoop cluster. This paper presents a performance analysis of HiveQL on a Hadoop cluster with a varying number of data nodes and data replications. The results show that the best-performing Map-Reduce configuration for distributed nodes in the Hadoop cluster is Map=5/Reduce=1. This ratio is consistent with the best query performance setup, which is 3 replications per 5 data nodes. Meanwhile, increasing the number of data nodes and replications did not affect the result in any way.