The Design of Triple Store and Query Processing on GPU for Large Scale Resource Description Framework Data

Authors

  • Pisit Makpaisit, Department of Computer Engineering, Faculty of Engineering, Kasetsart University
  • Chantana Chantrapornchai, Department of Computer Engineering, Faculty of Engineering, Kasetsart University

Keywords:

RDF, Query Processing, SPARQL, GPU

Abstract

The Resource Description Framework (RDF) is a widely used standard for data interchange on the web, and in the big data era RDF datasets are growing rapidly. To speed up queries over large RDF data, we propose a novel RDF data representation together with an RDF query algorithm that runs on the GPU. We present a representation suited to both RDF data and GPU processing, an indexing approach, the query process on the GPU, and additional techniques that improve efficiency: pre-upload filtering, ID assignment based on the similarity of term vectors, and dimensionality reduction to map term vectors back to term IDs. Experiments show that the developed framework uses only 1/6 of the original storage and reduces query time. The speedup reaches up to 29.57 compared with the RDF-3X system and up to 45.23 compared with gStore, a graph data store.
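
As a rough illustration of the general idea behind ID-based triple stores, the sketch below encodes RDF terms as integers and answers a single triple pattern by scanning the encoded triples. It is not the representation or algorithm proposed in the paper: the function names `encode` and `match`, the first-seen ID assignment (the paper instead assigns IDs using term-vector similarity), and the CPU list scan standing in for a GPU kernel are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]
Pattern = Tuple[Optional[int], Optional[int], Optional[int]]


def encode(triples: List[Triple]) -> Tuple[List[IdTriple], Dict[str, int]]:
    """Replace each term with an integer ID (hypothetical first-seen order, starting at 1)."""
    term_to_id: Dict[str, int] = {}
    encoded: List[IdTriple] = []
    for s, p, o in triples:
        ids = tuple(
            term_to_id.setdefault(term, len(term_to_id) + 1) for term in (s, p, o)
        )
        encoded.append(ids)
    return encoded, term_to_id


def match(encoded: List[IdTriple], pattern: Pattern) -> List[IdTriple]:
    """Return triples matching the pattern; None marks a variable position.

    Each triple is tested independently of the others, which is why this kind
    of scan maps naturally onto one GPU thread per triple. Here it runs on the CPU.
    """
    return [
        triple
        for triple in encoded
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


if __name__ == "__main__":
    data = [
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("alice", "age", "30"),
    ]
    encoded, ids = encode(data)
    # Pattern "?s knows ?o": only the predicate position is bound.
    print(match(encoded, (None, ids["knows"], None)))
```

Fixed-width integer triples are compact and can be compared without string handling, which is one common reason ID-based encodings reduce storage and suit data-parallel hardware.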

Published

2023-09-28

How to Cite

[1] P. Makpaisit and C. Chantrapornchai, “The Design of Triple Store and Query Processing on GPU for Large Scale Resource Description Framework Data,” Eng. & Technol. Horiz., vol. 40, no. 3, p. 400312, Sep. 2023.

Section

Research Articles