Enhancing Data Retrieval Efficiency in Geospatial Surveys Using Indexing Techniques for Large-Scale JavaScript Object Notation Datasets

Authors

  • Jirawat Duangkaew, Program in Computer Engineering, School of Information and Communication Technology, University of Phayao, Phayao, 56000, https://orcid.org/0009-0002-2132-4728
  • Bowonsak Srisungsittisunt, Program in Computer Engineering, School of Information and Communication Technology, University of Phayao, Phayao, 56000, https://orcid.org/0000-0001-5204-4070
  • Apiwat Witayarat, Program in Computer Engineering, School of Information and Communication Technology, University of Phayao, Phayao, 56000, https://orcid.org/0000-0003-1650-7503
  • Narasak Boonthep, Program in Computer Engineering, School of Information and Communication Technology, University of Phayao, Phayao, 56000
  • Phuwitsorn Phumsaranakhom, Program in Computer Engineering, School of Information and Communication Technology, University of Phayao, Phayao, 56000
  • Jirabhorn Chaiwongsai, Program in Computer Engineering, School of Information and Communication Technology, University of Phayao, Phayao, 56000

Keywords:

Dense Indexing, Sparse Indexing, Large Scale Dataset, Not only Structured Query Language, JavaScript Object Notation

Abstract

The use of the JavaScript Object Notation (JSON) format as a Not only Structured Query Language (NoSQL) storage solution has grown in popularity, but it presents technical challenges, particularly in indexing large-scale JSON files. In this study, we propose indexing JSON data sets for survey work in which large amounts of data must be accessed but the devices used for data collection have insufficient memory to process large files. We conducted experiments on 32-gigabyte data sets containing 1,000,000 transactions in JSON format and implemented two types of indexing, dense and sparse, to enhance data access efficiency. Additionally, we identified the suitable sample size for both indexing methods. The findings indicated that dense indexing decreased data retrieval time from 26,869.218 milliseconds (no index) to 382.196 milliseconds, a reduction of 98.58%, in one-to-one data retrieval, and from 38,300.848 milliseconds to 1.097 milliseconds when keywords were not found. Sparse indexing, in turn, reduced data retrieval time from 55,197.734 milliseconds (no index) to 854.661 milliseconds, a decrease of 98.45%, in one-to-many data retrieval, and from 47,203.253 milliseconds to 0.179 milliseconds when keywords were not found. Furthermore, we found that, for both dense and sparse indexing, every sample size range tested managed memory and accessed keywords rapidly.
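
To make the two index types named in the abstract concrete, the following is a minimal Python sketch of the textbook definitions, not the authors' implementation. It assumes a hypothetical layout that the abstract does not specify: one JSON object per line, a unique integer key field called `id`, and records stored in ascending key order (which the sparse index requires). The function names, the `stride` parameter, and the sample data are all illustrative. A dense index keeps one byte offset per record, while a sparse index keeps one offset per block of `stride` records and scans forward within the block.

```python
import bisect
import json
import os
import tempfile


def write_sample(path, ids):
    """Write one JSON object per line in ascending key order (assumed layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for i in ids:
            f.write(json.dumps({"id": i, "value": f"v{i}"}) + "\n")


def build_indexes(path, key, stride):
    """Scan the file once and build both a dense and a sparse index."""
    dense = {}           # dense: every key -> byte offset of its record
    sparse_keys = []     # sparse: key of every stride-th record
    sparse_offsets = []  # sparse: byte offset of that record
    with open(path, "rb") as f:
        n = 0
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            k = json.loads(line)[key]
            dense[k] = offset
            if n % stride == 0:
                sparse_keys.append(k)
                sparse_offsets.append(offset)
            n += 1
    return dense, sparse_keys, sparse_offsets


def dense_lookup(path, dense, k):
    """One dictionary probe, then one seek; a miss never touches the file."""
    offset = dense.get(k)
    if offset is None:
        return None
    with open(path, "rb") as f:
        f.seek(offset)
        return json.loads(f.readline())


def sparse_lookup(path, key, sparse_keys, sparse_offsets, stride, k):
    """Binary-search the block start, then scan at most `stride` records."""
    i = bisect.bisect_right(sparse_keys, k) - 1
    if i < 0:
        return None  # k is smaller than the first key in the file
    with open(path, "rb") as f:
        f.seek(sparse_offsets[i])
        for _ in range(stride):
            line = f.readline()
            if not line:
                break
            record = json.loads(line)
            if record[key] == k:
                return record
            if record[key] > k:
                break  # file is sorted, so k cannot appear later
    return None


def demo():
    """Build both indexes over 100 sample records and try a hit and a miss."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sample.jsonl")
        write_sample(path, range(0, 200, 2))  # even ids only
        dense, keys, offsets = build_indexes(path, "id", stride=10)
        return (
            len(dense),
            len(keys),
            dense_lookup(path, dense, 42),
            dense_lookup(path, dense, 43),
            sparse_lookup(path, "id", keys, offsets, 10, 42),
            sparse_lookup(path, "id", keys, offsets, 10, 43),
        )
```

In this sketch the dense index holds 100 entries and the sparse index 10, which illustrates the usual trade-off: the sparse index needs less memory but pays for a short scan on each lookup. Neither lookup loads the whole file, which is the property that matters on memory-constrained collection devices.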

Author Biographies

Jirawat Duangkaew, Program in Computer Engineering, School of Information and Communication Technology, University of Phayao, Phayao, 56000

Jirawat Duangkaew received a Bachelor of Science degree in Computer Science from Rambhai Barni Rajabhat University, Thailand, in 2020. He is currently pursuing a master’s degree in computer engineering at the University of Phayao, Thailand. His research interests include indexing techniques, non-relational databases, large databases, and incremental databases. 

Bowonsak Srisungsittisunt, Program in Computer Engineering, School of Information and Communication Technology, University of Phayao, Phayao, 56000

Dr. Bowonsak Srisungsittisunti is an Assistant Professor in Computer Engineering, School of Information and Communication Technology, University of Phayao, Thailand. He holds a PhD degree in Computer Engineering with a specialization in data processing. His research areas are data processing, data analytics, data mining, and database systems.

Apiwat Witayarat, Program in Computer Engineering, School of Information and Communication Technology, University of Phayao, Phayao, 56000

Asst. Prof. Dr. Nakarin Chaikaew (นครินทร์ ชัยแก้ว) received a PhD in Remote Sensing and Geographic Information Systems from the Asian Institute of Technology (AIT). He is an Assistant Professor in the Department of Geographic Information Science, University of Phayao, Thailand.

Published

2023-09-22

Section

Research Articles