Enhancing Data Retrieval Efficiency in Geospatial Surveys Using Indexing Techniques for Large-Scale JavaScript Object Notation Datasets
Keywords:
Dense Indexing, Sparse Indexing, Large Scale Dataset, Not only Structured Query Language, JavaScript Object Notation
Abstract
JavaScript Object Notation (JSON) has grown in popularity as a storage format for Not only Structured Query Language (NoSQL) solutions, but it presents technical challenges, particularly in indexing large-scale JSON files. This study addresses the use of JSON data sets in surveys where large amounts of data must be collected and accessed, but the devices used for data collection have insufficient memory to process large files. We conducted experiments on 32-gigabyte data sets containing 1,000,000 transactions in JSON format and implemented two types of indexing, dense and sparse, to enhance data access efficiency. Additionally, we identified the suitable sample size for both indexing methods. The findings indicated that dense indexing decreased data retrieval time from 26,869.218 milliseconds (no index) to 382.196 milliseconds, a reduction of 98.58%, in one-to-one data retrieval, and from 38,300.848 milliseconds to 1.097 milliseconds when keywords were not found. Sparse indexing, in turn, reduced data retrieval time from 55,197.734 milliseconds (no index) to 854.661 milliseconds, a decrease of 98.45%, in one-to-many data retrieval, and from 47,203.253 milliseconds to 0.179 milliseconds when keywords were not found. Furthermore, for both dense and sparse indexing, every sample size range allowed memory to be managed and keywords to be accessed rapidly.
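The abstract does not describe how the two indexes were implemented, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the data is stored as one JSON record per line (JSON Lines) and, for the sparse index, that the file is sorted by the indexed key. The field names `site` and `reading`, the `step` parameter, and all function names are hypothetical and chosen for illustration. Both indexes store byte offsets, so a lookup seeks into the file instead of loading it into memory.

```python
import bisect
import json
import os
import tempfile


def build_dense_index(path, key):
    """Dense index: one entry per record, mapping key value -> byte offset."""
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            index[json.loads(line)[key]] = offset
    return index


def dense_lookup(path, index, value):
    """One-to-one retrieval; a missing key is answered without reading the file."""
    offset = index.get(value)
    if offset is None:
        return None
    with open(path, "rb") as f:
        f.seek(offset)
        return json.loads(f.readline())


def build_sparse_index(path, key, step):
    """Sparse index: keep only every `step`-th record (assumes file sorted by key)."""
    keys, offsets = [], []
    with open(path, "rb") as f:
        n = 0
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            if n % step == 0:
                keys.append(json.loads(line)[key])
                offsets.append(offset)
            n += 1
    return keys, offsets


def sparse_lookup(path, sparse, key, value):
    """One-to-many retrieval: seek to the nearest earlier anchor, then scan forward."""
    keys, offsets = sparse
    # Last anchor whose key is strictly smaller than value, so that a run of
    # duplicates starting before the next anchor is not missed.
    i = bisect.bisect_left(keys, value) - 1
    start = offsets[i] if i >= 0 else 0
    matches = []
    with open(path, "rb") as f:
        f.seek(start)
        for line in f:
            record = json.loads(line)
            if record[key] > value:
                break  # sorted file: no later record can match
            if record[key] == value:
                matches.append(record)
    return matches


if __name__ == "__main__":
    # Hypothetical survey records, sorted by "site"; "reading" is unique.
    records = [
        {"site": "A01", "reading": 1}, {"site": "A01", "reading": 2},
        {"site": "B07", "reading": 3}, {"site": "C12", "reading": 4},
        {"site": "C12", "reading": 5}, {"site": "C12", "reading": 6},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "survey.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            for r in records:
                f.write(json.dumps(r) + "\n")
        dense = build_dense_index(path, "reading")
        sparse = build_sparse_index(path, "site", step=2)
        print(dense_lookup(path, dense, 4))
        print(sparse_lookup(path, sparse, "site", "C12"))
```

In this sketch the dense index costs one dictionary entry per record but resolves a unique key with a single seek, while the sparse index holds only one entry per `step` records and pays for that with a short forward scan. That trade-off is the usual reason to pair dense indexing with one-to-one lookups and sparse indexing with one-to-many lookups on memory-constrained devices.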