A Comparison of Real-Time Data Analytics Algorithms

Ekarat Rattagan

Abstract

Today's world is flooded with streams of data generated by IoT sensors, smartphone applications, e-commerce transactions, and other sources. Data streaming and real-time analytics tools are needed for applications such as financial fraud detection, product recommendation, and disaster warning systems. Existing real-time data analytics tools such as StreamDM, Scikit-multiflow, and Massive Online Analysis (MOA) play a significant role in this field. However, thorough comparisons of the streaming algorithms in these tools are still lacking. In this paper, we study and compare the performance of the streaming algorithms provided by Scikit-multiflow, one of the most popular of these tools. In the experiments, we compare various algorithms on classification and regression problems in terms of accuracy, model size, memory usage, and other measures. Both synthetic and real-world datasets are used. The experimental results show that the Hoeffding Tree algorithm performs best among the algorithms compared.
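To illustrate how a streaming classifier might be scored on accuracy, the following is a minimal sketch of test-then-train (prequential) evaluation in plain Python. The abstract does not state the evaluation protocol, so the use of prequential evaluation here is an assumption. The names `MajorityClassClassifier`, `prequential_accuracy`, and `synthetic_stream` are hypothetical and are not part of Scikit-multiflow. The baseline model stands in for a real streaming learner such as a Hoeffding Tree.

```python
from collections import Counter
import random


class MajorityClassClassifier:
    """Hypothetical baseline: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        if not self.counts:
            return 0  # arbitrary default before any sample has been learned
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(model, stream):
    """Test-then-train: predict each sample first, then learn from it."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        model.learn(x, y)
        total += 1
    return correct / total if total else 0.0


def synthetic_stream(n, seed=0):
    """Hypothetical synthetic stream: one feature, label 1 when it exceeds 0.3."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.random()
        yield (x,), (1 if x > 0.3 else 0)


accuracy = prequential_accuracy(MajorityClassClassifier(), synthetic_stream(1000))
print(f"prequential accuracy: {accuracy:.3f}")
```

Because every sample is used for testing before it is used for training, the score reflects performance on unseen data without a separate hold-out set. Any model exposing the same `predict` and `learn` methods could be passed to `prequential_accuracy` for comparison.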

Section
Research Article

References

A. Kejariwal, S. Kulkarni, and K. Ramasamy, “Real time analytics: Algorithms and systems,” Proceedings of the VLDB Endowment, vol. 8, no. 12, pp. 2040–2041, Aug. 2015.

A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, “MOA: Massive online analysis,” Journal of Machine Learning Research, vol. 11, pp. 1601–1604, 2010.

J. Montiel, J. Read, A. Bifet, and T. Abdessalem, “Scikit-multiflow: A multi-output streaming framework,” Journal of Machine Learning Research, vol. 19, no. 72, pp. 1–5, 2018.

A. Bifet, S. Maniu, J. Qian, G. Tian, C. He, and W. Fan, “StreamDM: Advanced data mining in Spark Streaming,” in Proc. IEEE Int. Conf. Data Mining Workshop (ICDMW), Atlantic City, NJ, USA, Nov. 14–17, 2015, pp. 1608–1611.

J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, and A. Bouchachia, “A survey on concept drift adaptation,” ACM Computing Surveys, vol. 46, no. 4, pp. 1–37, Apr. 2014.

A. P. Dawid, “Present position and potential developments: Some personal views: Statistical theory: The prequential approach,” Journal of the Royal Statistical Society, Series A, vol. 147, no. 2, pp. 278–290, 1984.

J. Cohen, “A coefficient of agreement for nominal scales,” Educational and Psychological Measurement, vol. 20, no. 1, pp. 37–46, Apr. 1960.

R. Agrawal, T. Imielinski, and A. Swami, “Database mining: A performance perspective,” IEEE Trans. Knowledge and Data Engineering, vol. 5, no. 6, pp. 914–925, Dec. 1993.