# Collaborative Learning of Estimation of Distribution Algorithm for RNA secondary structure prediction

## Main Article Content

## Abstract

Estimation of distribution algorithms (EDAs) are successfully applied in the fields of bioinformatics for tasks such as gene structure analysis, protein structure prediction, and RNA secondary structure prediction. This paper proposes a new method, namely collaborative learning of estimation of distribution algorithms, or Co-EDAs, based on an estimation of distribution algorithm for RNA secondary structure prediction using a single RNA sequence as input. The proposed method consists of two EDAs with minimum free energy objective. The Co-EDAs use both good and poor solutions to improve the algorithm’s to search throughout the search space. Using information from poor solutions can indicate which area is unappealing to explore when searching with high-dimensional data. The Co-EDAs method was tested with 750 known RNA structures from RNA STRAND v2.0. That database includes data with more than 14 RNA types. The proposed method was compared to three prediction programs that are based on dynamic programming algorithms called Mfold, RNAfold, and RNAstructure. These programs are available as services on web servers. The results on average show that the Co-EDAs yields approximately 6% better accuracy than those competitors in all metrics.

## Article Details

*ECTI-CIT*, vol. 14, no. 1, pp. 92-102, Apr. 2020.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

## References

[2] E.P. Nawrocki, and S.R. Eddy, “Infernal 1.1: 100-fold faster RNA homology searches,” Bioinformatics,

29(22),pp.2933-2935,2013.

[3] J. Zuber, B.J. Cabral, I. McFadyen, D.M. Mauger, and D.H. Mathews, “Analysis of RNA nearest neighbor parameters reveals interdependencies and quantifies the uncertainty in RNA secondary structure prediction.” Rna, 24(11), pp.1568-1582, 2018.

[4] I.K. Oluoch, A. Akalin, Y. Vural, and Y. Canbay, “A review on RNA secondary structure prediction algorithms,” In 2018 International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT), pp. 18-23,2018.

[5] L. Wang, Y. Liu, X. Zhong, H. Liu, C. Lu, C. Li, and H. Zhang, “DMFold: A novel method to predict RNA secondary structure with pseudoknots based on deep learning and improved base pair maximization principle,” Frontiers in genetics, 10, p.143, 2019.

[6] H. Zhang, C. Zhang, Z. Li, C. Li, X. Wei, B. Zhang, and Y. Liu, “A new method of RNA secondary structure prediction based on convolutional neural network and dynamic programming,” Frontiers in genetics, 10, 2019.

[7] D.H. Mathews, and D.H. Turner, “Prediction of RNA secondary structure by free energy mini-mization,” Current opinion in structural biology, 16(3), pp.270-278, 2006.

[8] Z.J. Lu, J.W. Gloor, and D.H. Mathews, “Improved RNA secondary structure prediction by maximizing expected pair accuracy,” Rna, 15(10), pp.1805-1813, 2009.

[9] L. Quan, L. Cai, Y. Chen, J. Mei, X. Sun, and Q. Lyu, “Developing parallel ant colonies filtered by deep learned constrains for predicting RNA secondary structure with pseudo-knots,” Neurocomputing, 2019.

[10] S. Poznanovic, F. Barrera-Cruz, A. Kirkpatrick, M. Ielusic, and C. Heitsch, “The challenge of RNA branch-

ing prediction: a parametric analysis of multiloop initiation under thermodynamic optimization,”

bioRxiv, 2020.

[11] R. Nussinov, G. Pieczenik, J.R. Griggs, and D.J. Kleitman, “Algorithms for loop matchings,” SIAM Journal on Applied mathematics, 35(1), pp.68-82, 1978.

[12] M. Zuker, and P. Stiegler, “Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information,” Nucleic acids research, 9(1), pp.133-148, 1981.

[13] A. El Fatmi, A. Chentoufi, M.A. Bekri, S. Benhlima, and M. Sabbane, “A heuristic algorithm for RNA secondary structure based on genetic algorithm,” In 2017 Intelligent Systems and Computer Vision (ISCV), IEEE, pp. 1-7, 2017.

[14] K. Zhang, and Y. Lv, “A Multiobjective RNA Secondary Structure Prediction Algorithm Based on NSGAII,” In 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation, IEEE, pp. 1450-1454, 2018.

[15] R. Backofen, D. Tsur, S. Zakov, and M. Ziv-Ukelson, “Sparse RNA folding: Time and space efficient algorithms.” Journal of Discrete Algorithms, 9(1), pp.12-31, 2011.

[16] M. Zuker, “Mfold web server for nucleic acid folding and hybridization prediction,” Nucleic acids research,

31(13), pp.3406-3415, 2003.

[17] R. Lorenz, S.H. Bernhart, C.H. Zu Siederdissen, H. Tafer, C. Flamm, P.F. Stadler, and I.L. Hofacker, “ViennaRNA Package 2.0,” Algorithms for molecular biology, 6(1), p.26, 2011.

[18] S. Bellaousov, J.S. Reuter, M.G. Seetin, and D.H. Mathews, “RNAstructure: web servers for RNA secondary structure prediction and analysis,” Nucleic acids research, 41(W1), pp. W471-W474, 2013.

[19] T. Akutsu, “Dynamic programming algorithms for RNA secondary structure prediction with pseudo-

knots,” Discrete Applied Mathematics, 104(1-3), pp.45-62,2000.

[20] S. Takitou, and A. Taneda, “Ant colony optimization for predicting RNA folding pathways,” Computational biology and chemistry, 83, p.107118, 2019.

[21] K. Wiese, A. Deschenes, and A. Hendriks, “RnaPredict—an evolutionary algorithm for RNA secondary structure prediction,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 5(1), pp.25-41, 2008.

[22] P. Grypma, and H.H. Tsang, “SARNA-Predict: Using adaptive annealing schedule and inversion mutation operator for RNA secondary structure prediction,” In 2014 IEEE Symposium on Computational Intelligence in Multi-Criteria Decision-Making (MCDM), pp. 150-156, 2014.

[23] Z. Kai, W. Yuting, L. Yulin, L. Jun, and H. Juanjuan, “An efficient simulated annealing algorithm for the RNA secondary structure prediction with Pseudoknots,” BMC genomics, 20(13), pp.1-13, 2019.

[24] D.H. Mathews, “How to benchmark RNA secondary structure prediction accuracy,” Methods, 2019.

[25] R. Santana, P. Larranaga, and J.A. Lozano, “Protein folding in 2-dimensional lattices with estimation of distribution algorithms,” In International Symposium on Biological and Medical Data Analysis, Springer, Berlin, Heidelberg, pp. 388-398, 2004.

[26] R. Santana, P. Larrañaga, and J.A. Lozano, “Protein folding in simplified models with estimation of distribution algorithms,” IEEE transactions on Evolutionary Computation, 12(4), pp.418-438, 2008.

[27] R. Santana, P. Larrañaga, and J.A. Lozano, “Combining variable neighborhood search and estimation of distribution algorithms in the protein side chain placement problem,” Journal of Heuristics,

14(5), pp.519-547, 2008.

[28] M.E. Alden, “MARLEDA: effective distribution estimation through Markov random fields,” (Doctoral dissertation), 2007.

[29] S. Srikamdee, W. Wattanapornprom, and P. Chongstitvatana, “RNA secondary structure prediction with coincidence algorithm,” In 2016 16th International Symposium on Communications and Information Technologies (ISCIT), IEEE, pp. 686-690, 2016.

[30] S.H. Chen, M.C. Chen, P.C. Chang, Q. Zhang, and Y.M. Chen, “Guidelines for developing effective estimation of distribution algorithms in solving single machine scheduling problems,” Expert Systems with Applications, 37(9), pp.6441-6451, 2010.

[31] K. Wang, S.H. Choi, and H. Lu, “A hybrid estimation of distribution algorithm for simulation-based scheduling in a stochastic permutation flowshop,” Computers & Industrial Engineering, 90, pp.186-196, 2015.

[32] H. Liu, L. Gao, and Q. Pan, “A hybrid particle swarm optimization with estimation of distribution algorithm for solving permutation flowshop scheduling problem,” Expert Systems with Applications, 38(4), pp.4348-4360, 2011.

[33] Y.R.Tzeng, C.L. Chen, and C.L. Chen, “A hybrid EDA with ACS for solving permutation flow shop scheduling,” The international journal of advanced manufacturing technology, 60(9-12), pp.1139-1147, 2012.

[34] M. Andronescu, V. Bereg, H.H. Hoos, and A. Condon, “RNA STRAND: the RNA secondary structure and statistical analysis database,” BMC bioinformatics, 9(1), p.340, 2008.

[35] D.H. Turner, and D.H. Mathews, “NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure,” Nucleic acids research, 38(suppl_1), pp. D280-D282, 2010.

[36] S. Montaseri, M. Ganjtabesh, and F. Zare-Mirakabad, “Evolutionary algorithm for RNA secondary structure prediction based on simulated SHAPE data,” PloS one, 11(11), 2016.