Classification Models Based on Incremental Learning Algorithm and Feature Selection on Gene Expression Data


Phayung Meesad
Sageemas Na Wichian
Unger Herwig
Patharawut Saengsiri

Abstract

Gene expression data measure the levels at which genes encoded in DNA are expressed as proteins in cells such as muscle or brain cells. However, some abnormal cells may arise from abnormal expression levels. Finding a subset of informative genes would therefore benefit biologists, because such a subset identifies discriminative genes. Unfortunately, the number of genes quickly grows into the tens of thousands, which makes classification difficult because of problems such as the curse of dimensionality and misclassification. This paper proposes classification models based on an incremental learning algorithm and feature selection on gene expression data. Three feature selection methods, Correlation-based Feature Selection (Cfs), Gain Ratio (GR), and Information Gain (Info), are combined with an Incremental Learning Algorithm based on Mahalanobis Distance (ILM). The experimental results show that the proposed models CfsILM, GRILM, and InfoILM not only reduce the number of dimensions from 2001, 7130, and 4026 to 26, 135, and 135, saving time and computational resources, but also improve the accuracy rate from 64.52%, 34.29%, and 8.33% to 90%, 97.14%, and 83.33%, respectively. In particular, CfsILM stands out among the models on the three public gene expression datasets.
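The feature-selection criteria and the distance-based classifier named in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. Information gain and gain ratio are computed for features assumed to be already discretized (for example, binned expression levels). `cfs_merit` is Hall's standard merit score for a subset of k features, given its average feature-class and feature-feature correlations. `IncrementalMahalanobisClassifier` is a hypothetical nearest-class-mean classifier: it updates each class's mean and covariance one sample at a time and assigns a sample to the class with the smallest Mahalanobis distance. The paper's ILM may work differently. The `ridge` term added to the covariance diagonal is also an assumption; it keeps the matrix invertible when features outnumber samples, as they do in gene expression data.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain_and_ratio(feature, labels):
    """Information gain and gain ratio of one discretized feature w.r.t. labels."""
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    # Conditional entropy H(class | feature): weighted entropy of each subset.
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - cond
    split_info = entropy(feature)  # intrinsic information of the split itself
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def cfs_merit(k, avg_feature_class_corr, avg_feature_feature_corr):
    """Hall's CFS merit of a k-feature subset from its average correlations."""
    return (k * avg_feature_class_corr
            / math.sqrt(k + k * (k - 1) * avg_feature_feature_corr))


def _invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


class IncrementalMahalanobisClassifier:
    """Hypothetical nearest-class-mean classifier under Mahalanobis distance.

    Per-class count, mean, and scatter matrix are updated one sample at a
    time (Welford's algorithm), so training is incremental.
    """

    def __init__(self, dim, ridge=1e-3):
        self.dim = dim
        self.ridge = ridge  # assumed regularizer added to the covariance diagonal
        self.stats = {}     # label -> [count, mean, scatter]

    def partial_fit(self, x, label):
        if label not in self.stats:
            self.stats[label] = [0, [0.0] * self.dim,
                                 [[0.0] * self.dim for _ in range(self.dim)]]
        s = self.stats[label]
        s[0] += 1
        n, mean, scatter = s  # mean and scatter are updated in place
        delta = [xi - mi for xi, mi in zip(x, mean)]
        for i in range(self.dim):
            mean[i] += delta[i] / n
        delta2 = [xi - mi for xi, mi in zip(x, mean)]
        for i in range(self.dim):
            for j in range(self.dim):
                scatter[i][j] += delta[i] * delta2[j]

    def squared_distance(self, x, label):
        n, mean, scatter = self.stats[label]
        denom = max(n - 1, 1)  # sample covariance; falls back to n for one sample
        cov = [[scatter[i][j] / denom + (self.ridge if i == j else 0.0)
                for j in range(self.dim)] for i in range(self.dim)]
        inv = _invert(cov)
        d = [xi - mi for xi, mi in zip(x, mean)]
        return sum(d[i] * inv[i][j] * d[j]
                   for i in range(self.dim) for j in range(self.dim))

    def predict(self, x):
        # Assign x to the class whose mean is nearest in Mahalanobis distance.
        return min(self.stats, key=lambda label: self.squared_distance(x, label))
```

In the spirit of the models described above, one would rank genes by gain, gain ratio, or CFS merit and keep the top-scoring subset. The selected features would then be passed to `partial_fit` one sample at a time. With thousands of genes, a practical version would use a numerical library rather than this pure-Python matrix inverse.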


How to Cite
P. Meesad, S. N. Wichian, U. Herwig, and P. Saengsiri, “Classification Models Based on Incremental Learning Algorithm and Feature Selection on Gene Expression Data”, ECTI-CIT Transactions, vol. 6, no. 1, pp. 40–47, Apr. 2016.
Section
Artificial Intelligence and Machine Learning (AI)