Feature Selection Method for DNA-Binding Protein Identification
Abstract
DNA-binding proteins play an important role in the structural composition of DNA. They also regulate and affect various cellular processes such as transcription, DNA replication, DNA recombination, repair, and modification. The experimental methods used to identify DNA-binding proteins are expensive and time-consuming, which has drawn computational researchers to the problem. In this paper, we present iDNAProt-ES, a DNA-binding protein prediction method that uses both sequence-based evolutionary features and structure-based features of proteins to identify their DNA-binding functionality. We used recursive feature elimination to extract an optimal set of features and trained a support vector machine with a linear kernel on them to obtain the final model. Our proposed method significantly outperforms the existing state-of-the-art predictors on a standard benchmark dataset. The accuracy of the predictor on the benchmark dataset is 90.18% using the jackknife test and 88.87% using 10-fold cross-validation. Its accuracy on the independent dataset is 80.64%, which is also a significant improvement over the state-of-the-art methods. iDNAProt-ES is a novel prediction method that uses evolutionary and structure-based features. We believe the superior performance of iDNAProt-ES will motivate researchers to use this set of features and these methods, and will help those interested in the identification of DNA-binding proteins. iDNAProt-ES is readily available as a web server at: http://brl.uiu.ac.bd/iDNAProt-ES/.
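The feature selection and training steps described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, uses synthetic data in place of the evolutionary and structure-based features, and picks hypothetical values for the number of retained features, the elimination step, and the SVM parameter C. It is not the iDNAProt-ES implementation, only one plausible way to combine recursive feature elimination with a linear-kernel SVM and 10-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_rfe_svm(n_features_to_select=20, step=5):
    """Pipeline: scale features, prune them with RFE, classify with a linear SVM.

    n_features_to_select and step are illustrative assumptions,
    not values taken from the abstract.
    """
    # RFE repeatedly fits the linear SVM and drops the `step` features
    # with the smallest weight magnitudes until the target count remains.
    selector = RFE(
        estimator=SVC(kernel="linear", C=1.0),
        n_features_to_select=n_features_to_select,
        step=step,
    )
    # The final classifier is trained only on the features RFE kept.
    return make_pipeline(StandardScaler(), selector, SVC(kernel="linear", C=1.0))


# Synthetic stand-in for a protein feature matrix (assumption: 60 features,
# binary labels for DNA-binding vs. non-binding).
X, y = make_classification(
    n_samples=200, n_features=60, n_informative=10, random_state=0
)

model = build_rfe_svm()

# Feature selection happens inside each fold, so held-out samples
# never influence which features are kept.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"10-fold CV accuracy: {scores.mean():.4f}")
```

Placing the selector inside the pipeline is a deliberate choice in this sketch: running feature elimination once on the full dataset before cross-validation would leak information from the test folds and inflate the reported accuracy.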