Do we still really need to remove sequence (protein or genome) redundancy when using deep learning approaches to construct prediction models?
5.9 years ago by kurdt325

Removing sequence redundancy is a crucial preprocessing step in protein and genome sequence analysis in bioinformatics, especially with traditional machine learning methods such as SVM, random forest, and decision trees. Removing redundant sequences keeps the dataset clean and reliable so the model can capture the primary classification boundaries, shrinks the dataset to reduce training time, and, just as importantly, helps to avoid overfitting. For these reasons, redundancy removal has long been treated as essential in computational sequence analysis.
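
To make the step concrete, here is a minimal sketch of greedy identity-based filtering in the spirit of tools like CD-HIT; the crude similarity measure and the 0.9 threshold are illustrative assumptions, not any tool's actual algorithm:

```python
# Minimal sketch of greedy identity-based redundancy filtering.
# The similarity measure and the 0.9 cutoff are illustrative assumptions;
# real tools (e.g. CD-HIT) use proper alignment-based identity.
from difflib import SequenceMatcher

def identity(a: str, b: str) -> float:
    # Crude character-level similarity ratio, standing in for alignment identity.
    return SequenceMatcher(None, a, b).ratio()

def remove_redundancy(seqs, threshold=0.9):
    # Greedily keep a sequence only if it is < threshold similar to all kept ones.
    representatives = []
    for s in sorted(seqs, key=len, reverse=True):  # process longest first
        if all(identity(s, r) < threshold for r in representatives):
            representatives.append(s)
    return representatives

seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "GAVLIPFMW"]
print(remove_redundancy(seqs))  # the two near-identical sequences collapse to one
```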

But when this meets deep learning, should we rethink the problem from scratch?

Firstly, deep learning models are more complex than traditional machine learning methods, so large-scale datasets are required for training. For this reason, image-based deep learning typically relies on data augmentation (rotation, shifting, adding noise) to generate more training images. In sequence analysis, then, do we still need to remove naturally occurring redundant sequences?
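
For comparison, this is roughly what image augmentation looks like in practice (sketched here with torchvision; the specific transforms and parameters are illustrative assumptions):

```python
# Minimal sketch of image data augmentation with torchvision.
# Transform choices and parameters are illustrative assumptions.
import torch
from torchvision import transforms

# Applied on the fly to PIL images, so each epoch sees slightly different data.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                       # rotation
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),    # shift
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x + 0.05 * torch.randn_like(x)), # additive noise
])
```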

Secondly, deep learning offers many techniques to mitigate overfitting, such as dropout, batch normalisation, and pooling layers.
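
For instance, a small 1D CNN over one-hot encoded protein sequences might place these layers as follows (the architecture, the sequence length of 500, and the 20-letter alphabet are illustrative assumptions):

```python
# Minimal sketch of a 1D CNN for one-hot encoded protein sequences,
# showing dropout, batch normalisation, and pooling as regularisers.
# Input shape (500, 20) and all hyperparameters are illustrative assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(500, 20)),          # sequence length x amino-acid alphabet
    layers.Conv1D(64, 9, activation="relu"),
    layers.BatchNormalization(),            # stabilises training, mild regulariser
    layers.MaxPooling1D(2),                 # pooling reduces downstream parameters
    layers.Conv1D(128, 5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dropout(0.5),                    # dropout against co-adapted features
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```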

So, is it really necessary to remove sequence (protein or genome) redundancy when using deep learning approaches to construct prediction models?

sequence-redundancy