Question: Blast unigenes with set of protein sequences
2.8 years ago by Kurban (170), China/Urumqi/Xinjiang Academy of Animal Sciences
Kurban wrote:

Hello guys, I have more than 10,000 de novo assembled unigenes from RNA-seq. I blasted them against 95 protein sequences from another insect species and got 454 BLAST results. Their similarity ranges from about 20% to 100%, and their E-values range from 8.00E-06 down to 0. What cut-offs for similarity and E-value should I use to select the probable homologs from these results?
Thanks in advance.

This question comes up often, and there is no defined cutoff that designates a homolog. Homology is all-or-nothing: a gene is a homolog or it is not. Similarity, on the other hand, is expressed as a percentage. A sequence can still be homologous at a low % similarity if the two species are evolutionarily far apart. If your insect species are closely related, then 20% similarity may be low, but if they are not, 20% could still be an important data point.

As you are well aware, BLAST E-values depend on the size of the database, which in this case is very small. Was there a reason to select only those 95 proteins? The 454 sequences for which you have a BLAST result are similar to some extent to your target protein set, and you would need to examine the entire lot to see if you can remove some redundancy.
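
To make the two points above concrete, here is a minimal Python sketch, not a definitive pipeline. It assumes the hits were saved in BLAST's 12-column tabular format (`-outfmt 6` with default columns). The function names, the sequence IDs, and the example numbers are all hypothetical. The first function keeps one best hit per unigene as a first pass at removing redundancy. The second uses the fact that an E-value grows roughly in proportion to the database length to estimate what the same alignment would score against a larger database. It ignores BLAST's effective-length corrections, so treat the result as a rough guide only.

```python
import io

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit_per_query(handle):
    """Return {query id: row dict}, keeping the highest bit score per query."""
    best = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            row[key] = float(row[key])
        current = best.get(row["qseqid"])
        # On a tie the first hit seen is kept.
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


def rescale_evalue(evalue, old_db_length, new_db_length):
    """Approximate E-value of the same alignment in a database of another size.

    Assumption: E scales linearly with total database length in residues;
    effective-length adjustments are ignored.
    """
    return evalue * new_db_length / old_db_length


# Hypothetical example rows, not real data.
EXAMPLE = (
    "unigene_1\tprotA\t35.0\t200\t130\t2\t1\t600\t5\t204\t1e-20\t95.0\n"
    "unigene_1\tprotB\t80.0\t210\t42\t0\t1\t630\t1\t210\t2e-90\t310.0\n"
    "unigene_2\tprotA\t22.5\t120\t93\t3\t10\t369\t40\t159\t8e-06\t48.0\n"
)

if __name__ == "__main__":
    for query, hit in best_hit_per_query(io.StringIO(EXAMPLE)).items():
        # Illustrative lengths: 40,000 residues (small set) vs 40,000,000.
        scaled = rescale_evalue(hit["evalue"], 40_000, 40_000_000)
        print(query, hit["sseqid"], hit["pident"], hit["evalue"], scaled)
```

A hit that looks marginal against 95 proteins can look much weaker after this kind of rescaling, which is one reason to inspect the low-identity hits individually rather than rely on a single cutoff.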

written 2.8 years ago by genomax (65k)