Question: Gene Annotation Using Known Proteins Or EST Sequences
8.0 years ago
Plantae wrote:

Hi, we have sequenced a new genome, and I now want to annotate it using known proteins and ESTs from other species. The proteins and ESTs were BLASTed against our genome, but some of them produced too many hits. My question: should I filter these BLAST hits by e-value? If so, how do I choose a reasonable threshold?

gene annotation genome
8.0 years ago
Barcelona, Spain
Darked89 wrote:

You could start by filtering out plant protein entries that contain repetitive sequences. Search the Pfam database for "plant transposon", i.e.:

GMAP may be better than BLAST for mapping ESTs (it is splice-site aware).

For ESTs from the same species, try PASA for gene mapping.

8.0 years ago
Travis wrote:

As mentioned above, you are probably better off using a program that specifically accounts for the presence of introns. Exonerate is a good one and is freely available.

When using ESTs I would recommend getting rid of anything with less than 90% identity.

Also, divide the length of your alignment by the length of the original sequence to get a % aligned value. Then remove anything with less than e.g. 90% of its length aligned.

Adjust the 90% if you seem to get too few hits.
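
The two filters described above can be sketched in a few lines of Python. This is only an illustration, not part of any aligner: the `Alignment` record, its field names, and the `filter_alignments` function are hypothetical, and the sketch assumes each record summarises one whole alignment of an EST (for example, one spliced alignment reported by your aligner) rather than a single BLAST HSP. How you fill in the records depends on the output format of the tool you use.

```python
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical summary of one EST-to-genome alignment."""
    query_id: str
    percent_identity: float  # on a 0-100 scale
    aligned_length: int      # number of query bases covered by the alignment
    query_length: int        # full length of the original EST


def fraction_aligned(aln: Alignment) -> float:
    """Alignment length divided by the length of the original sequence."""
    if aln.query_length <= 0:
        raise ValueError(f"query_length must be positive for {aln.query_id}")
    return aln.aligned_length / aln.query_length


def filter_alignments(
    alignments: Iterable[Alignment],
    min_identity: float = 90.0,  # assumed default, taken from the 90% suggested above
    min_aligned: float = 0.90,   # assumed default, taken from the 90% suggested above
) -> List[Alignment]:
    """Keep alignments that pass both the identity and the % aligned cutoffs."""
    return [
        aln
        for aln in alignments
        if aln.percent_identity >= min_identity
        and fraction_aligned(aln) >= min_aligned
    ]


# Made-up example records, for illustration only.
example = [
    Alignment("est1", 98.5, 480, 500),  # passes both cutoffs
    Alignment("est2", 85.0, 495, 500),  # identity below 90%
    Alignment("est3", 99.0, 300, 500),  # only 60% of its length aligned
]

kept = filter_alignments(example)
print([aln.query_id for aln in kept])
```

If too few ESTs survive, calling `filter_alignments(example, min_aligned=0.5)` (or lowering `min_identity`) relaxes the cutoffs without changing anything else.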

