Gene Annotation Using Known Proteins or EST Sequences
13.0 years ago
Plantae ▴ 390

Hi, we have sequenced a new genome, and now I want to annotate it using known proteins and ESTs from other species. We ran BLAST with the proteins and ESTs against our genome, but some of them got too many hits. My question: should I filter these BLAST hits by e-value? If so, how do I set reasonable thresholds?

gene annotation genome • 3.3k views
13.0 years ago
Darked89 4.6k

You could start by filtering out plant protein entries that contain repetitive sequences. Search the Pfam database for "plant transposon", e.g.: http://pfam.janelia.org/search/keyword?query=plant+transposon

GMAP may be better than BLAST for mapping ESTs (it is splice-site aware).

If you have ESTs from the same species, try PASA for gene mapping.

13.0 years ago
Travis ★ 2.8k

As mentioned above, you are probably better off using a program that specifically accounts for the presence of introns. Exonerate is a good one and is freely available.

When using ESTs I would recommend getting rid of anything with less than 90% identity.

Also, divide the length of your alignment by the length of the original sequence to get a % aligned value. Then remove anything with less than e.g. 90% of its length aligned.

Lower the 90% cutoffs if you seem to get too few hits.
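The two filters above can be sketched in a few lines of Python. This is only a rough illustration, not a definitive implementation: it assumes the alignments have been exported as tab-separated lines with five columns (query ID, target ID, percent identity, alignment length, query length). That column layout, and the names `Hit`, `parse_hits` and `filter_hits`, are hypothetical choices for this example, so adapt them to whatever your aligner actually reports.

```python
import csv
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    """One alignment record (assumed five-column layout, see parse_hits)."""
    query: str
    subject: str
    pct_identity: float   # percent identity, 0-100
    aln_length: int       # length of the alignment
    query_length: int     # full length of the original EST/protein

    @property
    def coverage(self) -> float:
        # Fraction of the original sequence that is aligned. Gaps in the
        # alignment can inflate this value slightly.
        return self.aln_length / self.query_length


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated lines: query, subject, %identity, aln length, query length."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        yield Hit(row[0], row[1], float(row[2]), int(row[3]), int(row[4]))


def filter_hits(lines: Iterable[str],
                min_identity: float = 90.0,
                min_coverage: float = 0.90) -> List[Hit]:
    """Keep hits that pass both the identity and the coverage cutoff."""
    return [
        hit for hit in parse_hits(lines)
        if hit.pct_identity >= min_identity and hit.coverage >= min_coverage
    ]


# Made-up records purely for illustration.
example = [
    "est1\tscaffold_1\t98.5\t480\t500",  # passes both cutoffs
    "est2\tscaffold_1\t85.0\t490\t500",  # identity too low
    "est3\tscaffold_7\t99.0\t200\t500",  # too little of the EST aligned
]

for hit in filter_hits(example):
    print(hit.query, hit.subject, hit.pct_identity, round(hit.coverage, 2))
```

Both cutoffs are plain function arguments, so relaxing them when too few hits survive is a one-line change, e.g. `filter_hits(lines, min_identity=85.0, min_coverage=0.8)`.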
