Question: tblastn extend beyond homology region
8 months ago
The University of Manchester, UK
I am performing a very simple task. I have a protein sequence with conserved domains, and I am mapping this protein and its domains to a genome with tblastn. I found the analogous domain. However, I would like to get 100 amino acids upstream and downstream of the domain. Because it's a tblastn hit, I am a bit stuck on how to do that properly so that everything stays in the same reading frame.

Any suggestions?

Thanks!
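A minimal sketch of the in-frame extension the question asks about. It assumes you have the hit's subject coordinates from tabular tblastn output (`-outfmt 6`, where `sstart > send` indicates a minus-strand hit) and the subject contig as a plain string; the function name `extend_hit` and its signature are hypothetical. Assuming the hit has no frameshifts, its span is a whole number of codons, so extending it by whole codons (300 nt for 100 aa) on each side, clipped to the contig ends, keeps everything in the hit's reading frame. Minus-strand regions are reverse-complemented before translation. Note that the flanks are translated blindly: an intron or frameshift inside a flank will show up as stop codons or unrelated sequence.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; unknown codons (e.g. with N) become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def extend_hit(genome: str, sstart: int, send: int, flank_aa: int = 100):
    """Translate a tblastn hit plus up to flank_aa codons on each side, in the hit's frame.

    Assumption: sstart/send are 1-based inclusive subject coordinates as in
    BLAST tabular output, and sstart > send means a minus-strand hit.
    Returns (region_start, region_end, strand, protein), 1-based inclusive.
    """
    genome = genome.upper()
    strand = 1 if sstart <= send else -1
    lo0 = min(sstart, send) - 1  # 0-based start of the hit
    hi0 = max(sstart, send)      # 0-based exclusive end of the hit
    # Extend only by whole codons so the frame is preserved; clip at contig ends.
    left = 3 * min(flank_aa, lo0 // 3)
    right = 3 * min(flank_aa, (len(genome) - hi0) // 3)
    start0, end0 = lo0 - left, hi0 + right
    region = genome[start0:end0]
    if strand == -1:
        region = revcomp(region)
    return start0 + 1, end0, strand, translate(region)


if __name__ == "__main__":
    # Toy contig: the "hit" ATGTGG (MW) sits at positions 16-21.
    genome = "AAA" * 5 + "ATGTGG" + "CCC" * 5
    print(extend_hit(genome, 16, 21, flank_aa=2))  # (10, 27, 1, 'KKMWPP')
```

On a real genome you would read the contig from the FASTA file (for example after extracting it with samtools faidx) and pass the `sstart`/`send` columns of the hit with `flank_aa=100`.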

8 months ago
France, Paris
Are you trying to find the orthologous sequence of your protein in the genome? If so, you can just extract the region around your tblastn hit with samtools faidx (e.g. 100,000 bp upstream and 100,000 bp downstream) and then run exonerate with its protein2genome model against that region to find the CDS of your gene. This will give you the exon sequences and also the intron borders.
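The workflow in this answer could be scripted roughly as below. This is a sketch that assumes `samtools` and `exonerate` are on your `PATH` and that your query protein is in its own FASTA file; the helper names (`faidx_region`, `build_commands`, `run`) and file names are hypothetical. Exonerate reports coordinates relative to the extracted region, so add the region start minus 1 to map them back onto the contig.

```python
import subprocess


def faidx_region(contig: str, sstart: int, send: int, pad: int = 100_000) -> str:
    """Region string 'contig:start-end' (1-based) covering the hit plus `pad` bp each side."""
    lo, hi = min(sstart, send), max(sstart, send)
    # Clip the start at 1; the end is not clipped to the contig length here.
    return f"{contig}:{max(1, lo - pad)}-{hi + pad}"


def build_commands(genome_fa, contig, sstart, send, protein_fa,
                   region_fa="region.fa", pad=100_000):
    """Return the samtools and exonerate argument lists (hypothetical helper)."""
    region = faidx_region(contig, sstart, send, pad)
    faidx_cmd = ["samtools", "faidx", genome_fa, region]
    # protein2genome aligns a protein to genomic DNA, allowing for introns;
    # --showtargetgff yes adds a GFF of the predicted gene structure.
    exonerate_cmd = ["exonerate", "--model", "protein2genome",
                     "--showtargetgff", "yes", protein_fa, region_fa]
    return faidx_cmd, exonerate_cmd


def run(genome_fa, contig, sstart, send, protein_fa, region_fa="region.fa"):
    """Extract the padded region, then align the protein to it with exonerate."""
    faidx_cmd, exonerate_cmd = build_commands(
        genome_fa, contig, sstart, send, protein_fa, region_fa)
    # samtools faidx writes the extracted region to stdout; save it as the target.
    with open(region_fa, "w") as fh:
        subprocess.run(faidx_cmd, stdout=fh, check=True)
    result = subprocess.run(exonerate_cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run("genome.fa", "chr1", 200000, 200300, "query.fa")` would extract `chr1:100000-300300` and return exonerate's alignment and GFF output as text. If your gene has very long introns, widen `pad` so that all exons fall inside the window.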

