Question

genome annotation [sstart] [send] - how to get protein sequence from gene

13 months ago

danfarkas • 0

Hi,

I have a bacterial chromosome. I am struggling to understand how I can get protein sequences from genes annotated in the following way:

bacterial_chromosome_9_515

where sstart = 9 and send = 515

This is fine for forward genes, as I can slice the forward sequence using Biopython:

faa = fasta[sstart:send].seq.translate(table=11)

However, when a gene is annotated in the reverse way, where sstart > send:

bacterial_chromosome_2423_1891

I am unsure how to get the corresponding protein sequence.

It would be much appreciated if someone could explain this.

Many thanks,

Dan

genome annotation • 502 views


score 0 · Answer 1 · 2023-03-28

13 months ago

shenwei356 8.4k

It means that the gene is on the negative (minus) strand, so you need to take the reverse complement of the sequence before translating it to amino acids. Something like this:

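if sstart > send:
    # minus strand: slice in ascending order, then reverse-complement
    faa = fasta[send:sstart].seq.reverse_complement().translate(table=11)
else:
    # plus strand: translate the slice directly
    faa = fasta[sstart:send].seq.translate(table=11)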
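One detail the snippet above leaves open is the coordinate convention. The sketch below assumes that sstart and send are 1-based and inclusive, as in BLAST tabular output; that is an assumption about how this annotation was produced, so check it against your own file. The helper names (reverse_complement, extract_coding_strand) are hypothetical and work on plain strings. With Biopython the same idea would be fasta.seq[lo - 1:hi], followed by .reverse_complement() for minus-strand hits and then .translate(table=11).

```python
# Complement map for unambiguous DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna):
    """Return the reverse complement of a plain DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def extract_coding_strand(chromosome, sstart, send):
    """Return the sense-strand DNA for one annotated hit.

    ASSUMPTION: sstart and send are 1-based and inclusive.
    sstart > send is taken to mean the hit lies on the minus strand.
    """
    lo, hi = min(sstart, send), max(sstart, send)
    # 1-based inclusive [lo, hi] becomes the 0-based half-open slice [lo-1, hi)
    region = chromosome[lo - 1:hi]
    if sstart > send:
        return reverse_complement(region)
    return region
```

Under that assumption, 9 to 515 covers 515 - 9 + 1 = 507 bases, which is a whole number of codons. If your coordinates turn out to be 0-based, drop the "- 1".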


Hi shenwei356,

Thanks for the clarification. That makes sense.

I also realised that the reason I was confused is that I also need to work out which reading frame the amino acid sequence is translated in, as the annotated coordinates may not start in the correct frame. Can you suggest the best way to do this?

Daniel

ADD REPLY • link 13 months ago by danfarkas • 0
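One common heuristic for the frame question, offered here as a sketch and not as the definitive approach: after correcting for strand, translate the region at offsets 0, 1 and 2 and keep the frame with the fewest internal stop codons. The helper names (translate, best_frame) are hypothetical. The codon table is written out so the example has no dependencies; the amino-acid assignments of NCBI table 11 match the standard code, and only the permitted start codons differ. With Biopython, Seq.translate(table=11) could replace the translate helper.

```python
from itertools import product

# Codon -> amino acid, in NCBI order (bases T, C, A, G; third base fastest).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna):
    """Translate complete codons only; unrecognised codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def best_frame(dna):
    """Return (offset, protein) for the frame with the fewest internal stops.

    Heuristic only: trailing stops are not counted, and ties go to the
    smallest offset.
    """
    candidates = []
    for offset in range(3):
        protein = translate(dna[offset:])
        internal_stops = protein.rstrip("*").count("*")
        candidates.append((internal_stops, offset, protein))
    _, offset, protein = min(candidates)
    return offset, protein
```

For example, best_frame("GATGCTAACTGACTAA") returns (1, "MLTD*"): offsets 0 and 2 each hit an internal stop, while offset 1 reads through to the terminal stop. Short regions can easily have more than one stop-free frame, so treat the result as a candidate to check, not as proof of the true frame.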