Question

blastn of reference genome CDS to subject genome gives range of hits, which do not translate into protein

5.3 years ago

VBer ▴ 210

I have a consensus genome created by incorporating only biallelic SNPs into the reference genome. I want to get the protein sequence of a particular gene from my consensus genome.

I tried to do this by aligning the reference gene CDS (taken from NCBI) against the consensus genome with blastn. I got a single hit spanning multiple ranges. I concatenated all the aligned nucleotides from the consensus and tried to translate them, but the reference protein is not found in its entirety in any frame. Instead, it is split across several frames, which I did not expect because the consensus differs from the reference only by SNPs.

Any idea why this is happening and solutions to get the protein sequence?

BLAST • 1.6k views

score 3 · Accepted Answer · 2020-03-28

5.3 years ago

lieven.sterck 15k

BLAST could work but might not be accurate enough, I'm afraid.

Perhaps give a real mapping tool a try, something like GMAP or EST2genome (they take correct splicing into account, something that BLAST will not do).

In a genomic context, yes, the protein (or more accurately the CDS) is in most cases 'split' over different frames. However, when you concatenate the pieces together it should revert back to a single reading frame (unless some of the SNPs you introduced cause frameshifts and/or premature stop codons).
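To make the frame argument concrete, here is a minimal sketch with a made-up three-exon toy gene (the sequences, `translate` and `exon_frames` are illustrative assumptions, not anything from the original data). It shows the exons sitting in different genomic frames, the exact concatenation translating cleanly, and a hypothetical alignment range that runs one base too far into an intron breaking the frame.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate in frame 0; an incomplete trailing codon is dropped."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def exon_frames(exons, introns):
    """Genomic frame (0, 1 or 2) in which each exon's codons lie."""
    frames = []
    genome_pos = 0  # 0-based start of the current exon on the genome
    cds_pos = 0     # number of CDS bases that precede the current exon
    for i, exon in enumerate(exons):
        frames.append((genome_pos - cds_pos) % 3)
        genome_pos += len(exon)
        cds_pos += len(exon)
        if i < len(introns):
            genome_pos += len(introns[i])
    return frames


# Toy gene: exon and intron lengths are deliberately not multiples of 3.
exons = ["ATGGC", "TAAAGGT", "TGCTAA"]
introns = ["GTAAGTCCAG", "GTATTTAG"]
genomic = exons[0] + introns[0] + exons[1] + introns[1] + exons[2]

frames = exon_frames(exons, introns)  # exons lie in different genomic frames
protein = translate("".join(exons))   # exact exons give one clean frame

# Hypothetical imprecise range: first segment extended 1 bp into the intron.
shifted = genomic[0:6] + exons[1] + exons[2]
shifted_protein = translate(shifted)

print(frames, protein, shifted_protein)
```

A single extra or missing base at one range boundary is enough to scramble everything downstream, even though no indel exists in the genome itself.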

Thanks, gmap worked like a charm!

ADD REPLY • link 5.3 years ago by VBer ▴ 210

Frameshift mutations are not possible as I did not include single nucleotide insertions, just SNPs. So perhaps I might have premature stop codons. I will try the aligners you have suggested, and perhaps look at the number of mutations too, and get back. Thanks!
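A rough sketch of how the mutation count and the premature-stop check could be done, assuming the reference CDS and the consensus CDS are already available as in-frame strings of equal length (which holds when only substitutions were applied). The function name `compare_cds` and the toy sequences are made up for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def compare_cds(ref_cds, alt_cds):
    """List codon-level differences between two in-frame CDS of equal length."""
    ref_cds, alt_cds = ref_cds.upper(), alt_cds.upper()
    if len(ref_cds) != len(alt_cds):
        raise ValueError("lengths differ, so this is not a SNP-only consensus")
    changes = []
    for i in range(0, len(ref_cds) - 2, 3):
        ref_codon, alt_codon = ref_cds[i:i + 3], alt_cds[i:i + 3]
        if ref_codon == alt_codon:
            continue
        ref_aa = CODON_TABLE.get(ref_codon, "X")  # "X" for codons containing N
        alt_aa = CODON_TABLE.get(alt_codon, "X")
        if ref_aa == alt_aa:
            kind = "synonymous"
        elif alt_aa == "*":
            kind = "stop_gained"
        elif ref_aa == "*":
            kind = "stop_lost"
        else:
            kind = "missense"
        # Codon numbers are 1-based.
        changes.append((i // 3 + 1, ref_codon, alt_codon, ref_aa, alt_aa, kind))
    return changes


# Toy example with three substitutions.
ref = "ATGGCTAAAGGTTGCTAA"
alt = "ATGGCCAGAGGTTGATAA"
changes = compare_cds(ref, alt)
for change in changes:
    print(change)
```

Any `stop_gained` entry before the last codon would explain a truncated protein; if there are none, the problem is more likely in how the CDS was pieced together.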

ADD REPLY • link 5.3 years ago by VBer ▴ 210

"Frameshift mutations are not possible as I did not include single nucleotide insertions, just SNPs."

Correct you are.

Then it's likely because BLAST does not provide you with a correct gene structure (not a surprise either, that's not its goal). Yes, give the gene mappers a try and see what that gives.

An alternative could be to transfer the annotation of your reference (given it has one) and then extract your protein sequence based on that.
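For a consensus that differs from the reference only by substitutions, the coordinates should be unchanged, so the reference CDS intervals could in principle be applied to the consensus directly. Below is a minimal sketch under that assumption; `extract_cds`, the toy sequence and the intervals are hypothetical, and real use would read the intervals from the reference GFF and the sequence from the consensus FASTA.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def translate(seq):
    """Translate in frame 0; an incomplete trailing codon is dropped."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_cds(chrom_seq, intervals, strand):
    """Join CDS intervals (1-based, inclusive, as in GFF) into one sequence.

    Pieces are joined in ascending genomic order; for a minus-strand gene
    the joined sequence is reverse complemented as a whole.
    """
    pieces = [chrom_seq[start - 1:end] for start, end in sorted(intervals)]
    cds = "".join(pieces)
    return revcomp(cds) if strand == "-" else cds


# Toy reference with three CDS segments separated by two introns.
reference = "ATGGCGTAAGTCCAGTAAAGGTGTATTTAGTGCTAA"
cds_intervals = [(1, 5), (16, 22), (31, 36)]

# Toy consensus: one substitution (C to A at position 5), same length.
consensus = reference[:4] + "A" + reference[5:]

ref_protein = translate(extract_cds(reference, cds_intervals, "+"))
cons_protein = translate(extract_cds(consensus, cds_intervals, "+"))
print(ref_protein, cons_protein)
```

This shortcut stops being valid as soon as indels are incorporated, at which point a proper annotation-transfer tool is needed.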

ADD REPLY • link 5.3 years ago by lieven.sterck 15k

For the latter, I only know RATT. Do you know any other tools?

ADD REPLY • link 5.3 years ago by VBer ▴ 210

I was also thinking of that one indeed.

There is also 'liftOver' from the ALLMAPS package, if I remember well.

ADD REPLY • link 5.3 years ago by lieven.sterck 15k