Question: blastn of reference genome CDS to subject genome gives range of hits, which do not translate into protein
Cookie-san wrote, 9 weeks ago:

I have a consensus genome created by incorporating only biallelic SNPs into the reference genome. I want to get the protein sequence of a particular gene from my consensus genome.

I tried to do this by running blastn with the reference gene CDS (taken from NCBI) as the query against the consensus genome. I got a single hit, spanning multiple ranges. I concatenated all the aligned nucleotides from the consensus and tried to translate them, but the reference protein is not found in its entirety in any frame. Instead, the reference protein is split across several frames, which I did not expect because the consensus contains only SNPs.

Any idea why this is happening, and how I can get the protein sequence?

lieven.sterck (VIB, Ghent, Belgium) wrote, 9 weeks ago:

BLAST could work, but I'm afraid it might not be accurate enough.

Perhaps give a real mapping tool a try, something like gmap or EST2genome (they take correct splicing into account, which blast will not do).

In a genomic context, yes, the protein (or more accurately the CDS) is in most cases 'split' over different frames. However, when you concatenate the pieces together it should revert to a single reading frame (unless some of the SNPs you introduced cause frameshifts and/or premature stop codons).
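To illustrate the point about frames, here is a minimal sketch on a made-up toy locus. The sequences, coordinates and helper names (`translate`, `splice`, `exon_frames`) are all hypothetical and not taken from the thread. It shows that exons can sit in different genomic frames because intron lengths are not multiples of three, while the spliced CDS reads in one frame.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate in frame 0; unknown codons become 'X', trailing bases are ignored."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def splice(genome, exons):
    """Concatenate exon slices given as (start, end), 0-based, end-exclusive."""
    return "".join(genome[start:end] for start, end in exons)


def exon_frames(exons):
    """Genomic frame (0, 1 or 2) in which each exon's codons are read."""
    frames, cds_len = [], 0
    for start, end in exons:
        frames.append((start - cds_len) % 3)
        cds_len += end - start
    return frames


# Hypothetical toy locus: flank + exon1 + intron1 + exon2 + intron2 + exon3 + flank.
GENOME = ("CC" + "ATGGCCA" + "GTAAGTTCAG" + "AGTTTG"
          + "GTATTTAG" + "GATAA" + "TT")
EXONS = [(2, 9), (19, 25), (33, 38)]

print("genomic frame per exon:", exon_frames(EXONS))
print("spliced CDS translation:", translate(splice(GENOME, EXONS)))
print("unspliced span translation:", translate(GENOME[2:38]))
```

If the concatenated pieces do not translate cleanly, the exon boundaries are the first thing to suspect, since a boundary that is off by even one base shifts everything downstream.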


Thanks, gmap worked like a charm!

written 6 weeks ago by Cookie-san

Frameshift mutations are not possible as I did not include single nucleotide insertions, just SNPs. So perhaps I might have premature stop codons. I will try the aligners you have suggested, and perhaps look at the number of mutations too, and get back. Thanks!
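One way to check for premature stop codons and count the mutations is to compare the reference and consensus CDS codon by codon. The sketch below assumes both sequences have the same length (true when only SNPs were introduced) and start in frame. The function name `compare_cds` and the example sequences are made up for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def compare_cds(ref_cds, alt_cds):
    """List codon-level differences between two equal-length, in-frame CDS.

    Returns tuples of (codon number, ref codon, alt codon, ref aa, alt aa, effect).
    """
    if len(ref_cds) != len(alt_cds):
        raise ValueError("CDS lengths differ; an indel is present, not only SNPs")
    changes = []
    for i in range(0, len(ref_cds) - 2, 3):
        ref_codon, alt_codon = ref_cds[i:i + 3], alt_cds[i:i + 3]
        if ref_codon == alt_codon:
            continue
        ref_aa = CODON_TABLE.get(ref_codon, "X")
        alt_aa = CODON_TABLE.get(alt_codon, "X")
        if ref_aa == alt_aa:
            effect = "synonymous"
        elif alt_aa == "*":
            effect = "premature stop"
        elif ref_aa == "*":
            effect = "stop lost"
        else:
            effect = "missense"
        changes.append((i // 3 + 1, ref_codon, alt_codon, ref_aa, alt_aa, effect))
    return changes


# Hypothetical example: three SNPs in a six-codon CDS.
REF = "ATGGCCAAGTTTGGATAA"
ALT = "ATGGCTAGGTTTTGATAA"
for change in compare_cds(REF, ALT):
    print(change)
```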

written 9 weeks ago by Cookie-san

"Frameshift mutations are not possible as I did not include single nucleotide insertions, just SNPs."

Correct you are.

Then it's likely because blast does not give you a correct gene structure (not a surprise either, that's not its goal). Yes, give the gene mappers a try and see what that gives.

An alternative could be to transfer the annotation of your reference (given it has one) and then extract your protein sequence based on that.
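As a rough sketch of the annotation-based route: if the consensus really differs from the reference only by SNPs, its coordinates should be identical to the reference, so the reference CDS features could in principle be applied to the consensus directly. That is an assumption, not something verified in this thread. The GFF3 rows, the `mrna1` ID and the helper names below are invented for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate in frame 0; unknown codons become 'X', trailing bases are ignored."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def revcomp(seq):
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def cds_from_gff3(gff3_text, genome, parent_id):
    """Build the spliced CDS of one transcript from GFF3 CDS rows.

    genome maps sequence IDs to sequences; GFF3 coordinates are 1-based, inclusive.
    """
    parts, strand = [], "+"
    for line in gff3_text.splitlines():
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if attrs.get("Parent") != parent_id:
            continue
        parts.append((int(cols[3]), int(cols[4]), cols[0]))
        strand = cols[6]
    parts.sort()
    seq = "".join(genome[seqid][start - 1:end] for start, end, seqid in parts)
    return revcomp(seq) if strand == "-" else seq


# Hypothetical reference locus and annotation (three CDS segments on chr1).
REFERENCE = ("CC" + "ATGGCCA" + "GTAAGTTCAG" + "AGTTTG"
             + "GTATTTAG" + "GATAA" + "TT")
GFF3 = "\n".join("\t".join(row) for row in [
    ["chr1", "toy", "CDS", "3", "9", ".", "+", "0", "Parent=mrna1"],
    ["chr1", "toy", "CDS", "20", "25", ".", "+", "2", "Parent=mrna1"],
    ["chr1", "toy", "CDS", "34", "38", ".", "+", "2", "Parent=mrna1"],
])

# Consensus with one SNP (G to C at 1-based position 21); length is unchanged.
CONSENSUS = REFERENCE[:20] + "C" + REFERENCE[21:]

print(translate(cds_from_gff3(GFF3, {"chr1": REFERENCE}, "mrna1")))
print(translate(cds_from_gff3(GFF3, {"chr1": CONSENSUS}, "mrna1")))
```

If the consensus ever includes indels, the coordinates no longer line up and a proper annotation-transfer tool is needed instead.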

written 9 weeks ago by lieven.sterck

For the latter, I only know RATT. Do you know any other tools?

written 9 weeks ago by Cookie-san

I was also thinking of that one, indeed.

There is also 'liftOver' from the ALLmaps package, if I remember correctly.

written 9 weeks ago by lieven.sterck