Question: If I've identified the gene via BLAST, can I translate it to protein and assume it's correct?
christinejgu wrote, 7 weeks ago:

Hi, I am trying to identify the S1/S2 (furin-like) cleavage site of SARS-CoV-2. I ran a local BLAST search with my FASTA file of whole genomes downloaded from GISAID against the S gene of the SARS-CoV-2 reference sequence from GenBank. Then I extracted the S gene from each genome based on the positions reported in my BLAST results. If I extract the S gene based on these positions, can I translate it using ExPASy or MEGA X and assume the result is correct? Are there any other considerations I need to think about? Afterwards, I plan to use ProP to predict putative furin-like cleavage sites.

Any thoughts or advice would be greatly appreciated! :)
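A minimal sketch of the extraction step in plain Python, assuming (as in the setup described above) that the genomes were the BLAST query and the S gene the subject, and that the coordinates come from BLAST's 1-based, inclusive tabular output. `extract_hit` and `reverse_complement` are illustrative helpers, not part of ExPASy, MEGA X or BLAST+. For blastn, query coordinates are always reported in ascending order, so a minus-strand match shows up as `sstart > send`; in that case the genomic slice has to be reverse-complemented before it is translated.

```python
# Complement table; N and lowercase are handled, other IUPAC codes pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_hit(genome: str, qstart: int, qend: int, sstart: int, send: int) -> str:
    """Slice the genome at 1-based, inclusive query coordinates and orient it like the subject.

    Assumes the genome was the BLAST query and the S gene the subject, so
    qstart <= qend and a minus-strand match is signalled by sstart > send.
    """
    region = genome[qstart - 1:qend]
    if sstart > send:
        region = reverse_complement(region)
    return region


if __name__ == "__main__":
    toy_genome = "GGGATGAAATTTTAAGGG"  # made-up toy sequence, not real data
    print(extract_hit(toy_genome, 4, 15, 1, 12))  # plus-strand hit
    print(extract_hit(toy_genome, 4, 15, 12, 1))  # same region, minus-strand hit
```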

Tags: virus, protein, gene, genome

Why not just do a blastp or blastx and return protein results directly?

Reply by Joe, 7 weeks ago

Because there are not enough results that way :/ Pretty much all of the sequences are being deposited in GISAID.

Reply by christinejgu, 7 weeks ago

But you are already using BLAST? Why not just make a local BLAST database of all the genomes you downloaded?

Reply by Joe, 7 weeks ago
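Joe's suggestion could look roughly like the sketch below, which only builds (and optionally runs) BLAST+ commands from Python. The file names and the `max_targets` value are hypothetical placeholders, and the BLAST+ executables `makeblastdb` and `blastn` are assumed to be on `PATH`. Note that the roles flip relative to the original setup: here the S gene is the query and the genomes form the database, so the genome coordinates end up in `sstart`/`send`. Raising `-max_target_seqs` matters because BLAST+ reports at most 500 target sequences by default, which is easy to exceed with a large GISAID download.

```python
import shutil
import subprocess

# Hypothetical file names: replace with your own paths.
GENOMES_FASTA = "gisaid_genomes.fasta"
S_GENE_FASTA = "S_gene_reference.fasta"
DB_NAME = "gisaid_genomes_db"
HITS_TSV = "S_vs_genomes.tsv"


def makeblastdb_cmd(fasta: str, db: str) -> list:
    """Command that builds a nucleotide BLAST database from the genome FASTA."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db]


def blastn_cmd(query: str, db: str, out: str, max_targets: int = 100000) -> list:
    """Command that searches the S gene against the genome database.

    -outfmt 6 gives 12-column tabular output; -max_target_seqs is raised
    (arbitrary value) because the default of 500 would cap the report.
    """
    return ["blastn", "-query", query, "-db", db, "-outfmt", "6",
            "-max_target_seqs", str(max_targets), "-out", out]


def run_pipeline() -> None:
    """Run both steps; requires BLAST+ on PATH."""
    for tool in ("makeblastdb", "blastn"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    subprocess.run(makeblastdb_cmd(GENOMES_FASTA, DB_NAME), check=True)
    subprocess.run(blastn_cmd(S_GENE_FASTA, DB_NAME, HITS_TSV), check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_pipeline() once BLAST+ is installed.
    print(" ".join(makeblastdb_cmd(GENOMES_FASTA, DB_NAME)))
    print(" ".join(blastn_cmd(S_GENE_FASTA, DB_NAME, HITS_TSV)))
```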

I did do that! My queries were all of my genome sequences and my subject was the spike gene. Would you suggest I do a blastx of the genes I extracted?

Reply by christinejgu, 7 weeks ago
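blastx searches a nucleotide query against a protein database after translating the query in all six reading frames. The minimal sketch below illustrates only that six-frame translation step; it uses the standard genetic code, turns ambiguous codons into `X`, and `six_frames` is a hypothetical helper rather than anything from BLAST+.

```python
from itertools import product

# NCBI standard genetic code (table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate from the first base; ambiguous codons become 'X', a partial final codon is dropped."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Translate a nucleotide sequence in the three forward and three reverse frames."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGAAATTTTAA").items():  # made-up toy sequence
        print(name, protein)
```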
lieven.sterck (VIB, Ghent, Belgium) wrote, 7 weeks ago:

Short answer: NO.

Well, unless you're really lucky, simply taking the BLAST hit and translating it will in most cases not give you a valid protein. BLAST is not meant to do this: it reports the regions it can align under the given parameter settings, and those local alignments do not necessarily correspond to a correct gene structure (for example, the alignment can stop short of the start or stop codon).

There are exceptions, though: if the gene is very well conserved, BLAST may align the full sequence, but as I said, that's not guaranteed. (Another, more serious issue is exon-intron structure, but that should not be a factor here since you are working with viral sequences.)
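Before trusting a translation, the extracted region can be checked explicitly. The sketch below is a minimal example under stated assumptions: `translate` uses the standard genetic code (NCBI table 1), `hit_problems` is a hypothetical helper, and it assumes the reference S gene used as the BLAST subject includes its stop codon, so a complete hit should start with Met and end in a single terminal stop. GISAID assemblies may contain runs of `N`, which this code translates as `X`. An empty list means only that these checks passed, not that the gene structure is proven correct.

```python
from itertools import product

# NCBI standard genetic code (table 1): codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate frame 1; codons containing ambiguous bases (e.g. N) become 'X'.

    A trailing partial codon is ignored.
    """
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def hit_problems(cds: str, sstart: int, send: int, slen: int) -> list:
    """List reasons why an extracted BLAST hit may not be a complete CDS.

    sstart/send are the hit's coordinates on the reference gene (the BLAST
    subject) and slen is that gene's length, stop codon included (assumption).
    """
    problems = []
    if min(sstart, send) != 1 or max(sstart, send) != slen:
        problems.append("hit does not span the full reference gene")
    if len(cds) % 3:
        problems.append("length not a multiple of 3 (indel or truncation?)")
    protein = translate(cds)
    if not protein.startswith("M"):
        problems.append("does not start with Met")
    if not protein.endswith("*"):
        problems.append("no terminal stop codon")
    if "*" in protein[:-1]:
        problems.append("internal stop codon")
    if "X" in protein:
        problems.append("ambiguous codons translated as X")
    return problems


if __name__ == "__main__":
    print(hit_problems("ATGAAATTTTAA", 1, 12, 12))  # toy complete ORF
    print(hit_problems("ATGTAAAAATAA", 1, 12, 12))  # toy ORF with an early stop
```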

An alternative approach would be to use a gene aligner such as gmap, est2genome, ... or similar tools. They are better suited to aligning full sequences to a genome and thus give you a better chance of getting a complete, correct gene structure. (Since these are much slower/less efficient than BLAST, it might be smart to first filter the genomes for candidate regions, extract those, and then align the query CDS to them with the tools mentioned above.)
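The "filter first, then align" idea could start like the sketch below: widen each BLAST hit by some flanking sequence so that the gene aligner has room to find the true start and stop codons, then write the region out as FASTA for gmap, GenomeThreader or a similar tool. The 500 bp default flank and all helper names are arbitrary assumptions, and the aligner invocation itself is deliberately left out.

```python
def padded_region(genome_len: int, start: int, end: int, flank: int = 500) -> tuple:
    """Widen a 1-based, inclusive hit by `flank` bases on each side, clamped to the genome.

    start/end may be given in either order (e.g. minus-strand hits).
    """
    lo, hi = min(start, end), max(start, end)
    return max(1, lo - flank), min(genome_len, hi + flank)


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def candidate_region_fasta(name: str, genome: str, start: int, end: int, flank: int = 500) -> str:
    """Extract the padded candidate region as FASTA, ready to hand to a gene aligner."""
    lo, hi = padded_region(len(genome), start, end, flank)
    return to_fasta(f"{name}:{lo}-{hi}", genome[lo - 1:hi])


if __name__ == "__main__":
    toy_genome = "ACGT" * 50  # made-up 200 bp toy genome
    print(candidate_region_fasta("toy", toy_genome, 90, 110, flank=20), end="")
```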


So would you suggest visualizing the alignment, extracting the region, and then aligning again? I hope I understood that correctly.

Reply by christinejgu, 7 weeks ago

That could work as well, yes.

I was rather thinking along these lines: run BLAST, parse the start/stop coordinates of the BLAST hit regions, extract those regions from the genome(s), and then align your CDS query to those subsequences (with e.g. gmap, GenomeThreader, ...).

If you are not too impatient, you can also run the CDS alignment with the above-mentioned software on the whole genome.

Reply by lieven.sterck, 7 weeks ago
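Parsing the start/stop coordinates from BLAST tabular output (`-outfmt 6`, whose default 12 columns are listed in `FIELDS`) might look like this minimal sketch. It keeps only the highest-scoring HSP per genome and records the strand from `sstart`/`send`; `best_hits` and `Hit` are hypothetical names. Keeping a single HSP is a simplification: a gene that BLAST splits into several HSPs (for example around a long run of N or a divergent stretch) would come back partial, which is exactly why the realignment step with a gene aligner is worthwhile.

```python
from typing import Iterable, NamedTuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


class Hit(NamedTuple):
    genome: str
    start: int      # 1-based, inclusive, on the genome
    end: int
    minus: bool     # True if the match is on the reverse strand
    bitscore: float


def best_hits(lines: Iterable[str], genome_is_query: bool = True) -> dict:
    """Keep the highest-scoring HSP per genome from -outfmt 6 lines.

    genome_is_query=True matches the original setup (genomes as query, S gene
    as subject); use False when the S gene was the query against a genome DB.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        rec = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        sstart, send = int(rec["sstart"]), int(rec["send"])
        if genome_is_query:
            genome, a, b = rec["qseqid"], int(rec["qstart"]), int(rec["qend"])
        else:
            genome, a, b = rec["sseqid"], sstart, send
        # For blastn, a reversed subject range marks a minus-strand match.
        hit = Hit(genome, min(a, b), max(a, b), sstart > send, float(rec["bitscore"]))
        if genome not in best or hit.bitscore > best[genome].bitscore:
            best[genome] = hit
    return best


if __name__ == "__main__":
    rows = [  # made-up example rows
        "g1\tS\t99.9\t300\t1\t0\t101\t400\t1\t300\t1e-150\t550",
        "g1\tS\t95.0\t50\t2\t0\t10\t59\t1\t50\t1e-10\t80",
    ]
    print(best_hits(rows))
```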

That's a great idea. I will go ahead and try out those options. Would you suggest one application over the other?

Reply by christinejgu, 7 weeks ago

Not really; for this simple case, I assume any of them will do.

Reply by lieven.sterck, 7 weeks ago