Question

If I've identified the gene via BLAST, can I translate it to protein and assume it's correct?

0

4.0 years ago

christinejgu • 0

Hi, I am trying to identify the S1/S2 (furin-like) cleavage site of SARS-CoV-2. I performed a local BLAST of my FASTA file of whole genomes downloaded from GISAID against the S gene of the SARS-CoV-2 reference sequence from GenBank. Then I extracted the S gene from each genome based on the positions given by my BLAST results. If I extract the S gene based on these positions, can I translate it using ExPASy or MegaX and assume it's correct? Are there any other considerations I need to think of? Subsequently, I plan to use ProP to determine putative furin-like cleavage sites.

Any thoughts or advice would be greatly appreciated! :)

genome gene protein virus • 964 views

ADD COMMENT • link updated 4.0 years ago by lieven.sterck 15k • written 4.0 years ago by christinejgu • 0

0

Why not just do a blastp or blastx and return protein results directly?

ADD REPLY • link 4.0 years ago by Joe 21k

0

Because there are not enough results :/. Pretty much all of the sequences are being deposited to GISAID.

ADD REPLY • link 4.0 years ago by christinejgu • 0

0

But you are already using BLAST? Why not just make a local BLAST database of all the genomes you downloaded?

ADD REPLY • link 4.0 years ago by Joe 21k

0

I did do that! My queries were all of my genome sequences and my subject was the spike gene. Would you suggest I do a blastx of the genes I extracted?

ADD REPLY • link 4.0 years ago by christinejgu • 0

score 2 · Answer 1 · 2020-04-09

2

4.0 years ago

lieven.sterck 15k

Short answer: NO

Well, unless you're really lucky, simply taking the BLAST hit and translating it will in most cases not yield a valid protein. This is because BLAST is not meant to do this: it reports alignable regions given the parameter settings, and those do not necessarily correspond to a correct gene structure.

There are exceptions though: if the gene is very well conserved, BLAST may align the full sequence, but as said, that's not a guarantee. (Another, more impactful issue is exon-intron structure, but that should not be a factor here since you work with viral sequences.)
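A quick way to see whether an extracted region happens to be a full, translatable CDS is to check it before trusting the protein. The following is only a minimal sketch in plain Python, assuming the standard genetic code and a sense-strand sequence; the function names `translate` and `cds_problems` are made up for illustration and are not part of any tool mentioned in this thread.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no alternative code needed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate in frame 1; codons with N or other ambiguity codes become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def cds_problems(cds):
    """Return a list of reasons why the sequence does not look like a complete CDS."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("no ATG start codon")
    protein = translate(cds)
    if not protein.endswith("*"):
        problems.append("no terminal stop codon")
    if "*" in protein[:-1]:
        problems.append("internal stop codon")
    if "X" in protein:
        problems.append("ambiguous or untranslatable codon")
    return problems


if __name__ == "__main__":
    # Toy sequence, not real spike data.
    print(translate("ATGAGAAGAGCTAGATAA"), cds_problems("ATGAGAAGAGCTAGATAA"))
```

An empty list only means the region is a clean open reading frame; it does not prove the boundaries match the reference gene, so comparing the protein length against the reference spike is still worthwhile.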

An alternative approach could be to use a gene aligner such as gmap, est2genome, or similar tools. They are better suited to aligning full sequences to a genome and thus give you a better chance of getting a full, correct gene structure. (Since those are much slower than BLAST, it might be clever to first filter the genomes for potential regions, extract those, and then align the query CDS again with the above-mentioned tools.)

ADD COMMENT • link 4.0 years ago by lieven.sterck 15k

0

So would you suggest visualizing the alignment, extracting, then aligning again? Hopefully, I understood that correctly.

ADD REPLY • link 4.0 years ago by christinejgu • 0

0

That could work as well, yes.

I was rather thinking along the following lines: do a BLAST, parse the start/stop of the BLAST hit regions, extract those regions from the genome(s), then align your CDS input query to those subsequences (with e.g. gmap, GenomeThreader, ...).

If you are not too impatient, you can also run the CDS alignment with the above-mentioned software on the whole genome.
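The parse-and-extract part of that could look roughly like the sketch below. It assumes BLAST was run with the standard 12-column tabular output (`-outfmt 6`) and, as in the setup described earlier in this thread, that the genomes were the query and the spike gene the subject, so the genome coordinates sit in the `qstart`/`qend` columns. The helper names `hit_spans` and `extract`, the `flank` padding, and the choice to merge all HSPs of a genome into one span are my own assumptions for illustration, and strand is not handled.

```python
def hit_spans(blast_lines, flank=100):
    """Merge all HSPs per genome into one padded span (1-based, inclusive).

    Expects outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    Note: merging blindly would also swallow spurious distant hits, so filter first if needed.
    """
    spans = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        genome, qstart, qend = fields[0], int(fields[6]), int(fields[7])
        lo, hi = min(qstart, qend), max(qstart, qend)
        old = spans.get(genome)
        spans[genome] = (lo, hi) if old is None else (min(old[0], lo), max(old[1], hi))
    # Pad both sides so the downstream aligner can recover the true start and stop.
    return {g: (max(1, lo - flank), hi + flank) for g, (lo, hi) in spans.items()}


def extract(genomes, spans):
    """genomes: dict of id -> sequence string. Returns id -> padded subsequence."""
    # Python slicing clamps the end, so a span running past the genome end is safe.
    return {g: genomes[g][lo - 1:hi] for g, (lo, hi) in spans.items() if g in genomes}


if __name__ == "__main__":
    # Made-up hit lines, not real results.
    lines = [
        "g1\tS\t99.0\t50\t0\t0\t11\t60\t1\t50\t0.0\t90",
        "g1\tS\t99.0\t40\t0\t0\t71\t110\t61\t100\t0.0\t70",
    ]
    spans = hit_spans(lines, flank=5)
    print(spans, len(extract({"g1": "A" * 200}, spans)["g1"]))
```

The padded subsequences can then be written back to FASTA and handed to the CDS aligner, which is the step that actually settles where the gene starts and ends.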

ADD REPLY • link 4.0 years ago by lieven.sterck 15k

0

That's a great idea. I will go ahead and try out those options. Would you suggest one application over the other?