Question: Viral de-novo Variant Call
Ribas wrote (18 months ago):

Hi BioStars,

Here is my problem. After obtaining a draft viral genome through de novo assembly, I am looking for mutations. In particular, I am looking for variants with respect to a specific viral reference genome (let's call it X), which can differ from the assembled one.

Here is my current approach:

  1. Get DRAFT GENOME from de novo assembly;
  2. Align DRAFT GENOME and the specific reference X with MAFFT;
  3. DRAFT GENOME and reference X now share the same coordinates;
  4. Map reads to the DRAFT GENOME;
  5. Call variants on the resulting BAM, using X as the bcftools reference.
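The coordinate bookkeeping behind steps 2 and 3 can be sketched as a lookup table from draft positions to reference positions. This is a minimal illustration, not part of the original workflow: the function name `draft_to_reference_map` and the toy sequences are hypothetical, and it assumes the two aligned sequences (for example, the two rows of a MAFFT alignment) have already been read into strings of equal length that use `-` for gaps.

```python
def draft_to_reference_map(aligned_ref, aligned_draft, gap="-"):
    """Map 1-based draft positions to 1-based reference positions.

    Draft bases that sit opposite a gap in the reference (insertions in the
    draft) have no reference coordinate and map to None.
    """
    if len(aligned_ref) != len(aligned_draft):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    ref_pos = 0
    draft_pos = 0
    for ref_base, draft_base in zip(aligned_ref, aligned_draft):
        if ref_base != gap:
            ref_pos += 1
        if draft_base != gap:
            draft_pos += 1
            mapping[draft_pos] = ref_pos if ref_base != gap else None
    return mapping


# Toy alignment (hypothetical): the draft lacks reference base 8 (T) and
# carries one extra base (A) after reference base 10.
aligned_ref = "GTCACACTGG-C"
aligned_draft = "GTCCCAC-GGAC"
coords = draft_to_reference_map(aligned_ref, aligned_draft)
print(coords[8])   # reference position of draft base 8
print(coords[10])  # the inserted base has no reference coordinate
```

A table like this lets a variant position reported on the draft be translated to the numbering of X; positions that map to `None` are insertions relative to X and need an anchoring convention of their own.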

Am I overthinking?

Many thanks.

Tags: snp, assembly
modified 18 months ago • written 18 months ago by Ribas

Overthinking what exactly? You stated a problem and an approach; what do you think you're overthinking?

written 18 months ago by RamRS

Maybe there is another way to solve it. I think the problem of different SNP coordinates is common in de novo assembly.

written 18 months ago by Ribas

Why don't you just align your reads to the reference genome and call variants from that?

written 18 months ago by WouterDeCoster

The reference genome can be quite different. This virus has many genotypes and subtypes. However, the published mutations are reported relative to one specific genome.

written 18 months ago by Ribas

If you want to figure out the location of the mutations, then:

a) check how similar or different your assembled genome is compared to the published genome;

b) you could also run a basic synteny analysis between the genomes to identify major coordinate differences between them.

modified 18 months ago • written 18 months ago by Sej Modha

Thanks for your reply.

Then my approach is similar to point a).

REFERENCE:  ...GTCACACTGG...
DRAFT:      ..................GTCCCACTGG...

After alignment:

REFERENCE:  ...GTCACACTGG...
DRAFT:      ...GTCCCAC-GG...

Finally, variant calling:

REFERENCE:  ...GTCACACTGG...
DRAFT:      ...GTCCCAC-GG...
Variant_is_here___^
modified 7 months ago by RamRS • written 18 months ago by Ribas
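The comparison in the example above can be sketched by walking the two aligned rows column by column. This is only an illustration under stated assumptions: `alignment_differences` is a hypothetical helper, both rows are assumed to be in the same letter case, and the positions it reports are counted within the displayed fragment, not within the full genome.

```python
def alignment_differences(aligned_ref, aligned_draft, gap="-"):
    """Yield (ref_pos, ref_base, draft_base) for each mismatching column.

    ref_pos is 1-based and counts reference bases only. For an insertion in
    the draft it is the position of the last reference base seen (0 if the
    insertion precedes the first reference base).
    """
    ref_pos = 0
    for ref_base, draft_base in zip(aligned_ref, aligned_draft):
        if ref_base != gap:
            ref_pos += 1
        if ref_base != draft_base:
            yield ref_pos, ref_base, draft_base


# The fragment from the example above, after alignment.
aligned_ref = "GTCACACTGG"
aligned_draft = "GTCCCAC-GG"
differences = list(alignment_differences(aligned_ref, aligned_draft))
for ref_pos, ref_base, draft_base in differences:
    print(ref_pos, ref_base, draft_base)
```

On this fragment the sketch reports the A-to-C substitution marked by the caret and also the deleted T further along. It describes differences between the two consensus sequences only; it says nothing about read support or within-sample variant frequencies, which is what a read-based caller adds.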

@Ribas I am quite interested in your approach. You are going through the trouble of de novo assembly before variant calling to maximize mappability, correct? To a similar end, another approach I have seen is to do a first round of mapping, generate a consensus sequence, and then remap to the consensus sequence.

Do you find that there is value in this extra step of de novo assembly or consensus generation prior to mapping? How do you handle insertions in your own draft de novo assembly when you align it to the reference (the draft would no longer be on the same coordinate system as the reference)? Have you only tried bcftools, or also freebayes with the --pooled-continuous flag, or possibly LoFreq, SiNPle and the like?

written 7 months ago by roberto.spreafico

As a further comment, I think the main driver for choosing between de novo assembly and initial mapping followed by consensus is whether you start with a pure virus (in which case de novo assembly seems reasonable) or a mixture/non-clonal population (in which case de novo assembly might be messy, whereas mapping + consensus could handle it better).

I would still be interested in your take on this, the handling of insertions, and the choice of variant caller.

written 7 months ago by roberto.spreafico
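The map-then-consensus alternative raised in the comments above can be sketched as a majority vote over per-position base counts from the first round of mapping. Everything here is an assumption made for illustration: the helper name `consensus_from_counts`, the `min_depth` threshold of 10, and the toy counts are hypothetical, the counts are assumed to have been extracted from a pileup beforehand, and indels are ignored entirely.

```python
from collections import Counter


def consensus_from_counts(reference, base_counts, min_depth=10):
    """Return a consensus sequence from per-position base counts.

    base_counts[i] is a Counter of the bases observed at reference position i.
    Positions with no coverage, or with depth below min_depth, keep the
    reference base.
    """
    if len(reference) != len(base_counts):
        raise ValueError("need one Counter per reference position")
    consensus = []
    for ref_base, counts in zip(reference, base_counts):
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            consensus.append(ref_base)
        else:
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Toy counts (hypothetical) for a four-base reference.
reference = "GTCA"
base_counts = [
    Counter({"G": 30}),
    Counter({"T": 28, "C": 2}),
    Counter({"A": 3}),           # too shallow: reference base C is kept
    Counter({"C": 25, "A": 5}),  # majority base differs from the reference
]
new_reference = consensus_from_counts(reference, base_counts)
print(new_reference)
```

Reads would then be remapped to `new_reference` for the second round. Because this sketch never adds or removes bases, the consensus stays on the original reference coordinates; a consensus that also applies indels would need a coordinate map back to the reference.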