Viral de-novo Variant Call
2.5 years ago
Ribas • 0

Hi BioStars,

Here is my problem. After getting a draft (viral) genome through de novo assembly, I am looking for mutations. In particular, I am looking for variants with respect to a specific viral reference genome (let's call it X), which can differ from the assembled one.

Here is my current approach:

1. Get DRAFT GENOME from de novo assembly;
2. Align DRAFT GENOME and specific reference X with MAFFT;
3. Now I have DRAFT GENOME and reference X in the same coordinates;
4. Map reads to the DRAFT GENOME;
5. Call variants on the resulting BAM, using X as the bcftools reference.
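
Step 3 depends on translating positions in the draft into positions in reference X. A minimal sketch of that translation in Python, assuming the MAFFT output has already been read into two gapped strings of equal length (the function name `draft_to_reference_map` and the example sequences are hypothetical, not part of any tool):

```python
def draft_to_reference_map(ref_aln, draft_aln):
    """Map 1-based draft positions to 1-based reference positions.

    Both arguments are rows of the same pairwise alignment, with '-'
    marking gaps. Draft bases that sit opposite a gap in the reference
    (insertions in the draft) map to None.
    """
    if len(ref_aln) != len(draft_aln):
        raise ValueError("aligned sequences must have the same length")

    mapping = {}
    ref_pos = 0
    draft_pos = 0
    for ref_base, draft_base in zip(ref_aln, draft_aln):
        if ref_base != "-":
            ref_pos += 1
        if draft_base != "-":
            draft_pos += 1
            mapping[draft_pos] = ref_pos if ref_base != "-" else None
    return mapping


if __name__ == "__main__":
    # Toy alignment: the draft lacks one reference base.
    for draft_pos, ref_pos in draft_to_reference_map(
        "GTCACACTGG", "GTCCCAC-GG"
    ).items():
        print(draft_pos, ref_pos)
```

Positions called on reads mapped to the draft are draft positions, so a lookup like this is one way to report them on the X coordinate system afterwards.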

Am I overthinking?

Many thanks.


Overthinking what exactly? You gave a problem statement and an approach; which part do you think you are overthinking?


Maybe there is another way to solve it. I think the problem of different SNP coordinates is common in de novo assembly.


Why don't you just align your reads to the reference genome and call variants from that?


The reference genome can be quite different from the sample. This virus has many genotypes and subtypes. However, the published mutations are reported relative to one specific genome.


If you want to figure out the location of the mutations, then:

a) check how similar/different your assembled genome is compared to the published genome

b) you could also run basic synteny analysis between the genomes to identify major coordinate differences between these genomes.
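
For point a), a rough similarity check can be computed directly from a pairwise alignment of the two genomes. The sketch below assumes the two aligned rows are available as equal-length gapped strings; the function name `alignment_summary` is hypothetical, and ambiguity codes such as N are treated as ordinary characters:

```python
def alignment_summary(ref_aln, draft_aln):
    """Count matches, mismatches and gap columns in a pairwise alignment.

    Identity is computed over columns where both sequences have a base,
    so gap columns are reported separately and do not lower the identity.
    """
    if len(ref_aln) != len(draft_aln):
        raise ValueError("aligned sequences must have the same length")

    matches = mismatches = gap_columns = 0
    for ref_base, draft_base in zip(ref_aln.upper(), draft_aln.upper()):
        if ref_base == "-" and draft_base == "-":
            continue  # column empty in both rows, ignore it
        if ref_base == "-" or draft_base == "-":
            gap_columns += 1
        elif ref_base == draft_base:
            matches += 1
        else:
            mismatches += 1

    compared = matches + mismatches
    identity = matches / compared if compared else 0.0
    return {
        "matches": matches,
        "mismatches": mismatches,
        "gap_columns": gap_columns,
        "identity": identity,
    }


if __name__ == "__main__":
    print(alignment_summary("GTCACACTGG", "GTCCCAC-GG"))
```

A low identity or many gap columns would suggest the draft belongs to a different genotype than the published genome, in which case coordinate differences are to be expected.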


Then my approach is similar to point a).

REFERENCE:  ...GTCACACTGG...
DRAFT:      ..................GTCCCACTGG...

After alignment:

REFERENCE:  ...GTCACACTGG...
DRAFT:      ...GTCCCAC-GG...

Finally Variant Call:

REFERENCE:  ...GTCACACTGG...
DRAFT:      ...GTCCCAC-GG...
Variant_is_here___^
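
The comparison drawn above can be sketched in Python as a walk over the alignment columns that reports each difference at its reference position. This is only an illustration on the two aligned strings, not a replacement for a read-based variant caller; the function name `list_differences` is hypothetical:

```python
def list_differences(ref_aln, draft_aln):
    """Return (ref_pos, ref_base, draft_base) for every differing column.

    ref_pos is 1-based on the ungapped reference. For columns where the
    reference has a gap (insertion in the draft), ref_pos is the position
    of the last reference base seen, or 0 if none has been seen yet.
    """
    if len(ref_aln) != len(draft_aln):
        raise ValueError("aligned sequences must have the same length")

    differences = []
    ref_pos = 0
    for ref_base, draft_base in zip(ref_aln.upper(), draft_aln.upper()):
        if ref_base != "-":
            ref_pos += 1
        if ref_base != draft_base:
            differences.append((ref_pos, ref_base, draft_base))
    return differences


if __name__ == "__main__":
    for diff in list_differences("GTCACACTGG", "GTCCCAC-GG"):
        print(diff)
```

On the toy sequences from the diagram, this reports the A/C substitution and the missing T, both numbered on the reference.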


@Ribas I am quite interested in your approach. You are going through the trouble of de novo assembly before variant calling to maximize mappability, correct? With a similar aim, another approach I have seen is to do a first round of mapping, generate a consensus sequence, and then remap to the consensus sequence.

Do you find that there is value in this extra step of de novo assembly or consensus generation prior to mapping? How do you handle insertions in your own draft de novo assembly when you align it to the reference (the draft would no longer be on the same coordinate system as the reference)? Have you only tried bcftools, or also freebayes with the --pooled-continuous flag, or possibly LoFreq, SiNPle and the like?
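
One common convention for the insertion question is to anchor each inserted run on the reference base immediately before it, which is similar to how VCF represents insertions. The following Python sketch applies that idea to a pairwise alignment given as two equal-length gapped strings; the function name `anchored_insertions` is hypothetical, and an insertion before the first reference base is reported with anchor position 0 and an empty anchor base:

```python
def anchored_insertions(ref_aln, draft_aln):
    """Return (anchor_ref_pos, anchor_base, inserted_seq) per insertion run.

    An insertion run is a stretch of consecutive columns where the
    reference row has '-' and the draft row has a base. The anchor is the
    last reference base before the run (1-based reference position).
    """
    if len(ref_aln) != len(draft_aln):
        raise ValueError("aligned sequences must have the same length")

    records = []
    ref_pos = 0
    anchor_base = ""
    run = []
    for ref_base, draft_base in zip(ref_aln.upper(), draft_aln.upper()):
        if ref_base == "-":
            if draft_base != "-":
                run.append(draft_base)  # extend the current insertion
            continue
        if run:  # a reference base closes any open insertion run
            records.append((ref_pos, anchor_base, "".join(run)))
            run = []
        ref_pos += 1
        anchor_base = ref_base
    if run:  # insertion at the very end of the alignment
        records.append((ref_pos, anchor_base, "".join(run)))
    return records


if __name__ == "__main__":
    print(anchored_insertions("AC--GT", "ACTTGT"))
```

With this convention the inserted bases never need a reference coordinate of their own, so the rest of the draft can still be numbered on the reference.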


As a further comment, I think the main factor in choosing between de novo assembly and initial mapping followed by consensus is whether you start with a pure virus (in which case de novo assembly seems reasonable) or a mixed/non-clonal population (in which case de novo assembly might be messy, whereas mapping + consensus could handle it better).

I would still be interested in your take on this, the handling of insertions, and the choice of variant caller.