Question: How to get consensus FASTA sequence from GFA assembly graph?
shenwei356 (China) wrote, 3 months ago:

Hi all,

I want to assemble a region of a bacterial genome (~10 kb). The sequenced DNA is from a single species, cultured from a single clone.

So far, I've mapped the reads (Illumina PE150) to the reference, retrieved the mapped reads (not all of them paired) with samtools, assembled them with SPAdes in single-end mode, and visualized the resulting assembly graph (.gfa file) with Bandage.

I've manually checked the "bubbles", and in almost all of them the two paths are very similar (99% identical), differing by only 1-2 bp mismatches. The depths of the two paths in each bubble are split roughly half and half, so the differences are probably not sequencing errors.

So, how can I get a consensus sequence from this GFA graph? It looks somewhat like a diploid genome, but I'm not familiar with that case.

Thank you in advance.

Tags: gfa, consensus

This works for miniasm GFA output: http://seqanswers.com/forums/showthread.php?t=64862
I'm not sure whether it will work for the GFA you loaded into Bandage. Since bacteria are haploid, could those differences represent sequencing errors or population differences?

written 3 months ago by genomax

No, that answer just reformats the GFA into FASTA.

According to the depths, the two "alleles" are split roughly half and half, so they are probably not sequencing errors.

By the way, the sequenced dna is from a single species, cultured from a single clone.
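
For reference, the GFA-to-FASTA reformatting mentioned above amounts to roughly the following. This is only a minimal sketch, not the code from the linked thread: the function name `gfa_to_fasta` and the toy graph are made up for illustration. It writes one FASTA record per segment (S line) and ignores links entirely, which is why it cannot produce a consensus.

```python
import io

def gfa_to_fasta(gfa_text):
    """Turn every GFA segment (S line) into one FASTA record.

    Links (L lines) are ignored, so bubbles are not resolved in any way.
    """
    records = []
    for line in io.StringIO(gfa_text):
        fields = line.rstrip("\n").split("\t")
        # S lines look like: S <name> <sequence> [optional tags...]
        # A sequence of "*" means "not stored in this file", so skip it.
        if len(fields) >= 3 and fields[0] == "S" and fields[2] != "*":
            records.append(f">{fields[1]}\n{fields[2]}")
    return "\n".join(records) + ("\n" if records else "")

# Hypothetical toy graph: two segments joined by one link.
example_gfa = (
    "H\tVN:Z:1.0\n"
    "S\t1\tACGT\tKC:i:40\n"
    "S\t2\tGGCA\tKC:i:38\n"
    "L\t1\t+\t2\t+\t0M\n"
)
fasta = gfa_to_fasta(example_gfa)
print(fasta, end="")
```

Each segment comes out as its own sequence, so both arms of every bubble end up as separate FASTA entries.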

written 3 months ago by shenwei356

Is Bandage able to export all possible variants of the sequence (I assume that is what you are looking for)? You can't really get a single consensus sequence if there is a variation at every place where the graph has a bubble.

Sequenced DNA may be from one species but if the bacteria are under some sort of selection then perhaps you are observing various mutations being selected? What depth does half to half refer to?
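
To illustrate what "all possible variations" would mean: with n independent two-way bubbles along a linear graph there are 2^n distinct paths. The sketch below enumerates them for a toy chain. It is an illustration only, not something Bandage is known to do this way; `enumerate_paths` and the toy sequences are hypothetical, and it assumes the segments simply abut, whereas a real assembler GFA usually has overlaps between linked segments that would need trimming first.

```python
from itertools import product

def enumerate_paths(chain):
    """Yield every sequence spelled by a linear chain of segments.

    `chain` is a list of lists: a single-element list is a shared segment,
    a list with two or more elements holds the alternative arms of a bubble.
    Assumes adjacent segments do not overlap (an illustrative simplification).
    """
    for choice in product(*chain):
        yield "".join(choice)

# Hypothetical chain: shared, bubble, shared, bubble, shared.
chain = [["ACGT"], ["A", "G"], ["TTC"], ["C", "T"], ["GA"]]
paths = list(enumerate_paths(chain))
for p in paths:
    print(p)
```

With two bubbles this gives four candidate sequences; the count doubles with every additional bubble, which is why a single consensus is not well defined without further information such as read phasing.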

written 3 months ago by genomax
shenwei356 (China) wrote, 3 months ago:

I see, so I have to generate the path manually. One way is:

  1. Choose one path and export its sequence.
  2. Then use one of two methods:
    1. Manually mark the sites that carry alleles and edit them into degenerate bases, with the help of the paths not chosen in step 1.
    2. Map the reads to the sequence from step 1 and call a consensus sequence from the BAM file.
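
The two methods in step 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not a tested pipeline: the names `merge_paths` and `call_base` and the 0.25 threshold are made up here, `merge_paths` assumes the two bubble arms have equal length (mismatches only, as observed above), and `call_base` assumes per-position base counts have already been extracted from the BAM file by some other tool.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they cover.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

def merge_paths(seq_a, seq_b):
    """Method 1: merge two equal-length paths into one degenerate sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("paths differ in length; align them first")
    # Positions with characters outside A/C/G/T fall back to N.
    return "".join(
        IUPAC.get(frozenset((a, b)), "N")
        for a, b in zip(seq_a.upper(), seq_b.upper())
    )

def call_base(counts, min_frac=0.25):
    """Method 2: call one consensus base from read counts at a position.

    Every base supported by at least `min_frac` of the reads is kept, and the
    kept set is reported as an IUPAC code (hypothetical threshold).
    """
    total = sum(counts.values())
    if total == 0:
        return "N"
    kept = frozenset(b for b, n in counts.items() if n / total >= min_frac)
    return IUPAC.get(kept, "N")

merged = merge_paths("ACGTA", "ACATA")
balanced = call_base({"A": 48, "G": 52, "C": 0, "T": 0})
likely_error = call_base({"A": 97, "G": 3})
print(merged, balanced, likely_error)
```

A roughly half-and-half site becomes a degenerate base, while a low-frequency mismatch that looks like a sequencing error collapses to the majority base.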