Interpretation of .gfa
2.9 years ago
Rox ★ 1.4k

I love .gfa files, but sometimes I have trouble understanding them.

I used Flye with PacBio reads and default options as a first attempt at assembling a linear bacterial genome. The genome probably contains some plasmid or phage sequences. Flye gave me the following .gfa:

[image: assembly graph from the Flye .gfa, with green, yellow, red and blue edges]

The green, yellow and red edges were merged into one scaffold of around 9 Mb (as expected), and the small blue edge is its own contig of 42 kb.

In its assembly_info.txt file, Flye reports that the big scaffold is indeed linear, and that the small one is circular:

#seq_name   length   cov.  circ.  repeat  mult.  alt_group  graph_path
scaffold_2  8926751  84    N      N       1      *          *,1,2,4,??,4,-3,-1,*
contig_4    42310    940   N      Y       12     *          4
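
To read these flags programmatically rather than by eye, here is a minimal sketch. It assumes the columns are whitespace-separated and appear in the order given by the header above; the function name `parse_assembly_info` is only illustrative and is not part of Flye.

```python
def parse_assembly_info(text):
    """Parse assembly_info.txt-style rows into dictionaries.

    Assumption: eight whitespace-separated columns, in the header's order.
    """
    records = []
    for line in text.splitlines():
        # Skip blank lines and the '#seq_name ...' header line.
        if not line.strip() or line.startswith("#"):
            continue
        name, length, cov, circ, repeat, mult, alt_group, graph_path = line.split()
        records.append({
            "name": name,
            "length": int(length),
            "cov": int(cov),
            "circular": circ == "Y",
            "repeat": repeat == "Y",
            "mult": int(mult),
            "alt_group": alt_group,
            # Keep path entries as strings: they include '*' and '??'.
            "graph_path": graph_path.split(","),
        })
    return records


# The two rows quoted above, copied verbatim.
info = """#seq_name length cov. circ. repeat mult. alt_group graph_path
scaffold_2 8926751 84 N N 1 * *,1,2,4,??,4,-3,-1,*
contig_4 42310 940 N Y 12 * 4
"""

records = parse_assembly_info(info)
for rec in records:
    print(rec["name"], "circular:", rec["circular"],
          "repeat:", rec["repeat"], "path:", rec["graph_path"])
```

Keeping `graph_path` as a list of strings makes it easy to see which edges each sequence uses and to compare them against the edge names in the .gfa.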

A few things about this graph puzzle me:

  • Why does the .gfa connect all the edges into a circle if Flye reports that only one piece is circular and not the other?
  • The yellow edge is connected by the same end to both the green and the red edges. The mean coverage of the green and red edges is 70X, but it is only 17X for the yellow edge. What could this mean? I am very puzzled that it is connected by the same end. Could it be Flye trying to circularize it? Or some sort of SV? I think the DNA provided comes from a single colony, so I don't see how it could be an SV.
  • The blue edge has a connection to itself; I imagine this is because of repeats. But in the graph, this repeat is somehow connected to the other edges. So why was it split off in the final .fasta file?
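
One way to check which end of an edge a connection really uses is to read the link (`L`) lines of the .gfa directly. The sketch below assumes GFA1 link lines of the form `L <from> <orient> <to> <orient> <overlap>`; the segment names `green`, `yellow` and `red` and the three links are hypothetical stand-ins for illustration, not the real contents of this assembly's file.

```python
from collections import defaultdict


def segment_end_links(gfa_lines):
    """Map each (segment, end) to the set of (segment, end) pairs it links to.

    In GFA1, a link leaves the 'from' segment at its end when the
    orientation is '+' and at its start when it is '-'. It enters the
    'to' segment at its start when '+' and at its end when '-'.
    """
    ends = defaultdict(set)
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "L" or len(fields) < 5:
            continue  # only link lines matter here
        src, src_orient, dst, dst_orient = fields[1:5]
        src_end = "end" if src_orient == "+" else "start"
        dst_end = "start" if dst_orient == "+" else "end"
        # Record the connection from both sides.
        ends[(src, src_end)].add((dst, dst_end))
        ends[(dst, dst_end)].add((src, src_end))
    return dict(ends)


# Hypothetical toy graph: green and red both enter the start of yellow.
example = [
    "S\tgreen\t*",
    "S\tyellow\t*",
    "S\tred\t*",
    "L\tgreen\t+\tyellow\t+\t0M",
    "L\tred\t+\tyellow\t+\t0M",
]

links = segment_end_links(example)
for (segment, end), partners in sorted(links.items()):
    print(segment, end, "->", sorted(partners))
```

An edge end that lists two partners while the opposite end lists none is the pattern described for the yellow edge above.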
flye bandage assembly • 1.5k views
2.7 years ago
Rox ★ 1.4k

I forgot to update this post: fenderglass, the author of Flye, helped me a lot in understanding this. You can follow the whole conversation here: https://github.com/fenderglass/Flye/issues/389
