Hi there,
I have a very fragmented reference genome that I want to visualize (with JBrowse or similar).
This is the paper describing the genome: link
The genome can be found here: link2
Here are the QUAST statistics:
########
QUAST Results
########
All statistics are based on contigs of size >= 500 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs).
Assembly                    Abal.1_1   
# contigs (>= 0 bp)         37192295   
# contigs (>= 1000 bp)      1276678    
# contigs (>= 5000 bp)      529013     
# contigs (>= 10000 bp)     343016     
# contigs (>= 25000 bp)     145508     
# contigs (>= 50000 bp)     46234      
Total length (>= 0 bp)      18167382048
Total length (>= 1000 bp)   13017811908
Total length (>= 5000 bp)   11361640463
Total length (>= 10000 bp)  10034318481
Total length (>= 25000 bp)  6872368770 
Total length (>= 50000 bp)  3406852776 
# contigs                   1887964    
Largest contig              297427     
Total length                13450974050
GC (%)                      38.76      
N50                         25814      
N75                         9780       
L50                         139726     
L75                         348468     
# N's per 100 kbp           1703.76  
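The N50/L50 and N75/L75 rows follow the usual definition. Sort the contigs longest first and walk down the list until half (or 75%) of the total length is covered. N50 is the length of the contig where that happens, and L50 is how many contigs it took. Here, half of the assembly is spread over 139,726 contigs, none shorter than 25,814 bp. A minimal Python sketch of that definition follows. The function name `nx_lx` is hypothetical. QUAST computes these values on contigs >= 500 bp by default, as the report header says, and its own implementation may handle edge cases differently.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx: length of the contig at which the running total of contigs,
        sorted longest first, first reaches `fraction` of the total length.
    Lx: number of contigs needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("empty list of contig lengths")


# Toy example: total 290 bp, so N50 is reached once 145 bp are covered.
print(nx_lx([100, 80, 50, 30, 20, 10]))        # N50, L50
print(nx_lx([100, 80, 50, 30, 20, 10], 0.75))  # N75, L75
```

Given the lengths of all contigs >= 500 bp, this should reproduce the N50/L50 and N75/L75 rows above, up to any such edge-case differences.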
Any help on how I can wrangle this beast into something presentable would be appreciated.
I tried a cutoff to get rid of the contigs < 1000 bp, which helped, but is there a way to rescaffold or something similar?
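Here is a dependency-free Python sketch of the length cutoff. It streams the FASTA one record at a time, so the ~18 Gb file never has to fit in memory, and it reads or writes gzip transparently when a path ends in `.gz`. All function names and the file names in the usage line are hypothetical. A dedicated tool such as seqkit (`seqkit seq -m 1000`) does the same job much faster.

```python
import gzip


def open_text(path, mode="r"):
    """Open a plain or gzip-compressed text file, chosen by the .gz suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def read_fasta(handle):
    """Yield (header, sequence) pairs, one record at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_record(handle, header, seq, width=60):
    """Write one FASTA record, wrapping the sequence at `width` characters."""
    handle.write(f">{header}\n")
    for i in range(0, len(seq), width):
        handle.write(seq[i:i + width] + "\n")


def filter_fasta_file(in_path, out_path, min_len=1000):
    """Keep records of at least `min_len` bp; return (kept, dropped) counts."""
    kept = dropped = 0
    with open_text(in_path) as src, open_text(out_path, "w") as dst:
        for header, seq in read_fasta(src):
            if len(seq) >= min_len:
                write_record(dst, header, seq)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Usage (hypothetical file names): `filter_fasta_file("Abal.1_1.fa.gz", "Abal.1_1.min1000.fa", min_len=1000)`. Sequence length here includes any N characters.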
I know plant genomes are messy, but 37 million contigs in total (18.2 Gb), with only about 46,000 of them (3.4 Gb) longer than 50 kb?? Definitely not a great assembly, unless I'm missing something important.
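The "Total length" rows of the QUAST report are in base pairs, not numbers of contigs. The values 18,167,382,048 and 3,406,852,776 are therefore 18.2 Gb and 3.4 Gb. The following sanity check uses only values copied from the table:

```python
# Numbers copied verbatim from the QUAST report above.
n_all = 37_192_295          # "# contigs (>= 0 bp)"
len_all = 18_167_382_048    # "Total length (>= 0 bp)", in bp
n_500 = 1_887_964           # "# contigs" (QUAST's default cutoff, >= 500 bp)
len_500 = 13_450_974_050    # "Total length" (>= 500 bp), in bp
n_50k = 46_234              # "# contigs (>= 50000 bp)"
len_50k = 3_406_852_776     # "Total length (>= 50000 bp)", in bp

mean_all = len_all / n_all       # average length over every sequence
short_n = n_all - n_500          # sequences shorter than 500 bp
short_len = len_all - len_500    # bp held in those short sequences
frac_50k = len_50k / len_500     # share of the >= 500 bp assembly in >= 50 kb pieces

print(f"{n_all:,} sequences totalling {len_all / 1e9:.2f} Gb (mean {mean_all:.0f} bp)")
print(f"{short_n:,} sequences < 500 bp ({100 * short_n / n_all:.1f}%), {short_len / 1e9:.2f} Gb")
print(f"{n_50k:,} sequences >= 50 kb hold {len_50k / 1e9:.2f} Gb "
      f"({100 * frac_50k:.1f}% of the >= 500 bp total)")
```

Roughly 95% of the sequences are shorter than 500 bp, with an average length under 500 bp overall. Only about a quarter of the >= 500 bp assembly is in pieces of 50 kb or more.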
Edit: the paper actually mentions 37 million scaffolds totalling 18 Gb, so maybe it would be better to start from those (that is, scaffolds instead of contigs)?
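On scaffolds versus contigs: a scaffold is a set of contigs that have been ordered and joined with runs of N standing in for gaps. Contigs can therefore be recovered by cutting scaffolds at those runs. The "# N's per 100 kbp" value of 1703.76 suggests that the downloaded sequences already contain such gaps. The minimal sketch below assumes gaps are encoded as runs of at least 10 Ns. That threshold and both function names are hypothetical; check how this assembly actually encodes its gaps.

```python
import re


def split_scaffold(seq, min_gap=10):
    """Cut a scaffold into contigs at runs of >= min_gap N/n characters.

    min_gap=10 is an assumed gap threshold; shorter N runs stay inside contigs.
    """
    gap = re.compile(f"[Nn]{{{min_gap},}}")
    return [piece for piece in gap.split(seq) if piece]


def count_gaps(seq, min_gap=10):
    """Count N runs long enough to be treated as scaffold gaps."""
    return len(re.findall(f"[Nn]{{{min_gap},}}", seq))
```

For example, `split_scaffold("ACGT" + "N" * 25 + "GGCC")` returns `["ACGT", "GGCC"]`, while a short internal run such as `"ACNNGT"` is left intact.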