Why is my genome assembly not aligning well?
7 months ago
eennadi ▴ 30

[Image: dot plot of the assembly against the reference]

Hello, I am working on a genome assembly. I assembled the genome using SPAdes, then used QUAST to assess the assembly and got the following result:

All statistics are based on contigs of size >= 500 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs).

Assembly                     a9_spades   
# contigs (>= 0 bp)          1456        
# contigs (>= 1000 bp)       88          
# contigs (>= 5000 bp)       54          
# contigs (>= 10000 bp)      43          
# contigs (>= 25000 bp)      31          
# contigs (>= 50000 bp)      30          
Total length (>= 0 bp)       12399629    
Total length (>= 1000 bp)    12172083    
Total length (>= 5000 bp)    12103054    
Total length (>= 10000 bp)   12028690    
Total length (>= 25000 bp)   11881196    
Total length (>= 50000 bp)   11850351    
# contigs                    130         
Largest contig               1281197     
Total length                 12200405    
Reference length             12338308    
GC (%)                       38.52       
Reference GC (%)             38.62       
N50                          572496      
NG50                         572496      
N90                          159566      
NG90                         138054      
auN                          590647.5    
auNG                         584046.0    
L50                          8           
LG50                         8           
L90                          23          
LG90                         24          
# misassemblies              13          
# misassembled contigs       9           
Misassembled contigs length  3587409     
# local misassemblies        9           
# scaffold gap ext. mis.     2           
# scaffold gap loc. mis.     7           
# unaligned mis. contigs     2           
# unaligned contigs          17 + 14 part
Unaligned length             69366       
Genome fraction (%)          98.277      
Duplication ratio            1.000       
# N's per 100 kbp            36.23       
# mismatches per 100 kbp     487.24      
# indels per 100 kbp         58.41       
Largest alignment            1281197     
Total aligned length         12120016    
NA50                         398669      
NGA50                        398669      
NA90                         131098      
NGA90                        124681      
auNA                         520628.5    
auNGA                        514809.6    
LA50                         9           
LGA50                        9           
LA90                         28          
LGA90                        29

However, when I try to align the assembly to the reference, I get a misaligned genome. What could be wrong?


Thanks so much. I sorted the dotplot and it worked. I have one more issue.

7 months ago
alex.zaccaron ▴ 410

Why do you think it is misaligned? Your dot plot looks fine. Your assembly is a bit more fragmented than the reference; for example, you have three main contigs matching the longest sequence in the reference.

7 months ago
cmdcolin ★ 3.8k

I agree with alex.zaccaron that it looks basically fine.

If you want the dotplot to follow the diagonal better, you may be able to "sort" or "diagonalize" it. You should tell us which tools you are using for visualization (I don't get why bioinformaticians are so secretive or not forthcoming about what visualization tools they use...) so that people can help you better, but you appear to be using D-GENIES, which has a help page; see "Sort (11)" here: https://dgenies.toulouse.inra.fr/documentation/result

This page also describes how to create and diagonalize a dotplot in ggplot2: https://jmonlong.github.io/Hippocamplus/2017/09/19/mummerplots-with-ggplot2/
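To make the idea of "sorting" concrete, here is a minimal Python sketch of one way to diagonalize: place each contig on the reference sequence it shares the most aligned bases with, then order contigs by their length-weighted mean position on that sequence. This is only an illustration, not necessarily what D-GENIES or the ggplot2 post does internally. The record layout `(query, ref, ref_start, ref_end, strand)`, the function name `diagonalize`, and the contig names are all assumptions made up for the example; you would fill the records in from whatever your aligner outputs.

```python
from collections import defaultdict


def diagonalize(alignments, ref_order):
    """Suggest a contig order (and orientation) that follows the dot plot diagonal.

    alignments: iterable of (query, ref, ref_start, ref_end, strand) tuples,
                strand being "+" or "-" (hypothetical minimal record layout).
    ref_order:  reference sequence names in the order they are plotted.
    Returns a list of (query, strand) tuples in plotting order.
    """
    bases = defaultdict(int)      # aligned bases per (query, ref)
    pos_sum = defaultdict(float)  # sum of midpoint * length per (query, ref)
    strand_bases = defaultdict(lambda: defaultdict(int))  # bases per strand

    for query, ref, ref_start, ref_end, strand in alignments:
        length = abs(ref_end - ref_start) + 1
        midpoint = (ref_start + ref_end) / 2
        bases[(query, ref)] += length
        pos_sum[(query, ref)] += midpoint * length
        strand_bases[(query, ref)][strand] += length

    # For each query contig, keep the reference sequence with the most aligned bases.
    best = {}
    for (query, ref), n in bases.items():
        if query not in best or n > bases[(query, best[query])]:
            best[query] = ref

    ref_rank = {name: i for i, name in enumerate(ref_order)}
    keyed = []
    for query, ref in best.items():
        mean_pos = pos_sum[(query, ref)] / bases[(query, ref)]
        per_strand = strand_bases[(query, ref)]
        strand = max(per_strand, key=per_strand.get)  # majority orientation
        # References missing from ref_order are placed last.
        keyed.append((ref_rank.get(ref, len(ref_rank)), mean_pos, query, strand))

    keyed.sort()
    return [(query, strand) for _, _, query, strand in keyed]


# Toy example with made-up contig and chromosome names.
toy_alignments = [
    ("NODE_3", "chr1", 1, 1000, "+"),
    ("NODE_1", "chr1", 2001, 3000, "-"),
    ("NODE_2", "chr1", 1001, 2000, "+"),
    ("NODE_4", "chr2", 1, 500, "+"),
]
order = diagonalize(toy_alignments, ["chr1", "chr2"])
print(order)
```

Contigs reported with strand "-" would be reverse-complemented (or just flipped on the plot axis) before drawing. Note that this only changes how the plot is laid out; it does not change the assembly itself.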
