How to interpret genomescope to help remove duplicates
12 weeks ago

Hello everyone.

I am using PacBio HiFi reads to assemble a repetitive plant genome. Unfortunately I don't have a reference to work with, so I started by assembling my reads with hifiasm. BUSCO analysis revealed incredibly high duplication (>90%), so I used jellyfish and GenomeScope to take a look at my results. This is what I got, and I'm not really sure how to interpret the resulting graph. Is the graph saying that the first peak corresponds to the heterozygous k-mers and the second peak to the homozygous k-mers? Or am I interpreting this incorrectly? Thank you all for your help.

[image: GenomeScope plot]
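
For anyone who wants to check the peak positions numerically rather than by eye, here is a minimal Python sketch. It assumes a two-column histogram (k-mer multiplicity, number of distinct k-mers) like the one `jellyfish histo` writes. The function names, the `min_cov` cutoff and the synthetic data are made up for illustration and are not taken from the post. In a diploid k-mer profile the heterozygous peak is expected at roughly half the coverage of the homozygous peak, so a ratio near 2 between the two peaks would be consistent with the reading proposed in the question. Real histograms are noisy and would likely need smoothing before a simple local-maximum test like this is reliable.

```python
from typing import List, Optional, Tuple


def parse_histo(text: str) -> List[Tuple[int, int]]:
    """Parse 'multiplicity count' lines (two whitespace-separated integers)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            rows.append((int(parts[0]), int(parts[1])))
    return rows


def find_peaks(rows: List[Tuple[int, int]], min_cov: int = 5) -> List[int]:
    """Return coverages that are strict local maxima.

    Coverages below min_cov are skipped because that region is usually
    dominated by k-mers from sequencing errors (min_cov is an assumption).
    """
    peaks = []
    for i in range(1, len(rows) - 1):
        cov, n = rows[i]
        if cov < min_cov:
            continue
        if n > rows[i - 1][1] and n > rows[i + 1][1]:
            peaks.append(cov)
    return peaks


def peak_ratio(peaks: List[int]) -> Optional[float]:
    """Ratio of second to first peak coverage, or None with fewer than two peaks."""
    if len(peaks) < 2:
        return None
    return peaks[1] / peaks[0]


def synthetic_histo() -> str:
    """Hypothetical histogram: an error tail plus bumps centred at 20x and 40x."""
    lines = []
    for cov in range(1, 61):
        errors = 1000 // cov
        het = max(0, 500 - 10 * (cov - 20) ** 2)
        hom = max(0, 800 - 10 * (cov - 40) ** 2)
        lines.append(f"{cov} {errors + het + hom}")
    return "\n".join(lines)


if __name__ == "__main__":
    peaks = find_peaks(parse_histo(synthetic_histo()))
    print(peaks, peak_ratio(peaks))
```

To try it on real data, read the histogram file into a string and pass it to `parse_histo` in place of `synthetic_histo()`.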

assembly hifi genomescope • 518 views

What is the expected genome size and ploidy for this organism? Can you post the busco results?


Thank you for helping me out. Related species have genomes ranging between 400 and 700 Mb, so the length given here by GenomeScope seems reasonable. The BUSCO results prior to purging are as follows:

 ***** Results: *****

    C:99.4%[S:1.5%,D:97.9%],F:0.4%,M:0.2%,n:1614,E:1.5%        
    1604    Complete BUSCOs (C)     (of which 24 contain internal stop codons)                 
    24      Complete and single-copy BUSCOs (S)        
    1580    Complete and duplicated BUSCOs (D)         
    6       Fragmented BUSCOs (F)                      
    4       Missing BUSCOs (M)                         
    1614    Total BUSCO groups searched 

    Assembly Statistics:
    8911    Number of scaffolds
    8911    Number of contigs
    2026206426      Total length
    0.000%  Percent gaps
    7 MB    Scaffold N50
    7 MB    Contigs N50
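
As a quick sanity check on the numbers above, here is a small Python sketch. It recomputes the BUSCO percentages from the raw counts and compares the total assembly length with the 400 to 700 Mb range quoted for related species. The function names are hypothetical, and the size range is only the rough expectation stated in this reply, not a measured value for this organism. The arithmetic shows how much larger the assembly is than that range; it cannot tell by itself what causes the excess.

```python
from typing import Dict, Tuple


def busco_percentages(single: int, duplicated: int,
                      fragmented: int, missing: int) -> Dict[str, float]:
    """Convert BUSCO category counts to percentages of all groups searched."""
    total = single + duplicated + fragmented + missing
    counts = {"S": single, "D": duplicated, "F": fragmented, "M": missing}
    return {key: round(100 * value / total, 1) for key, value in counts.items()}


def size_ratio(assembly_bp: int, expected_min_bp: float,
               expected_max_bp: float) -> Tuple[float, float]:
    """Assembly length as a multiple of the largest and smallest expected size."""
    return assembly_bp / expected_max_bp, assembly_bp / expected_min_bp


if __name__ == "__main__":
    # Counts and total length copied from the BUSCO report above.
    print(busco_percentages(single=24, duplicated=1580, fragmented=6, missing=4))
    low, high = size_ratio(2_026_206_426, 400e6, 700e6)
    print(f"assembly is {low:.1f}x to {high:.1f}x the expected size range")
```

With the values from the report, the assembly comes out at roughly 2.9 to 5.1 times the expected size range.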
