assembly: purge_dups removes too much sequence
7 months ago

I have tried out purge_dups and found results to be puzzling.

First, there seems to be a silent bug where contig FASTA headers are ignored (the symptom is that the BED files are empty).

Solution: use simple contig FASTA headers, like sample_contig1 and not sample-complex-name_x1-_contig-1.
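For anyone hitting the same issue, here is a minimal sketch of how the headers could be simplified before running purge_dups. It is not part of purge_dups; the function name `rename_fasta_headers`, the `prefix` argument, and the `<prefix>_contig<N>` naming scheme are my own assumptions for illustration. It also keeps a mapping so the original names can be restored later.

```python
def rename_fasta_headers(lines, prefix="sample"):
    """Rename every FASTA header to <prefix>_contig<N>.

    Hypothetical helper (not part of purge_dups). Returns the renamed
    lines and a dict mapping each new name to the original header text.
    """
    new_lines = []
    mapping = {}  # new simple name -> original header (without '>')
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            new_name = f"{prefix}_contig{count}"
            mapping[new_name] = line[1:]
            new_lines.append(">" + new_name)
        else:
            # Sequence lines are passed through unchanged.
            new_lines.append(line)
    return new_lines, mapping


if __name__ == "__main__":
    example = [
        ">sample-complex-name_x1-_contig-1 length=8",
        "ACGTACGT",
        ">sample-complex-name_x1-_contig-2 length=4",
        "GGCC",
    ]
    renamed, name_map = rename_fasta_headers(example)
    print("\n".join(renamed))
```

For a real assembly the same function can be fed an open file handle, with the renamed lines written to a new FASTA and the mapping saved as a two-column table.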

I analyzed plant genomes, which are repetitive, and found that >80% of the contigs would be removed (the assembly is reduced from 700 Mb to 60 Mb). This is surely too much.

I am using ONT and not PacBio HiFi reads for my assemblies, so this could be part of the problem.

Has anyone optimized purge_dups for either plant genomes or nanopore reads? It has over 850 citations and has been widely used on plant genomes before, yet there is no recommended parameter set for plants.

Thanks

purge_dups assembly nanopore repeats plants • 302 views