I have assembled a bacterial genome using SPAdes. Coverage among the contigs is very heterogeneous: some have more than 30-fold coverage, while others have less than 5-fold. There are also many contigs shorter than 500 bp. I'm aware that I should filter out contigs with low coverage and short length (they are probably spurious sequencing products), but what would be reasonable criteria for filtering them? I have found scripts that remove contigs shorter than 500 bp and with less than 2-fold coverage. I believe this setting is too permissive, since most of the contigs in my assembly deviate significantly from it.
As we can see, most of the contigs are shorter than 1000 bp, and most of these short contigs have less than 5-fold coverage. My plan is to keep only contigs that are at least 1000 bp long and have at least 5-fold coverage. Afterwards, I will check the completeness of the filtered assembly with CheckM and compare it with the unfiltered assembly.
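A filter like the one planned above can be sketched in a few lines of Python. This is a minimal sketch, not a tested pipeline. It assumes the contig headers follow the usual SPAdes convention `NODE_<n>_length_<len>_cov_<cov>` (e.g. `NODE_1_length_12345_cov_30.5`). The coverage SPAdes reports there is k-mer coverage, which is lower than per-base read depth, so a "5-fold" cutoff on this value is not the same as 5x read depth. The helper names (`read_fasta`, `coverage`, `filter_contigs`, `write_fasta`) are hypothetical, and the default thresholds simply mirror the 1000 bp / 5-fold plan. A contig is kept only if it passes both thresholds.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumption: headers follow the SPAdes style, e.g. "NODE_1_length_12345_cov_30.5".
# The "cov" field is SPAdes' k-mer coverage, not per-base read depth.
COV_RE = re.compile(r"_cov_([0-9.]+)")

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def coverage(header: str) -> float:
    """Extract the coverage value from a SPAdes-style contig header."""
    match = COV_RE.search(header)
    if match is None:
        raise ValueError(f"no coverage field in header: {header!r}")
    return float(match.group(1))


def filter_contigs(records: Iterable[Record],
                   min_len: int = 1000,
                   min_cov: float = 5.0) -> List[Record]:
    """Keep contigs that pass BOTH the length and the coverage threshold."""
    return [(h, s) for h, s in records
            if len(s) >= min_len and coverage(h) >= min_cov]


def write_fasta(records: Iterable[Record], path: str, width: int = 60) -> None:
    """Write records to a FASTA file, wrapping sequences at `width` columns."""
    with open(path, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")


# Example usage (file names are placeholders):
#   with open("contigs.fasta") as fh:
#       kept = filter_contigs(read_fasta(fh), min_len=1000, min_cov=5.0)
#   write_fasta(kept, "contigs.filtered.fasta")
```

Length is taken from the sequence itself rather than the header, so the filter also works if headers were edited. Running the same function with the permissive 500 bp / 2-fold settings and with the stricter ones gives two FASTA files that can be compared side by side in CheckM.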