Better assembly results with lower depth!?
4 months ago
liorglic ▴ 870

I created two genome assemblies of a plant, based on short read data (Illumina PE). I used MEGAHIT in both cases with the same configuration. The only difference was that in one case I used 50x sequencing depth, and in the other I subsampled to 20x.
To my surprise, the 20x assembly ended up with slightly better stats: N50, total assembly size, and BUSCO score.
Can anyone suggest reasons why this could happen? My understanding had always been that additional sequencing data can't harm the results, especially at relatively low depth. I've found this paper, in which a similar phenomenon was observed for bacterial genomes, but the reasons are not explained or discussed. Any ideas or relevant literature?
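For readers who want to reproduce the comparison, here is a minimal Python sketch of the two quantities involved: the fraction of read pairs to keep when going from 50x to 20x, and N50 computed from a list of contig lengths. This is an illustration only, not the exact commands used for the assemblies above; the function names (`keep_fraction`, `subsample_pairs`, `n50`) are hypothetical, and it assumes read pairs are already loaded in memory and are sampled independently with a fixed seed.

```python
import random


def keep_fraction(current_depth, target_depth):
    """Fraction of read pairs to keep to go from current_depth to target_depth."""
    if current_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    # Never "upsample": cap the fraction at 1.0.
    return min(1.0, target_depth / current_depth)


def subsample_pairs(pairs, fraction, seed=0):
    """Keep each read pair with probability `fraction` (pairs stay together)."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    return [pair for pair in pairs if rng.random() < fraction]


def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    print(keep_fraction(50, 20))            # fraction of pairs to keep for 50x -> 20x
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))    # toy contig lengths
```

Because a single random subsample is only one draw, repeating the 20x subsampling with several different seeds and comparing the resulting N50 values would show whether the difference is larger than the run-to-run variation.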


megahit assembly • 215 views
