N50 is too short in de novo assembly
5 weeks ago
Takuma ▴ 10

Hello, I am new to bioinformatics!

I have Illumina short reads (2×150 bp) from a beetle for which no reference genome exists. Please see the details below.
After trimming with fastp there are about 320,000,000 × 2 reads, and kmergenie estimates the genome size at about 530 Mbp. So I think the coverage is 150 bp × 320,000,000 × 2 / 530 Mbp ≈ 180×.
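
For anyone checking the arithmetic, here is a minimal Python sketch of the coverage estimate above. The function name `estimated_coverage` is just illustrative, and it assumes every read is still the full 150 bp. Trimmed reads are shorter than that, so the real figure will be somewhat lower.

```python
def estimated_coverage(read_pairs, read_length, genome_size):
    """Nominal coverage = total sequenced bases / estimated genome size."""
    # Two reads per pair; assumes no bases were lost to trimming (upper bound).
    total_bases = read_pairs * 2 * read_length
    return total_bases / genome_size


# Numbers taken from the post: ~320,000,000 pairs, 150 bp reads, ~530 Mbp genome.
coverage = estimated_coverage(320_000_000, 150, 530_000_000)
print(f"{coverage:.0f}x")  # prints "181x"
```

Summing the actual read lengths reported by fastp after trimming would give a tighter estimate than the nominal read length.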

I am now running a de novo assembly with platanus with the options -u 0.2 -s 3 -d 0.3. But the N50 is too short: 992 bp for contigs and 5,329 bp for scaffolds.
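
To make the N50 figures concrete, here is a small Python sketch of how N50 is usually computed from a list of contig or scaffold lengths. The function name `n50` and the toy lengths are made up for illustration. They are not from this assembly, and assembly tools may differ in details such as minimum-length filtering.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


# Toy example with made-up contig lengths (total = 1500, half = 750).
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))  # prints 400, since 500 + 400 = 900 >= 750
```

Running this on the contig and scaffold lengths from different parameter sets gives a like-for-like comparison, as long as the same minimum-length cutoff is used each time.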

I think this species is highly heterozygous. Should I increase the -u value, for example to -u 0.3? Do you have any ideas for improving the N50?

assembly N50 genome • 165 views
5 weeks ago

Different assemblers may perform radically better or worse depending on parameter settings, so try different parameters and tools.

At the same time, it is also possible that your data is biased or that the library was not fragmented quite right, which would also lead to small contigs/scaffolds.

Contamination with other genomes can also lead to loss of coverage in critical areas.

Map your reads to the closest relative that has a reference genome and investigate the alignments.


