N50 is too short in de novo assembly
5 weeks ago
Takuma ▴ 10

Hello, I am new to bioinformatics!

I have Illumina short reads (2×150 bp) from a beetle for which no reference genome exists. Please see below.
After trimming with fastp there are about 320,000,000×2 reads, and kmergenie estimates the genome size at about 530 Mbp. So I think the coverage is 150 bp × 320,000,000 × 2 / 530 Mbp = about 180×.
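As a quick check of that arithmetic, here is a minimal Python sketch. The variable names are only illustrative, and it assumes every read is still the full 150 bp after trimming, so the result is an upper bound rather than an exact depth.

```python
# Rough sequencing depth estimate: total sequenced bases / estimated genome size.
# Assumption: all reads are still 150 bp after trimming (real trimmed reads are
# often shorter, so the true depth is somewhat lower).
read_length_bp = 150            # 2x150 bp paired-end reads
read_pairs = 320_000_000        # read pairs remaining after fastp
genome_size_bp = 530_000_000    # genome size estimated by kmergenie

total_bases = read_length_bp * read_pairs * 2   # two reads per pair
coverage = total_bases / genome_size_bp

print(f"Estimated coverage: {coverage:.0f}x")   # prints "Estimated coverage: 181x"
```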

I am now running a de novo assembly with platanus using some options (-u 0.2 -s 3 -d 0.3). But the N50 is too short: 992 bp for contigs and 5,329 bp for scaffolds.
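For reference, N50 is usually defined as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of that definition follows; the function name and the example lengths are hypothetical and are not taken from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical toy lengths, only to illustrate the calculation.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))  # prints 400
```

In the toy example the total is 1,500 bp, and the two longest sequences (500 + 400 = 900 bp) are the first to reach half of it, so the N50 is 400.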

I think this species is highly heterozygous. Should I increase the -u value, for example to -u 0.3? Do you have any ideas for improving the N50?

assembly N50 genome • 165 views
5 weeks ago

Different assemblers can perform radically better or worse depending on parameter settings, so try different parameters and tools.

At the same time, it is also possible that your data is biased or that the library was not fragmented quite right, either of which would also lead to small contigs/scaffolds.

Contamination with other genomes can also lead to loss of coverage in critical areas.

Map your reads to the closest relative and investigate the alignments.

