de novo assembly and SNP discovery
Asked 7.3 years ago by rnaseq2017

Which NGS platform and sequencing depth are suitable for de novo assembly and SNP discovery in a non-model fungus with a genome size of 20 Mb? Is "sequencing on HiSeq 2500, 100 bp PE, 100M reads (10 Gb)" suitable? Could I reduce the sequencing depth to 30M or 50M reads?


Are you sequencing two or more strains, or comparing to an existing genome? De novo assembly and SNP detection don't usually go hand in hand.


Actually, we sometimes do exactly that at JGI... the goal is generally to find out which strain of an organism capable of metabolizing X is the best at metabolizing X, or something similar.


I didn't want to give an elaborate answer like yours, so I tried to narrow it down to one :)

Answered 7.3 years ago

Assembly and variant-calling have different requirements. Assembly needs higher depth and longer reads than variant-calling, so you should split your needs into two questions:

1) What kind of data do I need to assemble this organism?

2) What kind of data do I need to call variants on this organism?

I'll assume the fungus is haploid, which simplifies things. I'd suggest at least 100x coverage for assembly. The HiSeq 2500 at 2x150bp is great, but the MiSeq is better for this because it offers longer reads (2x250), although it costs more. Since you only assemble once, if you are restricted to Illumina platforms, you should go with MiSeq 2x250 for assembly. MiSeq also allows 2x300bp sequencing, but Illumina has a history of producing corrupted 2x300 kits, so I don't recommend that. The 2x250 kits seem to be good, and MiSeq is Illumina's highest-quality platform. For assembly, using an unamplified (PCR-free) library is critical.
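
To put the numbers from the question against these targets, here is a small Python sketch of the usual depth arithmetic (mean depth = total sequenced bases / genome size). It assumes "reads" means individual reads rather than pairs, which matches the question's 100M x 100 bp = 10 Gb, and it ignores trimming, duplicates and contamination, so real usable depth will be somewhat lower. The function names are just for illustration.

```python
import math

GENOME_SIZE_BP = 20_000_000  # ~20 Mb genome from the question


def coverage(num_reads: int, read_length_bp: int,
             genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Mean depth = total sequenced bases / genome size (no trimming or duplicate loss)."""
    return num_reads * read_length_bp / genome_size_bp


def read_pairs_needed(target_depth: float, read_length_bp: int,
                      genome_size_bp: int = GENOME_SIZE_BP) -> int:
    """Paired-end read pairs needed to reach target_depth (each pair = 2 reads)."""
    bases_needed = target_depth * genome_size_bp
    return math.ceil(bases_needed / (2 * read_length_bp))


if __name__ == "__main__":
    # The three options from the question: 100 bp reads, counted individually.
    for n in (30_000_000, 50_000_000, 100_000_000):
        print(f"{n:,} x 100 bp reads -> {coverage(n, 100):.0f}x")
    # Pairs needed for the 100x assembly target at 2x250 (MiSeq) and 2x150 (HiSeq).
    print(read_pairs_needed(100, 250), read_pairs_needed(100, 150))
```

By this arithmetic, 30M and 50M reads of 100 bp give roughly 150x and 250x on a 20 Mb genome, and 100x at 2x250 needs about 4M read pairs.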

For subsequent variant-calling on lots of samples, read length is less important. So, just sequence at 2x150 on a HiSeq 2500 (Illumina's second-highest-quality platform), or at 2x100 if that is substantially cheaper per bp. If your organism is haploid, 20x coverage is more than enough with an unamplified library.
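
One way to see why 20x (or even 10x) can be enough for a haploid genome is an idealized Poisson model of per-site depth. The sketch below estimates how many of the ~20 Mb positions would get fewer than k reads at a given mean depth. The threshold k = 3 is an arbitrary illustrative choice, not a rule from any variant caller, and real coverage is more uneven than Poisson (GC bias, repeats), so treat the numbers as optimistic.

```python
import math


def prob_depth_at_least(k: int, mean_depth: float) -> float:
    """P(X >= k) for X ~ Poisson(mean_depth), an idealized per-site depth model."""
    # Sum P(X = i) for i < k, then take the complement.
    p_below = sum(math.exp(-mean_depth) * mean_depth ** i / math.factorial(i)
                  for i in range(k))
    return 1.0 - p_below


def sites_below(k: int, mean_depth: float, genome_size_bp: int = 20_000_000) -> float:
    """Expected number of positions covered by fewer than k reads."""
    return genome_size_bp * (1.0 - prob_depth_at_least(k, mean_depth))


if __name__ == "__main__":
    MIN_READS = 3  # hypothetical minimum depth for a confident haploid call
    for depth in (10, 20):
        print(f"{depth}x: ~{sites_below(MIN_READS, depth):,.0f} sites with < {MIN_READS} reads")
```

Under this model, 10x leaves roughly 55,000 positions below 3 reads while 20x leaves about 9, which is consistent with 20x being comfortable and 10x being a workable budget option.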

If you want an optimal assembly, you should sequence at ~100x on PacBio. That can often yield a near-perfect assembly for genomes in this size range, and it will always be dramatically better than an assembly from Illumina reads. PacBio gives the best assemblies, period, and again, you only assemble once. This would probably take 2 SMRT cells, so around $1000 for a near-perfect assembly. Definitely the best option! But Illumina is still better for variant-calling, at least for small variants like SNPs. PacBio is better for structural variants and phasing, but you don't need phasing with a haploid.

So, I suggest PacBio at >=100x depth for assembly and Illumina HiSeq 2500 at 2x150 (or 2x100 if it is substantially cheaper) for variant-calling at ~20x depth, both non-PCR-amplified. If cost is a big issue, you can drop the coverage for variant-calling further, down to 10x, assuming the fungus is haploid and the libraries are unamplified. Note that the lower the coverage, the more read length matters: 20x with 100bp reads is strictly inferior to 20x with 150bp reads, and at 10x the disadvantage of 100bp reads is even larger.
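
The SMRT-cell estimate is the same depth arithmetic with a per-cell yield plugged in. Yield per cell varies a lot with chemistry and library quality, so `yield_per_cell_bp` below is an assumption you should replace with your provider's figure; the estimate of 2 cells for ~100x on 20 Mb corresponds to roughly 1 Gb per cell.

```python
import math


def smrt_cells_needed(target_depth: float, genome_size_bp: int,
                      yield_per_cell_bp: float) -> int:
    """Cells needed so that total yield >= target_depth * genome size."""
    bases_needed = target_depth * genome_size_bp
    return math.ceil(bases_needed / yield_per_cell_bp)


if __name__ == "__main__":
    # Assumed yields per cell (illustrative only): 1.0 Gb and 0.8 Gb.
    for yield_bp in (1_000_000_000, 800_000_000):
        print(f"{yield_bp / 1e9:.1f} Gb/cell -> {smrt_cells_needed(100, 20_000_000, yield_bp)} cells")
```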

ADD COMMENT
1
Entering edit mode

That's an answer to save for future reference. I have some more suggestions:

  1. If you would like to sequence 2x250, make sure the insert size of your library is 550 bp, not 350 bp. 2x250 is also available in rapid run mode on the HiSeq 2500.

  2. 2x300 is worthless; the last 50 bp are low quality.

  3. I highly recommend adding some long reads from PacBio or Nanopore. If you run Nanopore, make sure you get 2D reads.
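
The insert-size advice in point 1 comes down to how much the two mates overlap. Assuming "insert size" means the full fragment length between the adapters, the overlap is 2 x read length - insert size; a minimal sketch (the function name is hypothetical):

```python
def mate_overlap_bp(insert_size_bp: int, read_length_bp: int) -> int:
    """Bases covered by both mates of a pair; a negative value is the unsequenced gap between them."""
    return 2 * read_length_bp - insert_size_bp


if __name__ == "__main__":
    for insert in (350, 550):
        overlap = mate_overlap_bp(insert, 250)
        if overlap >= 0:
            print(f"2x250, {insert} bp insert: mates overlap by {overlap} bp")
        else:
            print(f"2x250, {insert} bp insert: {-overlap} bp gap between mates")
```

For 2x250, a 350 bp insert makes the mates overlap by 150 bp, so those bases are sequenced twice, whereas a 550 bp insert leaves a 50 bp gap and spreads the 500 sequenced bases over a longer fragment.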


Thanks for the reply; I was not aware the HiSeq 2500 now offered 2x250. I'll have to check it out and evaluate the quality.

Oh, and yes, you are absolutely correct that the ideal target insert size depends on the read length; I should have mentioned that. Thanks for the note about the last 50bp of 2x300bp reads. I have actually seen high-quality 2x300bp reads, but I'm not sure whether they are still obtainable, so I don't recommend them. I've also (more recently) seen 2x300bp reads that were basically unusable for any purpose.
