Forum: PacBio Long Reads vs Illumina Short Reads
dk0319 wrote (18 days ago):

Is anyone who has worked with both Illumina and PacBio NGS data open to discussing their experience with the two platforms? Did you notice any clear strengths or weaknesses, especially for genome assembly, structural variant discovery, and/or RNA-seq analysis (both DE and splice discovery)?

Note: I am aware of the implied benefits of long vs short reads. I am just really curious to hear first-hand accounts.


This is based on my experience working with both platforms.
Disclaimer: this is my own paper:
Structural variant calling: the long and the short of it

written 18 days ago by Medhat

Thanks for the insights. Has anyone had experience with Bionano's genome imaging platform? If so, what did you think? Did it perform better than PacBio in detecting genomic structural variants?

written 16 days ago by dk0319

It is a cost-effective way to detect SVs, but comes with these limitations:

  • low-accuracy breakpoint resolution
  • no sequence for identified insertions
  • short SVs may be missed (it is better suited to very large SVs)
  • there is a shortage of open-source tools to analyze the data
written 16 days ago by Medhat
Dave Carlson (Stony Brook University, NY) wrote (18 days ago):

Results for assembling a highly repetitive 1 Gb plant genome with ~100x coverage of PE Illumina data: a 300 Mb assembly (about 1/3 of the genome).

Results for assembling the same genome with ~100x coverage of Sequel 1 PacBio reads: a 950 Mb assembly (~90% of the genome).

These days, I wouldn't even attempt genome assembly with Illumina data alone.

h.mon wrote (18 days ago):

Currently, the best approach is a mix of Illumina and PacBio (or Nanopore) sequencing. The first step is to assemble with long reads alone, or to do a hybrid assembly with long and short reads. There are very good assemblers for long-read data alone (e.g. Flye, which already performs polishing with the long-read data); I don't have experience with hybrid assemblers. A second step would be one round of long-read polishing; depending on the assembler, the improvements can be dramatic. After that, do at least one round of short-read polishing to correct the systematic homopolymer errors.
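To illustrate why polishing with the same long reads may not be enough, here is a toy sketch in Python. It is not how Flye or any real polisher works: it assumes the reads are already aligned to equal length (with "-" marking a gap) and takes a simple per-column majority vote. The sequences and the `consensus` helper are made up for illustration.

```python
from collections import Counter

def consensus(aligned_reads):
    """Column-wise majority vote over pre-aligned, equal-length reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_reads))

truth = "ACGTTTTTGCA"  # hypothetical true sequence with a 5-base T homopolymer

# Random errors: each read is wrong at a different position, so the vote fixes them.
random_err = [
    "ACGTTTTTGCA",
    "ACCTTTTTGCA",
    "ACGTTTTAGCA",
    "ACGTTTTTGGA",
    "AGGTTTTTGCA",
]

# Systematic errors: most reads drop the same homopolymer base, so the vote keeps the error.
systematic = [
    "ACGTTTT-GCA",
    "ACGTTTT-GCA",
    "ACGTTTT-GCA",
    "ACGTTTTTGCA",
    "ACGTTTT-GCA",
]

polished_random = consensus(random_err)
polished_systematic = consensus(systematic).replace("-", "")

print(polished_random == truth)      # random errors are outvoted
print(polished_systematic == truth)  # the shared homopolymer error survives
```

In this toy case the random errors disappear after the vote, while the shared homopolymer deletion survives, which is the situation where reads from a platform with a different error profile help.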

Long-read data still suffers from a high error rate (or, for PacBio CCS, not a high rate but systematic errors at homopolymers), so assemblies from long reads alone may have a high rate of missing genes due to frame-shifting assembly errors. As Dave Carlson already noted, the gains in contiguity and in the percentage of the genome recovered can be much higher for long-read data than for short reads, though I have never observed anything as dramatic as his report.
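As a small illustration of the frame-shift problem, the Python sketch below translates a made-up mini gene before and after one base is dropped from a homopolymer. The sequences are hypothetical; the codon table is the standard genetic code, built from the usual TCAG ordering.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate in frame 0, ignoring any incomplete trailing codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# Hypothetical gene: start codon, a 6-base A homopolymer, then the rest of the ORF.
reference = "ATG" + "AAAAAA" + "GGCTGGTAA"
# Same gene with one A missing from the homopolymer, as an assembly error might leave it.
collapsed = "ATG" + "AAAAA" + "GGCTGGTAA"

print(translate(reference))  # MKKGW*
print(translate(collapsed))  # MKKAG
```

After the deletion, every codon downstream of the homopolymer changes and the stop codon is lost, so a gene predictor could easily miss or truncate the gene.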
