Question

proovread Illumina coverage for hybrid genome assembly

0

7.8 years ago

Josué Barrera ▴ 10

Hello everyone!

I'm planning to use proovread to correct some PacBio sequences and use them to assemble a plant genome (around 400 Mbp). I currently have 30x coverage of PacBio data, 88x coverage of HiSeq2000 data and 24x coverage of MiSeq data (after quality filters and paired-end merging of Illumina sequences). The proovread manual suggests a coverage around 30-50x.

Is there any reason, aside from computational time, to keep the short-read coverage at ≤50x? Will a higher coverage (112x) improve the results obtained from proovread?

Or is there any other hybrid method you suggest I could use to benefit from both my Illumina and PacBio data (e.g., DBG2OLC, ABySS)?

Thanks!

genome Assembly hybrid PacBio Illumina • 2.7k views

ADD COMMENT • link updated 7.8 years ago by Medhat 9.7k • written 7.8 years ago by Josué Barrera ▴ 10

score 1 · Answer 1 · 2016-08-03

1

7.8 years ago

Medhat 9.7k

More coverage is always better. In the case of proovread, the author suggests correcting your PacBio reads in chunks rather than all at once, because of memory usage:

"Don’t run proovread on entire SMRT cells directly, it will only blast your memory and take forever. Split your data in handy chunks of a few Mbp first:"

He then gives this example:

# located in /path/to/proovread/bin
SeqChunker -s 20M -o pb-%03d.fq pb-subreads.fq

proovread -l pb-001.fq -s reads.fq [-u unitigs.fa] --pre pb-001
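
To correct every chunk rather than only the first, the proovread call can be wrapped in a loop. The following is a minimal sketch, not something taken from the proovread manual. It assumes the chunks are named pb-001.fq, pb-002.fq, ... and the short reads are in reads.fq, as in the commands above. The function name `correct_chunks` and the `DRY_RUN` switch are hypothetical helpers of this sketch, not proovread options; only `-l`, `-s` and `--pre` are passed to proovread.

```shell
# Sketch: run proovread once per chunk produced by SeqChunker.
# DRY_RUN is a hypothetical switch of this sketch (not a proovread option):
# by default the commands are only printed so they can be checked first.
correct_chunks() {
    short_reads=$1
    shift
    for chunk in "$@"; do
        prefix=${chunk%.fq}   # pb-001.fq -> pb-001
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "proovread -l $chunk -s $short_reads --pre $prefix"
        else
            proovread -l "$chunk" -s "$short_reads" --pre "$prefix"
        fi
    done
}

# Example call (prints one command per chunk):
#   correct_chunks reads.fq pb-[0-9]*.fq
```

Setting `DRY_RUN=0` before the call runs proovread instead of printing the commands. The optional `-u unitigs.fa` argument is left out here.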

On the other hand, if you have more than 20x PacBio coverage, you can try Canu.

If you care about speed during correction, you can use LoRDEC.

Regarding assembly: if you want a hybrid assembly, you can use PBcR (again, the author of that software suggests you use Canu instead). You can also use DBG2OLC, which is relatively fast.

ADD COMMENT • link 7.8 years ago by Medhat 9.7k

0

Thank you very much for your reply!

I think I'll try both proovread + canu and DBG2OLC to see which gives me the best results.

ADD REPLY • link 7.7 years ago by Josué Barrera ▴ 10

1

Canu takes uncorrected PacBio reads, so there is no need to run proovread before Canu.
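
For reference, a Canu run on the raw subreads could look roughly like the sketch below. The `genomeSize=400m` value is taken from the ~400 Mbp estimate in the question, and pb-subreads.fq is the file name from the SeqChunker example. The prefix `plant` and the directory `plant-canu` are placeholders. `-pacbio-raw` is the input option of the Canu 1.x releases current at the time of this thread; newer releases may name it differently, so check the help output of your installed version.

```shell
# Placeholders (assumptions of this sketch): output prefix and directory.
prefix=plant
outdir=plant-canu
# Approximate genome size from the question (~400 Mbp).
genome_size=400m
# Raw (uncorrected) PacBio subreads.
reads=pb-subreads.fq

# Build the command as a string so it can be reviewed before running.
canu_cmd="canu -p $prefix -d $outdir genomeSize=$genome_size -pacbio-raw $reads"
echo "$canu_cmd"
# To actually start the assembly: eval "$canu_cmd"
```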

ADD REPLY • link 7.7 years ago by Medhat 9.7k

0

The CANU documentation (release 1.3) still recommends polishing for 'best accuracy' (sic).

ADD REPLY • link 7.6 years ago by jahn.davik • 0