Question: Polish PacBio assembly with Hi-C reads
10 months ago by
alex.zaccaron170 wrote:


I have a small haploid genome (85 Mb) that was assembled with Canu from ~100x of PacBio Sequel reads. In addition, 40 Gbp of Illumina Hi-C reads were sequenced for scaffolding. The assembly has been polished with Arrow, but there is no third dataset of Illumina reads to polish with Pilon. I was wondering whether I could instead use the Hi-C reads for the Illumina polishing step by mapping one or both ends of each read pair individually to the assembly. However, given the nature of Hi-C reads, I am a little concerned that the uneven coverage and chimeric reads could have a negative impact. Does anyone have previous experience with this approach? Is it a good idea to use Hi-C reads to polish an assembly?


sequencing assembly • 450 views
written 10 months ago by alex.zaccaron170

The uneven coverage means polishing will be uneven, with some regions unpolished. As for the chimeric reads, you could use only reads mapping end-to-end to the reference, e.g., using samclip.
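
One way to see how much of the assembly would be left effectively unpolished is to look at per-base depth after mapping. The sketch below is only an illustration: it assumes three-column, tab-separated input (contig, position, depth) such as `samtools depth -a` produces, and the threshold of 10 is an arbitrary placeholder, not a recommended cutoff. The function name `fraction_below_depth` is made up for this example.

```python
def fraction_below_depth(depth_lines, min_depth=10):
    """Return the fraction of reported positions whose depth is below min_depth.

    depth_lines: iterable of tab-separated lines "contig<TAB>pos<TAB>depth".
    min_depth:   hypothetical cutoff, chosen here only for illustration.
    """
    total = 0
    low = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _contig, _pos, depth = line.split("\t")[:3]
        total += 1
        if int(depth) < min_depth:
            low += 1
    return low / total if total else 0.0


# Toy input: four positions, two of them below the cutoff.
example = [
    "contig1\t1\t35",
    "contig1\t2\t4",
    "contig1\t3\t0",
    "contig1\t4\t52",
]
print(fraction_below_depth(example, min_depth=10))  # 0.5
```

On real data you would pass an open file handle instead of the toy list.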

written 10 months ago by h.mon30k

Thanks h.mon for the suggestion. As you pointed out, using only end-to-end mapped reads could still be useful for polishing at least some regions of the genome. I will give it a shot and see how it looks.

written 10 months ago by alex.zaccaron170

How did it go? I was thinking of doing the same.

written 7 months ago by rob234king600

I gave it a shot, but did not move forward with it. Based on the info I gathered, it can be done, but there is no guarantee of the results. In the end, we decided to sequence more Illumina data for the polishing step to avoid downstream problems. But I can still describe what I did:

To polish the assembly with Hi-C reads, I mapped both ends of each read pair individually with bwa mem. After removing unmapped reads and supplementary and secondary alignments with samtools, I removed PCR duplicates with Picard tools. Clipped reads were also removed with samclip, since they are likely chimeric.
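
The flag and clipping filters described above can be sketched in plain Python. This is only a simplified stand-in to show the logic, not what samtools or samclip do internally: it drops every record with any soft or hard clip, whereas samclip applies its own thresholds, and it does not cover duplicate removal. The names `keep_alignment` and `filter_sam` are made up for this example.

```python
import re

# Standard SAM flag bits
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
EXCLUDE = UNMAPPED | SECONDARY | SUPPLEMENTARY  # 0x904

# Any soft (S) or hard (H) clip operation in the CIGAR string
CLIP_RE = re.compile(r"\d+[SH]")


def keep_alignment(sam_line):
    """Return True for a primary, mapped alignment with no clipping."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    cigar = fields[5]
    if flag & EXCLUDE:
        return False  # unmapped, secondary, or supplementary
    if cigar == "*" or CLIP_RE.search(cigar):
        return False  # no alignment, or clipped (possibly chimeric)
    return True


def filter_sam(lines):
    """Yield header lines unchanged plus the alignment lines that pass."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line):
            yield line


# Toy records: only r1 (end-to-end, primary) should survive.
records = [
    "@SQ\tSN:contig1\tLN:1000",
    "r1\t0\tcontig1\t100\t60\t150M\t*\t0\t0\t*\t*",
    "r2\t0\tcontig1\t200\t60\t100M50S\t*\t0\t0\t*\t*",
    "r3\t2048\tcontig1\t300\t60\t150M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
kept = list(filter_sam(records))
print([line.split("\t")[0] for line in kept])  # ['@SQ', 'r1']
```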

Using the dataset described above, Pilon confirmed 99% of the bases in the assembly (previously polished with Arrow) and made 726 changes, of which 88% were corrections of single-base INDELs. To me, these numbers suggest that the polishing was successful. Again, we did not move forward with it to avoid downstream problems, since this is not a common approach and I have not seen in-depth analyses of possible complications.
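
To put those numbers in context, here is the back-of-the-envelope arithmetic using only the figures quoted in this thread (85 Mb genome, 726 changes, 88% single-base INDELs). The 88% is a rounded value, so the INDEL count is approximate.

```python
genome_size_bp = 85_000_000   # 85 Mb assembly, from the original question
total_changes = 726           # changes reported by Pilon
indel_fraction = 0.88         # rounded share of single-base INDEL fixes

# Approximate number of single-base INDEL corrections
approx_indel_fixes = round(total_changes * indel_fraction)

# Density of changes across the assembly
changes_per_mb = total_changes / (genome_size_bp / 1_000_000)

print(approx_indel_fixes)          # 639
print(round(changes_per_mb, 1))    # 8.5
```

So roughly 639 single-base INDEL fixes, or about 8.5 changes per Mb overall.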

modified 7 months ago • written 7 months ago by alex.zaccaron170