Question: Polish PacBio assembly with Hi-C reads
10 months ago


I have a small haploid genome (85 Mb) that was assembled with Canu from ~100x of PacBio Sequel reads. In addition, 40 Gbp of Hi-C Illumina reads were sequenced for scaffolding. The assembly has been polished with Arrow, but there is no third dataset of Illumina reads to polish with Pilon. I was wondering whether I could use the Hi-C reads for the Illumina polishing step instead, by mapping one or both ends of the reads individually to the assembly. However, given the nature of Hi-C reads, I am a little concerned that the uneven coverage and chimeric reads could have a negative impact. Does anyone have experience with this approach? Is it a good idea to use Hi-C reads to polish an assembly?
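
For a sense of scale, the average raw Hi-C depth is simply total sequenced bases divided by genome size. The sketch below uses the figures from the question; the "kept fraction" values are purely hypothetical placeholders for how much data might survive filtering, and an average depth says nothing about how unevenly Hi-C reads are distributed along the genome.

```python
def raw_depth(total_bases: float, genome_size: float) -> float:
    """Average (raw) sequencing depth: total sequenced bases / genome size."""
    return total_bases / genome_size

# Figures from the question: ~40 Gbp of Hi-C reads, 85 Mb haploid genome.
hic_depth = raw_depth(40e9, 85e6)
print(f"Raw Hi-C depth: ~{hic_depth:.0f}x")  # ~471x

# Hypothetical fractions of bases surviving filtering (unmapped,
# secondary/supplementary, duplicates, clipped reads). Not measured values.
for kept_fraction in (0.5, 0.25, 0.1):
    print(f"{kept_fraction:.0%} kept -> ~{hic_depth * kept_fraction:.0f}x")
```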


10 months ago

The uneven coverage means polishing will also be uneven, with some regions left unpolished. As for the chimeric reads, you could keep only reads that map end-to-end to the reference, e.g., by filtering the alignments with samclip.
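
The "end-to-end" criterion boils down to checking how much of each alignment is clipped at either end of its CIGAR string. Below is a minimal sketch of that idea, not samclip's actual implementation: the 5-base threshold is an illustrative assumption, and unlike samclip (which uses the reference to avoid penalising reads clipped at contig ends) this sketch ignores contig boundaries. The function names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def end_clips(cigar: str) -> tuple[int, int]:
    """Total soft (S) + hard (H) clipping at the left and right ends of a CIGAR."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

    def clipped(seq) -> int:
        total = 0
        for n, op in seq:
            if op not in "SH":
                break
            total += n
        return total

    return clipped(ops), clipped(reversed(ops))

def is_end_to_end(cigar: str, max_clip: int = 5) -> bool:
    """True if neither end is clipped by more than max_clip bases.

    max_clip=5 is an illustrative threshold (an assumption, not a recommendation).
    Contig ends are not taken into account here.
    """
    if cigar == "*":  # unmapped record: no alignment at all
        return False
    left, right = end_clips(cigar)
    return left <= max_clip and right <= max_clip

print(is_end_to_end("150M"))    # True
print(is_end_to_end("2S148M"))  # True  (small clip tolerated)
print(is_end_to_end("60S90M"))  # False (large clip, likely chimeric)
```

Stacked clips such as `5H3S` at the same end are summed, since both hard and soft clipping mean that part of the read did not align there.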

10 months ago

Thanks h.mon for the suggestion. As you pointed out, using only end-to-end mapped reads could still be useful to polish regions of the genome. I will give it a shot and see how it looks.

10 months ago

How did it go? I was thinking of doing the same.

7 months ago

I gave it a shot, but did not move forward with it. Based on the information I gathered, it can be done, but there is no guarantee of the results. In the end, we decided to sequence more Illumina data for the polishing step to avoid downstream problems. But I can still describe what I did:

To polish the assembly with Hi-C reads, I mapped both ends individually with bwa mem. After removing unmapped reads and secondary and supplementary alignments with samtools, I removed PCR duplicates with Picard. Clipped reads were also removed with samclip, since they are likely chimeric.
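
The samtools filtering step above is a test on bits of the SAM FLAG column. The sketch below shows that logic on plain SAM text lines; the function names and toy records are hypothetical. The duplicate bit (0x400) only matters if duplicates were *marked* rather than removed; on the command line, `samtools view -F 0xD04` excludes the same four categories.

```python
# SAM FLAG bits, as defined in the SAM format specification.
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400      # only set if duplicates were marked rather than removed
SUPPLEMENTARY = 0x800
DROP_MASK = UNMAPPED | SECONDARY | DUPLICATE | SUPPLEMENTARY  # == 0xD04

def keep_alignment(flag: int) -> bool:
    """Keep only primary, mapped, non-duplicate alignments."""
    return (flag & DROP_MASK) == 0

def filter_sam(lines):
    """Yield header lines unchanged and alignment lines that pass keep_alignment."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the 2nd SAM column
        if keep_alignment(flag):
            yield line

# Toy single-end records (each Hi-C end mapped on its own).
records = [
    "@HD\tVN:1.6",
    "r1\t0\tctg1\t100\t60\t150M\t*\t0\t0\t*\t*",      # primary, forward: kept
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",               # unmapped: dropped
    "r3\t256\tctg1\t500\t0\t150M\t*\t0\t0\t*\t*",     # secondary: dropped
    "r4\t2048\tctg2\t10\t60\t90M60H\t*\t0\t0\t*\t*",  # supplementary: dropped
    "r5\t1040\tctg1\t100\t60\t150M\t*\t0\t0\t*\t*",   # 0x400 + 0x10, duplicate: dropped
    "r6\t16\tctg2\t300\t60\t150M\t*\t0\t0\t*\t*",     # primary, reverse: kept
]
for line in filter_sam(records):
    print(line.split("\t")[0])  # prints @HD, r1, r6
```

Because each end was mapped as a single-end read, pairing bits such as 0x1 and 0x2 are not relevant to this filter.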

Using the dataset described above, Pilon confirmed 99% of the bases in the assembly (previously polished with Arrow) and made 726 changes, of which 88% were corrections of single-base indels. To me, these numbers suggest that the polishing was successful. Again, we did not move forward with it, to avoid downstream problems, since this is not a common approach and I have not seen in-depth analyses of possible complications.
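
To put the Pilon summary in perspective, the reported counts can be turned into rates. This is plain arithmetic on the numbers quoted above (88% is a rounded percentage, so the derived indel count is approximate); it says nothing about whether each individual change was correct.

```python
# Figures quoted in the post above.
genome_size_bp = 85e6
pilon_changes = 726
indel_share = 0.88  # reported as a rounded percentage

single_base_indel_fixes = round(pilon_changes * indel_share)  # approximate
changes_per_mb = pilon_changes / (genome_size_bp / 1e6)
bp_per_change = genome_size_bp / pilon_changes

print(f"~{single_base_indel_fixes} single-base indel corrections")  # ~639
print(f"{changes_per_mb:.1f} changes per Mb")                        # 8.5
print(f"one change every ~{bp_per_change / 1e3:.0f} kb")             # ~117
```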

