Assembling a derivative chromosome from ONT WGS data
4 months ago
Noah ▴ 30

I am working on a project that uses ONT reads for targeted genome assembly. Rather than assembling the entire genome, I want to assemble only a ~1-2 Mb region around a chromosomal breakpoint that I previously identified using Hi-C. This should give us an idea of the (epi)genetic landscape around the breakpoint, and from there we can do further analysis on the assembled region.

Currently, the approach I've come up with is:

  1. From the BAM file, extract all split reads (reads with alignments to both chromosomes of interest) that span the breakpoint
  2. Assemble this small region with Flye, to obtain a sequence that spans the breakpoint
  3. Extract all reads, not just split reads, within a predefined window around the breakpoint
  4. Assemble this large region with Flye, and identify which of the assembled contigs span or lie near the breakpoint by re-aligning the split reads to each of the large-region contigs
  5. Order and orient the contigs nearest to the breakpoint with RagTag, using the small breakpoint-spanning assembly from step 2 as the reference
  6. Align epigenetic (ChIP) data to evaluate the 'correctness' of the sequence
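
To make step 1 concrete, here is a minimal sketch of how the split reads could be picked out. It is not necessarily how I did it. It assumes the alignments are available as SAM text that was already restricted to the regions around the breakpoint (e.g. from a `samtools view` region query), and that the aligner writes supplementary alignments into the `SA:Z` tag, as the SAM specification describes. The function names and the chromosome names `chrA` / `chrB` are placeholders.

```python
from typing import Iterable, List, Set


def sa_chromosomes(sam_fields: List[str]) -> Set[str]:
    """Return the reference names listed in the SA:Z tag of one SAM record."""
    chroms: Set[str] = set()
    for field in sam_fields[11:]:  # optional tags start at column 12
        if field.startswith("SA:Z:"):
            # SA value: rname,pos,strand,CIGAR,mapQ,NM; repeated per alignment
            for entry in field[5:].split(";"):
                if entry:  # skip the empty piece after the trailing ';'
                    chroms.add(entry.split(",")[0])
    return chroms


def split_read_names(sam_lines: Iterable[str], chrom_a: str, chrom_b: str) -> Set[str]:
    """Names of reads aligned to one chromosome with an SA entry on the other."""
    names: Set[str] = set()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM record
            continue
        if int(fields[1]) & 0x4:  # read is unmapped
            continue
        here = fields[2]
        elsewhere = sa_chromosomes(fields)
        if (here == chrom_a and chrom_b in elsewhere) or (
            here == chrom_b and chrom_a in elsewhere
        ):
            names.add(fields[0])
    return names
```

The resulting read names can then be used to pull the full reads out of the original FASTQ/BAM for the small assembly in step 2. Note that this sketch only checks which chromosomes a read touches, not where on them, so the positional restriction has to come from the region query upstream.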

The main problem I faced is that with a larger window, and therefore more reads, the contigs I obtain are less likely to span the breakpoint (which I guess is to be expected). That is why I used the scaffolding approach. The final scaffold aligns to hg38 in a way that suggests it does span the breakpoint, but I would appreciate any feedback on this process, which I hacked together as I went along.
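
For reference, the "does it span the breakpoint" check against hg38 can be sketched roughly as below. It assumes the scaffold was aligned to hg38 with PAF output (the 12 mandatory columns that minimap2 writes), and it treats a scaffold as spanning if it has a confident alignment near the breakpoint coordinate on each chromosome. The function name, the `slack` distance and the `min_mapq` cutoff are hypothetical choices for illustration, not established values.

```python
from typing import Iterable, Tuple


def spans_breakpoint(
    paf_lines: Iterable[str],
    contig: str,
    bp_a: Tuple[str, int],
    bp_b: Tuple[str, int],
    slack: int = 10_000,
    min_mapq: int = 20,
) -> bool:
    """True if `contig` aligns near both breakpoint sides on the reference.

    bp_a and bp_b are (chromosome, position) pairs in reference coordinates.
    """
    hit_a = hit_b = False
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12 or f[0] != contig:  # malformed line or other query
            continue
        if int(f[11]) < min_mapq:  # column 12 is mapping quality
            continue
        tname, tstart, tend = f[5], int(f[7]), int(f[8])  # target name/start/end
        # An alignment counts if the breakpoint lies within `slack` of its span.
        if tname == bp_a[0] and tstart - slack <= bp_a[1] <= tend + slack:
            hit_a = True
        if tname == bp_b[0] and tstart - slack <= bp_b[1] <= tend + slack:
            hit_b = True
    return hit_a and hit_b
```

Passing this check is necessary rather than sufficient: it shows the scaffold touches both sides of the breakpoint, but not that the junction sequence itself was assembled correctly.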

ONT assembly Whole-Genome • 420 views