Find breakpoints using long reads
8 weeks ago

Hello everyone! I want to determine the precise positions of breakpoints in sp1 (an assembled species). I have long nanopore reads (.fastq) from sp2 (an unassembled species). The species sp1 and sp2 are closely related. I know the approximate coordinates of the breakpoints (coord2-coord1 ≈ 1Mb, coord4-coord3 ≈ 1Mb). (See the image.)

I tried the following strategy: I cut out the left and right regions as separate .fasta files and aligned the long nanopore reads to each of them separately. I expected that only a few long reads would appear in both alignments, and that those reads would contain the breakpoints. However, I found that the two alignments have about 40k reads in common.
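For reference, the intersection step of this strategy can be sketched in Python as below. This is only a minimal sketch under assumptions that are not stated in the post: it assumes the alignments are available as PAF text (the 12-column tab-separated format that minimap2 writes by default), and it adds a mapping-quality cutoff (`min_mapq`, a hypothetical parameter with an arbitrary default of 20). Reads that map ambiguously, for example to repeats present in both regions, might be one reason the overlap is so large, so comparing the counts with and without the cutoff could be informative.

```python
def confident_read_names(paf_lines, min_mapq=20):
    """Return the names of reads with at least one alignment at or above min_mapq.

    Each line is expected to be a PAF record: column 1 is the read name and
    column 12 is the mapping quality. Shorter lines are skipped.
    """
    names = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        if int(fields[11]) >= min_mapq:
            names.add(fields[0])
    return names


def shared_reads(left_paf_lines, right_paf_lines, min_mapq=20):
    """Reads that align confidently to both the left and the right region."""
    left = confident_read_names(left_paf_lines, min_mapq)
    right = confident_read_names(right_paf_lines, min_mapq)
    return left & right
```

With real data, the two arguments could simply be open file handles of the two PAF files (the file names are up to you), since the functions only iterate over lines.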

Maybe someone has a better idea (or tools), or could improve mine? I would appreciate it.

8 weeks ago
shelkmike ▴ 830

If I understand correctly, you want to do what is called "structural variant calling". You can align reads of sp2 to the genome of sp1 and then use Sniffles. Sniffles will give you a list of structural differences between these two genomes.

An even simpler strategy is to align the reads and then visualise the alignment in a program like Tablet. You will be able to see the breakpoints by eye.
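What you would be looking for by eye are reads whose alignment is split into two pieces that land in different places. A rough Python sketch of that idea follows. It is not how Sniffles or Tablet work internally; it assumes the reads of sp2 were aligned to the whole sp1 assembly and that the alignments are in PAF format (12 tab-separated columns, as written by minimap2 by default). The function name and the `min_mapq` cutoff are made up for illustration.

```python
from collections import defaultdict


def split_read_junctions(paf_lines, min_mapq=20):
    """List candidate breakpoints from reads that align in two or more pieces.

    For every pair of alignments that are adjacent along the read, report
    (read, target_a, pos_a, target_b, pos_b): the target coordinates where
    the first piece stops and the next piece starts, following the read.
    """
    by_read = defaultdict(list)
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12 or int(f[11]) < min_mapq:
            continue  # skip incomplete records and low-confidence alignments
        # (query start, query end, strand, target name, target start, target end)
        by_read[f[0]].append((int(f[2]), int(f[3]), f[4], f[5], int(f[7]), int(f[8])))

    junctions = []
    for read, alns in by_read.items():
        alns.sort()  # order the pieces by their position along the read
        for a, b in zip(alns, alns[1:]):
            # On the '+' strand the read's end maps to the target end;
            # on the '-' strand it maps to the target start.
            pos_a = a[5] if a[2] == "+" else a[4]
            # The read's start maps to the target start on '+', target end on '-'.
            pos_b = b[4] if b[2] == "+" else b[5]
            junctions.append((read, a[3], pos_a, b[3], pos_b))
    return junctions
```

If many reads report nearly the same pair of positions, that pair is a good breakpoint candidate to inspect in the viewer.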


