10 weeks ago
popayekid55 ▴ 110

Hi All,

I have UL ONT reads mapped to a draft assembly and would like to extract the reads that span two contigs at their ends/beginnings, such as Read1 in the example image below.

Soft-clipping information from the CIGAR string can be used to identify such reads, but is there a tool or an easy way to extract them?

Any help is much appreciated. Thank you.

spanning long reads • 460 views
samtools view -M -L contig1.bed -O BAM in.bam | samtools view -L contig2.bed


?


Thank you. That was only an example image. My assembly has many contigs (~19k), so checking contig pairs this way is not feasible. I want to extract reads that span any two contigs, similar to reads supporting translocations, except that in my case the reads should span the beginnings or ends of the contigs.


"It was an example image. I have a multi-contig file (~19k), which is not feasible to check this way."

Hmm... why?


If I'm understanding correctly, the "spanning" reads need not be between adjacent contigs, so (19000 choose 2), roughly 180 million, of the above commands would need to be executed. This, of course, could be offloaded to a script.
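One way such a script might avoid the pairwise commands altogether is a single pass over the alignments, grouping records by read name and checking which contig edges each read touches. The following is only a minimal sketch under several assumptions: the input is SAM text that includes the @SQ header lines (for contig lengths), split reads are reported as supplementary alignments, and the 1000 bp edge window and the function names are hypothetical choices, not anything from this thread.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def ref_span(pos, cigar):
    """Return (start, end) of an alignment on the reference, 1-based inclusive."""
    # Only M, D, N, = and X consume reference bases; clips and insertions do not.
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")
    return pos, pos + ref_len - 1

def spanning_reads(sam_lines, edge=1000):
    """Return {read name: set of (contig, 'start' or 'end')} for reads whose
    alignments reach within `edge` bp of the edges of two or more contigs."""
    contig_len = {}
    edges = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
                contig_len[tags["SN"]] = int(tags["LN"])
            continue
        f = line.rstrip("\n").split("\t")
        name, flag, contig, pos, cigar = f[0], int(f[1]), f[2], int(f[3]), f[5]
        # Skip unmapped (0x4) and secondary (0x100) records; keep supplementary ones.
        if flag & 0x4 or flag & 0x100 or cigar == "*":
            continue
        start, end = ref_span(pos, cigar)
        if start <= edge:
            edges[name].add((contig, "start"))
        if end > contig_len[contig] - edge:
            edges[name].add((contig, "end"))
    # Keep reads that touch edges of at least two different contigs.
    return {n: e for n, e in edges.items() if len({c for c, _ in e}) >= 2}
```

The lines could come from something like `samtools view -h in.bam` piped into the script, and the returned read names could then be used to pull the reads out of the BAM. Contigs shorter than the edge window would count as both "start" and "end", which may or may not be what is wanted.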

10 weeks ago
LChart 1.8k

As crazy as it sounds, I would recommend blastn for this. Convert the read .fastq into a .fasta and index it with makeblastdb. This will be your subject. Then, for each contig, extract the first and last 100 bp as separate prefix and suffix sequences named >contig_left_100bp and >contig_right_100bp. These will be your queries. Run blastn and look for subjects (reads) with hits above your desired confidence thresholds to the edges of two different contigs.

This will also identify circular sequences such as plasmids, where a single read hits both edges of the same contig.
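The two scripted parts of this approach, building the edge queries and scanning the blastn output, could be sketched roughly as below. This is an illustration only: it assumes blastn is run with tabular output (-outfmt 6, where the first four columns are query ID, subject ID, percent identity and alignment length), and the `_left`/`_right` naming, the function names and the identity/length thresholds are hypothetical and should be adapted to the data.

```python
from collections import defaultdict

def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA lines."""
    name, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            name, seq = line[1:].split()[0], []
        elif line:
            seq.append(line)
    if name is not None:
        yield name, "".join(seq)

def edge_queries(contig_fasta_lines, n=100):
    """Return FASTA text with the first and last n bases of every contig."""
    out = []
    for name, seq in read_fasta(contig_fasta_lines):
        out.append(f">{name}_left\n{seq[:n]}")
        out.append(f">{name}_right\n{seq[-n:]}")
    return "\n".join(out) + "\n"

def reads_with_two_edges(blast_tab_lines, min_pident=90.0, min_len=80):
    """Return {read: set of edge names} for reads hit by two or more contig edges.

    Queries are the contig edges and subjects are the reads, so column 1 is
    the edge name and column 2 is the read name."""
    hits = defaultdict(set)
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        edge, read, pident, length = f[0], f[1], float(f[2]), int(f[3])
        if pident >= min_pident and length >= min_len:
            hits[read].add(edge)
    return {r: e for r, e in hits.items() if len(e) >= 2}
```

With the reads indexed via something like `makeblastdb -in reads.fasta -dbtype nucl` and searched with `blastn -query edges.fasta -db reads.fasta -outfmt 6`, the resulting table can be passed line by line to `reads_with_two_edges`. A read that hits both `_left` and `_right` of the same contig is kept as well, which corresponds to the circular case.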

However, if you want to do more than identify such reads (i.e., to perform long-read scaffolding), you can use a tool such as LINKS or LongStitch. These may have a way to extract the reads that support scaffolding.