8 months ago
Dear Experts, for a special application I want to merge the two mates of each paired-end read pair in a BAM file, including discordant pairs, given that their genomic locations are already known from mapping to the reference genome. I would mainly consider the following two cases:
1- overlapping reads
1     7
======>
   ||||
   <======
   4     10
That would give me one read, like this.
1        10
=========>
2- discordant reads.
Ref==============================================
     5          10^^^^^^^^^
     =====R1=====>|||||||||
                  |||||||||
                           <=====R2======
                           15          20
This one would result in:
     ===================================>
     5                                 20
So, in other words, I want to reconstruct the missing fragment in the discordant pairs.
Are there any tools that can do that? Best wishes
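Since both mates are already placed on the reference, both cases can be pictured as plain coordinate arithmetic. The sketch below is only an illustration of that idea, not an existing tool: `merge_pair` and its arguments are made-up names, and it assumes 1-based start positions, gapless alignments (no indels or soft clips) and both mates given in reference orientation. Where the mates overlap it simply keeps the bases of the upstream mate, and the gap in case 2 is filled with reference bases, so that stretch is reference sequence, not observed sequence.

```python
def merge_pair(ref, start1, seq1, start2, seq2):
    """Merge two mapped mates into one sequence spanning both.

    ref      : reference sequence (string); position 1 is ref[0]
    start1/2 : 1-based leftmost mapped position of each mate
    seq1/2   : mate sequences in reference orientation, assumed gapless
    Returns (start, merged_sequence).
    """
    # Order the mates so that the 'left' one starts first on the reference.
    (l_start, l_seq), (r_start, r_seq) = sorted(
        [(start1, seq1), (start2, seq2)], key=lambda m: m[0]
    )
    l_end = l_start + len(l_seq) - 1  # 1-based inclusive end of left mate

    if r_start <= l_end:
        # Case 1: overlap. Keep the left mate, append what extends past it.
        overhang = r_seq[l_end - r_start + 1:]
        return l_start, l_seq + overhang

    # Case 2: gap. Fill positions l_end+1 .. r_start-1 from the reference.
    gap = ref[l_end:r_start - 1]
    return l_start, l_seq + gap + r_seq


# Toy reference, only for illustration.
ref = "ACGT" * 6
print(merge_pair(ref, 1, ref[0:7], 4, ref[3:10]))     # case 1: reads 1-7 and 4-10
print(merge_pair(ref, 5, ref[4:10], 15, ref[14:20]))  # case 2: reads 5-10 and 15-20
```

With real data the positions and sequences would come from the BAM records, and CIGAR operations would have to be handled before this kind of arithmetic is valid.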
For case #1 you can use tools like bbmerge.sh, FLASH, PEAR etc. before you align the data. If the reads overlap then they will merge; no alignment is needed. bbmerge.sh can also trim off adapters.
For #2 you can take a look at samtools consensus (LINK). I don't think it will generate a single read for every pair like you seem to want.
I might be missing something, but is situation 2 really a discordant pair? It looks like a proper/normal pair whose reads do not overlap.
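On the proper-versus-discordant question: the aligner records its own verdict in the SAM FLAG field, where bit 0x2 marks a pair as properly aligned. What counts as proper depends on the aligner's insert-size and orientation settings. A small sketch for checking this, where `classify_pair` is a hypothetical helper and "discordant" is used loosely for pairs with both mates mapped but without the proper-pair bit:

```python
# Bit values from the FLAG field of the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def classify_pair(flag):
    """Classify a read by its SAM FLAG (hypothetical helper)."""
    if not flag & PAIRED:
        return "unpaired"
    if flag & (UNMAPPED | MATE_UNMAPPED):
        return "incomplete"   # one or both mates unmapped
    if flag & PROPER_PAIR:
        return "proper"
    return "discordant"       # both mapped, but not flagged as proper


print(classify_pair(99))  # 99 = 0x1 + 0x2 + 0x20 + 0x40
print(classify_pair(97))  # 97 = 0x1 + 0x20 + 0x40
```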
That said, I expect bedtools can help you, https://bedtools.readthedocs.io/en/latest/content/tools/bamtobed.html
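As a rough sketch of how that could be used: `bedtools bamtobed -bedpe` writes one line per pair (it expects the BAM to be grouped by read name), and the outer coordinates of the two mates give the fragment span. The parser below assumes the 10-column BEDPE layout with 0-based, half-open coordinates; `fragment_span` and the example line are made up for illustration.

```python
def fragment_span(bedpe_line):
    """Return (chrom, start, end, name) covering both mates, or None.

    Assumes the 10-column BEDPE layout: chrom1, start1, end1, chrom2,
    start2, end2, name, score, strand1, strand2 (0-based, half-open).
    """
    f = bedpe_line.rstrip("\n").split("\t")
    chrom1, start1, end1 = f[0], int(f[1]), int(f[2])
    chrom2, start2, end2 = f[3], int(f[4]), int(f[5])
    # Skip pairs on different chromosomes or with an unmapped mate.
    if chrom1 != chrom2 or chrom1 == "." or start1 < 0 or start2 < 0:
        return None
    return chrom1, min(start1, start2), max(end1, end2), f[6]


# Hypothetical pair matching case 2 (1-based 5-10 and 15-20).
line = "chr1\t4\t10\tchr1\t14\t20\tpair1\t60\t+\t-"
print(fragment_span(line))
```

The resulting interval could then be written out as BED and passed to `bedtools getfasta` to pull the reference sequence for the whole fragment, which again gives reference bases rather than observed ones in the gap.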