I have NGS reads mapped to a genome. Now I need to extract only the reads mapping to a certain region, which I can do easily with:
samtools view my_bam.bam GENOME:1000-2000 > my_region.sam
However, the aligned reads keep their original coordinates on the genome (1000 to 2000 in this example). I need them indexed from 1 instead, as if the requested region were a new genome sequence.
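For reference, the coordinate change being asked for is a constant shift. Writing s and e for the 1-based, inclusive region start and end (the notation is mine, not from the post), a position p inside the region maps to p' and the new reference has length L':

```latex
p' = p - (s - 1), \qquad L' = e - s + 1
% with s = 1000, e = 2000:
%   p = 1000 \mapsto p' = 1, \quad p = 1500 \mapsto p' = 501, \quad p = 2000 \mapsto p' = 1001, \quad L' = 1001
```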
1) Is there any tool (or sambamba setting) that can do this?
2) Sure, I can process the file manually and subtract the offset from the positions. Is this the way to go, or are there any gotchas regarding the SAM format, e.g. offsets for reads mapping to the reverse strand? (I know I will also need to replace the sequence ID in the file.)
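On the reverse-strand question: in SAM, POS is the 1-based leftmost reference position of the aligned bases regardless of strand (soft-clipped bases are not counted), so the same offset works for both strands. A more likely gotcha is the region edges: `samtools view` with a region returns every read that *overlaps* it, so a read starting at 995 would get a POS below 1 after shifting, and one running past 2000 would extend beyond the new reference end. Below is a minimal sketch for spotting such reads before converting; `report_boundary_reads` is a hypothetical helper name, and it assumes tab-separated SAM on stdin.

```shell
#!/bin/sh
# Hypothetical helper: list reads that overhang the region edges, i.e. that
# would get POS < 1 or run past the new reference length after shifting.
# Prints QNAME, POS and alignment end (1-based, inclusive) for each such read.
# Args: $1 = region start, $2 = region end.
report_boundary_reads() {
    awk -v start="$1" -v end="$2" 'BEGIN { FS = OFS = "\t" }
    /^@/ { next }                       # skip header lines
    $4 == 0 || $6 == "*" { next }       # skip unmapped reads / missing CIGAR
    {
        # Reference span = sum of lengths of M, D, N, = and X CIGAR operations
        span = 0; cigar = $6
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
            len = substr(cigar, 1, RLENGTH - 1) + 0
            op = substr(cigar, RLENGTH, 1)
            if (op ~ /[MDN=X]/) span += len
            cigar = substr(cigar, RLENGTH + 1)
        }
        aln_end = $4 + span - 1
        if ($4 < start || aln_end > end) print $1, $4, aln_end
    }'
}
```

Usage would be something like `samtools view my_bam.bam GENOME:1000-2000 | report_boundary_reads 1000 2000`. Whether to drop, trim, or keep such reads depends on what the downstream tool accepts.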
P.S. I wouldn't do this, but the tool I want to use requires such input :(.
Use awk to subtract the offset from POS and from the mate position (PNEXT)?
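A minimal sketch of that awk approach, wrapped in a shell function called `shift_sam_region` (a hypothetical name, not a samtools or sambamba feature). It assumes the input still has its header, e.g. from `samtools view -h`, and takes the original contig name, region start, region end, and a new sequence name. It replaces all `@SQ` lines with one entry for the region, renames RNAME/RNEXT, and subtracts `start - 1` from POS and, only when the mate is on the same contig, from PNEXT. TLEN is a difference between positions, so it needs no change. Reads that start before the region still come out with POS < 1 and have to be dealt with separately.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a region extract so coordinates start at 1.
# Args: $1 = original contig, $2 = region start, $3 = region end,
#       $4 = new sequence name. Reads tab-separated SAM (with header) on stdin.
shift_sam_region() {
    awk -v chrom="$1" -v start="$2" -v end="$3" -v newname="$4" '
    BEGIN { FS = OFS = "\t"; offset = start - 1 }
    /^@SQ/ {
        # Replace all @SQ lines with a single entry describing the region
        if (!sq_done) { print "@SQ", "SN:" newname, "LN:" (end - start + 1); sq_done = 1 }
        next
    }
    /^@/ { print; next }                # keep other header lines (@HD, @RG, @PG, ...)
    {
        same_ref = ($3 == chrom)
        # RNEXT "=" means the mate is on the same reference as this read
        mate_same = ($7 == chrom || ($7 == "=" && same_ref))
        if (same_ref) { $3 = newname; if ($4 > 0) $4 -= offset }
        if ($7 == chrom) $7 = newname
        if (mate_same && $8 > 0) $8 -= offset  # PNEXT 0 means "unavailable"
        print
    }'
}
```

For the example in the question this would be run as `samtools view -h my_bam.bam GENOME:1000-2000 | shift_sam_region GENOME 1000 2000 my_region > my_region.sam`, where `my_region` is just a placeholder for whatever sequence name the downstream tool expects.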