How to extract reads in a specific region from a BAM file and also their mates mapped elsewhere ?
3
8
Entering edit mode
6.8 years ago

I have paired-end data as a BAM file and I am trying to extract reads from a specific region. This sounds pretty straightforward, however, the catch is that some of the reads may have their mates mapped in a completely different location, maybe due to large structural variation. 

I was going through some of samtools' options however, I did not find any particular parameter that can get me what I want. 

Any suggestions would be much appreciated. 

 

reads bam mate • 13k views
ADD COMMENT
4
Entering edit mode
6.8 years ago

I think it could be done using a two-step approach.

  1. First extract all the reads that are mapped within your region of interest using samtools view command.

    samtools view Input.bam chrA:x-y | cut -f1 > IDs.txt
    
  2. Now you can use these read ids and extract both forward and reverse reads corresponding to that read id from the original bam file. You can use a single grep command with multiple strings or read ids. See egrep and |. Of course , this is not an efficient way but it should work.

egrep "^#|Id1|Id2|....|Idn" Input.sam (sam format) > Output.sam or you can use grep -f IDs.txt Input.sam and add header afterwards.

ADD COMMENT
1
Entering edit mode

Pierre posted an efficient method for step 2 earlier:

Extract Alignment By Read Id From A Sam File

ADD REPLY
1
Entering edit mode
6.8 years ago
Rad ▴ 800
I wrote a detailed example on how to do that and you can run the code and go through the details in the code comments here : http://coderscrowd.com/app/codes/view/244 for your example I would do it on the mates output the results in bed files and use bedtools to remap the results Hope that helps
ADD COMMENT
0
Entering edit mode
2.2 years ago

I wrote something today: http://lindenb.github.io/jvarkit/SamViewWithMate.html

$ java -jar dist/samviewwithmate.jar -r "9:137230721-137230796"  ./src/test/resources/HG02260.transloc.chr9.14.bam | cut -f 1-9 | tail
ERR251239.10989793  83  9   137230747   60  30S70M  =   137230326   -490
ERR251239.3385449   147 9   137230754   60  1S99M   =   137230352   -500
ERR251240.17111373  99  9   137230764   60  100M    =   137231150   475
ERR251240.46859433  147 9   137230777   60  65S35M  =   137230342   -469
ERR251240.74563730  147 9   137230787   60  1S99M   =   137230407   -478
ERR251240.1291708   83  9   137230789   60  100M    =   137230411   -477
ERR251240.11887757  97  9   137230795   37  100M    14  79839451    0
ERR251239.34016218  81  14  79839349    37  100M    9   137230679   0
ERR251240.10196873  81  14  79839368    37  100M    9   137230721   0
ERR251240.11887757  145 14  79839451    37  100M    9   137230795   0
ADD COMMENT

Login before adding your answer.

Traffic: 2388 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6