Question: How to extract reads in a specific region from a BAM file and also their mates mapped elsewhere ?
gravatar for thecuriousbiologist
6.6 years ago by
United States
thecuriousbiologist490 wrote:

I have paired-end data as a BAM file and I am trying to extract reads from a specific region. This sounds pretty straightforward, however, the catch is that some of the reads may have their mates mapped in a completely different location, maybe due to large structural variation. 

I was going through some of samtools' options however, I did not find any particular parameter that can get me what I want. 

Any suggestions would be much appreciated. 


mate bam reads • 13k views
ADD COMMENTlink modified 2.0 years ago by Pierre Lindenbaum134k • written 6.6 years ago by thecuriousbiologist490
gravatar for Ashutosh Pandey
6.6 years ago by
Ashutosh Pandey12k wrote:

I think it could be done using a two-step approach.

  1. First extract all the reads that are mapped within your region of interest using samtools view command.

    samtools view Input.bam chrA:x-y | cut -f1 > IDs.txt
  2. Now you can use these read ids and extract both forward and reverse reads corresponding to that read id from the original bam file. You can use a single grep command with multiple strings or read ids. See egrep and |. Of course , this is not an efficient way but it should work.

egrep "^#|Id1|Id2|....|Idn" Input.sam (sam format) > Output.sam or you can use grep -f IDs.txt Input.sam and add header afterwards.

ADD COMMENTlink modified 16 months ago by Ram32k • written 6.6 years ago by Ashutosh Pandey12k

Pierre posted an efficient method for step 2 earlier:

Extract Alignment By Read Id From A Sam File

ADD REPLYlink written 6.6 years ago by donfreed1.5k
gravatar for Rad
6.6 years ago by
Rad800 wrote:
I wrote a detailed example on how to do that and you can run the code and go through the details in the code comments here : for your example I would do it on the mates output the results in bed files and use bedtools to remap the results Hope that helps
ADD COMMENTlink written 6.6 years ago by Rad800
gravatar for Pierre Lindenbaum
2.0 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum134k wrote:

I wrote something today:

$ java -jar dist/samviewwithmate.jar -r "9:137230721-137230796"  ./src/test/resources/HG02260.transloc.chr9.14.bam | cut -f 1-9 | tail
ERR251239.10989793  83  9   137230747   60  30S70M  =   137230326   -490
ERR251239.3385449   147 9   137230754   60  1S99M   =   137230352   -500
ERR251240.17111373  99  9   137230764   60  100M    =   137231150   475
ERR251240.46859433  147 9   137230777   60  65S35M  =   137230342   -469
ERR251240.74563730  147 9   137230787   60  1S99M   =   137230407   -478
ERR251240.1291708   83  9   137230789   60  100M    =   137230411   -477
ERR251240.11887757  97  9   137230795   37  100M    14  79839451    0
ERR251239.34016218  81  14  79839349    37  100M    9   137230679   0
ERR251240.10196873  81  14  79839368    37  100M    9   137230721   0
ERR251240.11887757  145 14  79839451    37  100M    9   137230795   0
ADD COMMENTlink written 2.0 years ago by Pierre Lindenbaum134k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1654 users visited in the last hour