Extract Reads from a Sam file outside a given region
1
1
Entering edit mode
8.1 years ago
ARB ▴ 120

Hi all,

I know this may sound simple, but I am really having trouble finding a way how to extract reads that fall outside a give region. I tried the -U option but it doesnt work (or maybe I am not using it correctly). I didn't find any example either which uses -U option of samtools view.

I am using this command: $samtools view sample.bam 2:33050509-33154206 -U without-region.sam

This is paired end data and I want to retain the other paired read that falls outside the region, therefore grepping reads other than the one in my specified region, wont work either.

Thanks in advance.

next-gen alignment • 3.6k views
ADD COMMENT
3
Entering edit mode

If you want to keep pairs then you'll need to write a script to do this. I'd recommend doing it in python with pysam.

ADD REPLY
0
Entering edit mode

Thanks Devon, I think that's the only way to go.

ADD REPLY
0
Entering edit mode

Out of curiosity, how much do you really lose if you follow your current approach? How strict are you, i.e. if single base falls in the region would you remove the read?

Please have a look at How to extract reads in a specific region from a BAM file and also their mates mapped elsewhere ? and at Extract reads within given region, and their mates

In concept, they are asking the opposite, but have the same concern... maybe it helps!

ADD REPLY
8
Entering edit mode
8.0 years ago
ARB ▴ 120

Hi all,

I found the answer to this issue, its a very simple fix. Sorry for replying so late. Got busy with some other stuff and totally forgot to reply. Anyways, if one wants to select regions outside of a given region, you can use -U option, but as I said earlier it doesnt work with the range specified on command line. But if you give a bedfile containing the region ad then provide -U option, it will work.

For example: Create bedfile (my.bed): 2 33050509 33154206 LINC00486 and now use this bedfile to extract region other than the ones in the given bedfile by: $samtools view sample.bam -L my.bed -U without-bed > /dev/null

To my knowledge with this approach, a read that has even a single base in the specified region gets counted as within region, the read just need to span over either of the start or stop of the region (even if its a single base).

Let me know if you have any questions.

Thanks.

ADD COMMENT
1
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 2111 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6