Extract reads within given region, and their mates
1
3
Entering edit mode
5.8 years ago
user230613 ▴ 290

Hi there, I want to extract all the reads located inside a given location. I would like to extract also the mates of those reads, because I'm working with PE data. I know that I can extract the reads using samtools and the region of interest, but how can I take off the mates too? This is my command:

samtools view -f 2 file.bam chr:start-end


I'm using flag 2 to consider only properly paired reads. I want to evaluate the strand the paired end reads to know if both are forward, reverse..

What I've done is to extract the mates falling inside the region, take the names of those reads, and grep those names in the original bam file to extract both reads. But the bam file is too big and it's terribly slow.

bam samtools • 11k views
5
Entering edit mode

I still find it deeply worrying how often we see people not using -F 4. There must be one event a week on Biostars alone where people dont use -F 4.

You NEED -F 4 pretty much every time you want to filter on mapped reads, otherwise you will get reads that are in a proper pair, mapped to a chromosome with a sensible fragment length but are still NOT MAPPED!!
And it's not trivial; properly-paired-but-unmapped reads can have TLENs of a billion basepairs, which will totally screw your averages for insert length, etc, which are often used to normalise things - but those sort of bugs go totally and utterly unnoticed.

2
Entering edit mode

For those who want to explore this very important point more.  This utility is very helpful for thinking about SAM flags and their meaning. https://broadinstitute.github.io/picard/explain-flags.html.

Remember -f means give me (require) everything that matches the criteria of my flag (e.g. -f 2, get 'read mapped in proper pair') and -F means remove everything (filter) that matches the criteria of my flag (e.g. -F 4, remove 'read unmapped').

1
Entering edit mode

Yes!  I use this tool all the time.  It is so helpful!  Even if I think I have my flags right I alway double check with this tool.

0
Entering edit mode

How is this possible?

[reads] mapped to a chromosome with a sensible fragment length [which] are NOT MAPPED ?

I have taken a look at one of my bam files, extracted a region and sought for unmapped reads in the resulting sam output. I actually found a single read that is registered as belonging to the extracted region but it is also flagged as unmapped.

samtools view -F4 sample.bam chr4:0-10000 | samtools view -f 4

4
Entering edit mode
1
Entering edit mode
0
Entering edit mode

Are you grep-ing those names one at a time or are you using "grep -Ff"

0
Entering edit mode

Second one, grep -Ff.

1
Entering edit mode
2.2 years ago
cmdcolin ★ 1.5k

The tool "Bazam" is built for this purpose but it outputs the reads in fastq https://github.com/ssadedin/bazam