Question: Extract reads within given region, and their mates
3
gravatar for user230613
4.2 years ago by
user230613280
Europe
user230613280 wrote:

Hi there, I want to extract all the reads located inside a given location. I would like to extract also the mates of those reads, because I'm working with PE data. I know that I can extract the reads using samtools and the region of interest, but how can I take off the mates too? This is my command:

samtools view -f 2 file.bam chr:start-end

I'm using flag 2 to consider only properly paired reads. I want to evaluate the strand the paired end reads to know if both are forward, reverse..

What I've done is to extract the mates falling inside the region, take the names of those reads, and grep those names in the original bam file to extract both reads. But the bam file is too big and it's terribly slow.

Thanks in advance,

samtools bam • 8.2k views
ADD COMMENTlink modified 7 months ago by cmdcolin1.2k • written 4.2 years ago by user230613280
5

I still find it deeply worrying how often we see people not using -F 4. There must be one event a week on Biostars alone where people dont use -F 4.

You NEED -F 4 pretty much every time you want to filter on mapped reads, otherwise you will get reads that are in a proper pair, mapped to a chromosome with a sensible fragment length but are still NOT MAPPED!!
And it's not trivial; properly-paired-but-unmapped reads can have TLENs of a billion basepairs, which will totally screw your averages for insert length, etc, which are often used to normalise things - but those sort of bugs go totally and utterly unnoticed.

ADD REPLYlink written 4.2 years ago by John12k
2

For those who want to explore this very important point more.  This utility is very helpful for thinking about SAM flags and their meaning. https://broadinstitute.github.io/picard/explain-flags.html.

Remember `-f` means give me (require) everything that matches the criteria of my flag (e.g. -f 2, get 'read mapped in proper pair') and `-F` means remove everything (filter) that matches the criteria of my flag (e.g. -F 4, remove 'read unmapped').

ADD REPLYlink written 4.2 years ago by Malachi Griffith17k
1

Yes!  I use this tool all the time.  It is so helpful!  Even if I think I have my flags right I alway double check with this tool.

ADD REPLYlink written 4.2 years ago by alolex900

How is this possible?

[reads] mapped to a chromosome with a sensible fragment length [which] are NOT MAPPED ?

I have taken a look at one of my bam files, extracted a region and sought for unmapped reads in the resulting sam output. I actually found a single read that is registered as belonging to the extracted region but it is also flagged as unmapped.

samtools view -F4 sample.bam chr4:0-10000 | samtools view -f 4

ADD REPLYlink modified 6 months ago • written 6 months ago by douwevanderleest10

Are you grep-ing those names one at a time or are you using "grep -Ff" 

ADD REPLYlink written 4.2 years ago by Ashutosh Pandey11k

Second one, grep -Ff.

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by user230613280

I wrote something today: http://lindenb.github.io/jvarkit/SamViewWithMate.html

ADD REPLYlink written 7 months ago by Pierre Lindenbaum122k
1
gravatar for cmdcolin
7 months ago by
cmdcolin1.2k
United States
cmdcolin1.2k wrote:

The tool "Bazam" is built for this purpose but it outputs the reads in fastq https://github.com/ssadedin/bazam

Ref https://www.biorxiv.org/content/10.1101/433003v2

ADD COMMENTlink written 7 months ago by cmdcolin1.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1200 users visited in the last hour