Question: How To Filter Mapped Reads With Samtools
32
gravatar for sohadb1357
4.7 years ago by
sohadb1357330
sohadb1357330 wrote:

Hi

How do I filter a bam file with some tools (Specifically -how I can remain with the unmapped reads only?). I have single-end mapping,

I searched for hours but everywhere I see the suggestion of samtools view -u -f 4 that (as I understood) doing the oposite thing - filtering out the unmapped reads.

Thanks!' Ohad

samtools • 89k views
ADD COMMENTlink modified 2.0 years ago by Paul620 • written 4.7 years ago by sohadb1357330
73
gravatar for Sukhdeep Singh
4.7 years ago by
Sukhdeep Singh8.9k
Netherlands
Sukhdeep Singh8.9k wrote:

Hi, You get a bam(machine readable sam) file after mapping, and it contains information about mapped and unmapped reads.

To get the unmapped reads from a bam file use :

samtools view -f 4 file.bam > unmapped.sam, the output will be in sam

to get the output in bam use : samtools view -b -f 4 file.bam > unmapped.bam

To get only the mapped reads use the parameter 'F', which works like -v of grep and skips the alignments for a specific flag.

samtools view -b -F 4 file.bam > mapped.bam

From the manual; there are different int codes you can use with the parameter 'f', based on what you want :

-f INT Only output alignments with all bits in INT present in the FLAG field. INT can be in hex in the format of /^0x[0-9A-F]+/ [0]

Each bit in the FLAG field is defined as:

Flag        Chr     Description
0x0001      p       the read is paired in sequencing
0x0002      P       the read is mapped in a proper pair
0x0004      u       the query sequence itself is unmapped
0x0008      U       the mate is unmapped
0x0010      r       strand of the query (1 for reverse)
0x0020      R       strand of the mate
0x0040      1       the read is the first read in a pair
0x0080      2       the read is the second read in a pair
0x0100      s       the alignment is not primary
0x0200      f       the read fails platform/vendor quality checks
0x0400      d       the read is either a PCR or an optical duplicate

Like for getting the unique reads (a single read mapping at one best position); I use :

-q INT Skip alignments with MAPQ smaller than INT [0]

samtools view -bq 1 file.bam > unique.bam

HTH

ADD COMMENTlink modified 4.7 years ago • written 4.7 years ago by Sukhdeep Singh8.9k
2
samtools view -b -F 4 file.bam > mapped.bam

Does it really get all mapped reads because using the above gives me less reads than:

samtools view -b -F 4 -f 8 file.bam > onlyThisEndMapped.bam
samtools view -b -F 8 -f 4 file.bam > onlyThatEndMapped.bam
samtools view -b -F12 file.bam > bothEndsMapped.bam
samtools merge merged.bam onlyThisEndMapped.bam onlyThatEndMapped.bam bothEndsMapped.bam
ADD REPLYlink written 2.5 years ago by 5heikki6.5k
3

From my understanding your first command extracts all mapped reads, but does not extract mates that were unmapped.  The command in your second set:  

samtools view -b -F 8 -f 4 file.bam > onlyThatEndMapped.bam

Extracts UNMAPPED reads who's mate is mapped.  Thus, you will have more reads with the second set of commands because you are including unmapped mates of mapped reads.

 

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by alolex860

To make sure I have this correct...

The -f options flags to keep the reads 'Only output alignments with all bits set in INT present in the FLAG field' The -F option flags to remove the reads 'Only output alignments with all bits set in INT present in the FLAG field'

using this page I explored the int meanings but I still a bit confused.

In your first command: samtools view -b -F 4 -f 8 file.bam > onlyThisEndMapped.bam

The reads that are unmapped are removed (-F 4) and the mates that are unmapped are kept (-f 8). Does this mean it extracts the mapped reads and mates that are unmapped (I assume mate means the other read for paired-end reads)?

My first post so go easy on me :)

ADD REPLYlink modified 13 months ago • written 13 months ago by 7ejoseph0

i agree with 5heikki's comment for filtering mapped PE reads if mapped means either both read and mate mapped or either mapped. From what I tested with pysam, this is their default definition of mapped in the fetch() function.

Edit: I changed my mind and realized that the command -f4 -F 8 doesn't really generate useful alignments ( alignment record about the read itself and read itself being unmapped while its mate being mapped) so it's best to just use -F 4 to get mapped alignments for PE from sam/bam files. If you want to have only read1 mapped, use flag -f 64 -F 4, for read2, use flag -f 128 -F 4.

ADD REPLYlink modified 9 months ago • written 10 months ago by epigene370

After mapping via BWA, you will get a sam file. So do you mean, we need to convert the sam file to bam file and then using samtools view -b -F 4 file.bam > mapped.bam ?

ADD REPLYlink written 2.2 years ago by Crystal30
1

You don't need to convert sam to bam file, its upto you as its just compressed version of sam, so saves some space.
Use this with sam,
samtools view -F 4 file.bam > mapped.bam

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Sukhdeep Singh8.9k

hey dude,

i am really thankful for your useful answer which helped me a lot

ADD REPLYlink written 2.2 years ago by F2.6k

Hi,

Just a quick question, why does an alignment with a MAPQ score smaller than 1 mean a uniquely mapping read?

For example, I have the following read which is not unique/uniquely mapping (has the XS:i flag). But this has a MAPQ of 6 therefore would not be filtered out with this criteria? Any ideas?

HISEQ2500-09:128:H9FFTADXX:2:   163    scaffold_7359    909    6    100M    =    1189    380    TGATTATAAGGTTTTACATCACACATCTGAAAATTGATTGGGTACTTTCTTAAAAATTACATCATGATTATCATTTTTTTATTGTATGCAATTTTAATAA    @@@DDB?DDHH?DEBHGE??@FGHIIGGHEGDEECHEFGHII?BFBDFEHGHDCBAFGHIGIHIIGAEHCHIIIIIIIFEDCCCC>@DCACCCCCDCCAC    AS:i:-6    XS:i:-5    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:71A28    YS:i:-16    YT:Z:CP

ADD REPLYlink written 20 months ago by AW280

sorry, can I use samtools view -b -F 4 file.bam > mapped.bam for accepted_hits.bam produced by cufflinks for paired-end reads????

ADD REPLYlink written 17 months ago by F2.6k
1

samtools view -b -f 2 file.bam > mappedPairs.bam

ADD REPLYlink modified 17 months ago • written 17 months ago by Sukhdeep Singh8.9k

thank you then by this command i can filter my accepted_hits.bam from tophat

samtools view -b -f 2 accepted_hits.bam > mappedPairs.bam

ADD REPLYlink written 17 months ago by F2.6k
2

yes, even better is 

samtools view -b -f 0x2 accepted_hits.bam > mappedPairs.bam

For more clarity, you should also read some posts.
Extracting from tophat outputs reads pairs and splice-junctions with a single best match
How To Extract Reads-Pairs Aligned Concordantly Exactly 1 Time?
tophat output files containing the reads which mapped uniquely as a pair

Cheers

ADD REPLYlink modified 17 months ago • written 17 months ago by Sukhdeep Singh8.9k

thank you so much

ADD REPLYlink written 17 months ago by F2.6k
5
gravatar for rgiannico
2.6 years ago by
rgiannico50
Italy
rgiannico50 wrote:

I had the same issue but with Paired End Reads, and I solved using samtools and bamToFastq. You can find bamToFastq here: https://code.google.com/p/hydra-sv/ 

  • If you need unmappedR1.fastq (containing both paired and unpaired R1 unmapped reads) and unmappedR2.fastq ( containing both paired and unpaired R2 unmapped reads). 

           Use samtools -f 4 to extract all unmapped reads :

samtools view -b -f 4 file.bam > file_unmapped.bam
bamToFastq -bam file_unmapped.bam -fq1 unmappedR1.fastq -fq2 unmappedR2.fastq

  • If you need unmappedpairedR1.fastq (containing only paired  R1 unmapped reads) and unmappedpairedR2.fastq ( containing only paired R2 unmapped reads). Meaning you need all paired reads where at least one of them is unmapped. 

           Use samtools -F 2 to discard only reads mapped in proper pair:

samtools view -b -F 2 file.bam > file_unmapped.bam 
bamToFastq -bam file_unmapped.bam -fq1 unmappedpairedR1.fastq -fq2 unmappedpairedR2.fastq

ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by rgiannico50

I tried, but I couldn't install the hydra package, because I can't sudo. Is there a way to install it without root access, as other programs allow? (my server don't allow root access outside of the university)

Thanx!

ADD REPLYlink written 2.6 years ago by marcelelaux20

Better to ask Aaron Quinlan, the developer. Try https://groups.google.com/forum/#!forum/bedtools-discuss

ADD REPLYlink written 2.5 years ago by Sukhdeep Singh8.9k

but with -f 4 only the mates are extracted that did not map, leaving out the part that did map ( -f 8). Or am I wrong? By this the information about pairing is lost, which is the valuable thing about paired-end datasets. There are cases, where one mate maps to one reference, but both do map to another.

ADD REPLYlink modified 7 months ago • written 7 months ago by michaela_boell60
5
gravatar for Paul
2.0 years ago by
Paul620
European Union
Paul620 wrote:

Very good is from Broad Institute their web app for check all bitwise flags:

https://broadinstitute.github.io/picard/explain-flags.html

 

ADD COMMENTlink written 2.0 years ago by Paul620
4
gravatar for Lee Katz
4.7 years ago by
Lee Katz2.8k
Atlanta, GA
Lee Katz2.8k wrote:

bam2fastq is really helpful with this kind of question. Really, really helpful.

ADD COMMENTlink written 4.7 years ago by Lee Katz2.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 646 users visited in the last hour