samtools view with specified regions, different output with and without -F 4
2
0
Entering edit mode
7.5 years ago
tonja.r ▴ 600

I want to count the number of reads falling into specified regions. Why is there a difference when I used -F 4 -L and just -L.

I used ​

samtools view -c  -L ../../../gene_matrix/matrix_gene.bed ENCFF001KYF.bam
31168853
samtools view -c -F 4 -L ../../../gene_matrix/matrix_gene.bed ENCFF001KYF.bam
14868564
samtools view -c -F 4 ENCFF001KYF.bam
23240270


So, the number of mapped reads is 23240270. And when I search for the number of reads in a specified regions not taking into account -F option, I am getting more reads (31168853) then I have the mapped reads. How is it possible? Why do I get some strange reads that fall into my region which flag is not set to 4? I though if the read is in the region then it will be for sure mapped.

alignment • 2.4k views
1
Entering edit mode
7.5 years ago

If bit 4 of the flag is set, the chromosome and position information can be ignored. Many aligners will give a position of * and chromosome of * for unmapped alignments, but others won't. It is rather annoying when aligners give unmapped alignments position information though, since it screws up people's pipelines.

FYI, picard will complain about stuff like this (i.e., stuff that follows the spec. but doesn't follow what really should ideally be done).

0
Entering edit mode
7.5 years ago
matted 7.7k

I'm not sure for your exact case, but some aligners break the "rules" (or at least your reasonable interpretation of them) and give mapped locations for reads that are flagged as unmapped. This could be causing what you're seeing.

See this FAQ from Picard about bwa, for instance: "BWA can produce SAM records that are marked as unmapped but have non-zero MAPQ and/or non-"*" CIGAR. Typically this is because BWA found an alignment for the read that hangs off the end of the reference sequence."

You can examine some of your reads manually to see if this is the case, or look at the output from the complementary -f 4 query.