Question: samtools view with specified regions, different output with and without -F 4
gravatar for tonja.r
4.4 years ago by
tonja.r460 wrote:


I want to count the number of reads falling into specified regions. Why is there a difference when I used -F 4 -L and just -L.

I used

samtools view -c  -L ../../../gene_matrix/matrix_gene.bed ENCFF001KYF.bam
samtools view -c -F 4 -L ../../../gene_matrix/matrix_gene.bed ENCFF001KYF.bam
samtools view -c -F 4 ENCFF001KYF.bam



So, the number of mapped reads is 23240270. And when I search for the number of reads in a specified regions not taking into account -F option, I am getting more reads (31168853) then I have the mapped reads. How is it possible? Why do I get some strange reads that fall into my region which flag is not set to 4? I though if the read is in the region then it will be for sure mapped.


alignment • 1.4k views
ADD COMMENTlink modified 4.4 years ago by Devon Ryan94k • written 4.4 years ago by tonja.r460
gravatar for Devon Ryan
4.4 years ago by
Devon Ryan94k
Freiburg, Germany
Devon Ryan94k wrote:

If bit 4 of the flag is set, the chromosome and position information can be ignored. Many aligners will give a position of * and chromosome of * for unmapped alignments, but others won't. It is rather annoying when aligners give unmapped alignments position information though, since it screws up people's pipelines.

FYI, picard will complain about stuff like this (i.e., stuff that follows the spec. but doesn't follow what really should ideally be done).

ADD COMMENTlink written 4.4 years ago by Devon Ryan94k
gravatar for matted
4.4 years ago by
Boston, United States
matted7.2k wrote:

I'm not sure for your exact case, but some aligners break the "rules" (or at least your reasonable interpretation of them) and give mapped locations for reads that are flagged as unmapped. This could be causing what you're seeing.

See this FAQ from Picard about bwa, for instance: "BWA can produce SAM records that are marked as unmapped but have non-zero MAPQ and/or non-"*" CIGAR. Typically this is because BWA found an alignment for the read that hangs off the end of the reference sequence."

You can examine some of your reads manually to see if this is the case, or look at the output from the complementary -f 4 query.

ADD COMMENTlink modified 3 months ago by RamRS25k • written 4.4 years ago by matted7.2k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 799 users visited in the last hour