Question: samtools view with specified regions, different output with and without -F 4
0
gravatar for tonja.r
3.5 years ago by
tonja.r450
UK
tonja.r450 wrote:

 

I want to count the number of reads falling into specified regions. Why is there a difference when I used -F 4 -L and just -L.

I used

samtools view -c  -L ../../../gene_matrix/matrix_gene.bed ENCFF001KYF.bam
31168853
samtools view -c -F 4 -L ../../../gene_matrix/matrix_gene.bed ENCFF001KYF.bam
14868564
samtools view -c -F 4 ENCFF001KYF.bam
23240270

 

 

So, the number of mapped reads is 23240270. And when I search for the number of reads in a specified regions not taking into account -F option, I am getting more reads (31168853) then I have the mapped reads. How is it possible? Why do I get some strange reads that fall into my region which flag is not set to 4? I though if the read is in the region then it will be for sure mapped.

 

alignment • 1.1k views
ADD COMMENTlink modified 3.5 years ago by Devon Ryan89k • written 3.5 years ago by tonja.r450
1
gravatar for Devon Ryan
3.5 years ago by
Devon Ryan89k
Freiburg, Germany
Devon Ryan89k wrote:

If bit 4 of the flag is set, the chromosome and position information can be ignored. Many aligners will give a position of * and chromosome of * for unmapped alignments, but others won't. It is rather annoying when aligners give unmapped alignments position information though, since it screws up people's pipelines.

FYI, picard will complain about stuff like this (i.e., stuff that follows the spec. but doesn't follow what really should ideally be done).

ADD COMMENTlink written 3.5 years ago by Devon Ryan89k
0
gravatar for matted
3.5 years ago by
matted7.0k
Boston, United States
matted7.0k wrote:

I'm not sure for your exact case, but some aligners break the "rules" (or at least your reasonable interpretation of them) and give mapped locations for reads that are flagged as unmapped.  This could be causing what you're seeing.

See this FAQ from Picard about bwa, for instance: "BWA can produce SAM records that are marked as unmapped but have non-zero MAPQ and/or non-"*" CIGAR. Typically this is because BWA found an alignment for the read that hangs off the end of the reference sequence."

You can examine some of your reads manually to see if this is the case, or look at the output from the complementary -f 4 query.

ADD COMMENTlink modified 3.5 years ago • written 3.5 years ago by matted7.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1155 users visited in the last hour