Question: difference between mapq=0 and samtools flag 0x900
0
gravatar for bitpir
9 months ago by
bitpir130
bitpir130 wrote:

Hello,

I am comparing the # of uniquely mapped reads from the same fastq (illumina paired end) to 2 reference sequences using BWA-MEM. I expect mapping to one of the reference sequences (let's call it ref A) to be have many multi-mappers and mapping the other (ref B) to be more unique. Ref A has many duplicate sequences.

I'm using the same bwa-mem parameter for alignment:

bwa mem -M -t 16 -B 1 -T 40

I calculated uniquely mapped reads using samtools view:

unique_alignment=$(samtools view -c -F 0x904) # for excluding secondary, supplementary, unmapped reads
mapq30_unique=$(samtools view -c -q 10 -F 0x904)

The results are below:

    # mapped reads  # uniquely mapped reads % of total seq  # of unmapped   # of MAPQ=10    % MAPQ=10
refA    15758104    15008036    95.24011264 0   10384098    65.7639396211625
refB    16352354    15632108    95.595459834101 0   14264615    87.9141156068417

As you can see, there isn't any difference in % of uniquely mapped reads (~95% for both) but a difference is seen in MAPQ=10.

From what I read, MAPQ=0 means reads are mapped to many places. I was wondering why using samtools view alone fail to capture uniquely mapped reads, and additional MAPQ filtering is necessary to get the unique mapped reads?

Thanks for the help!

mapq samtools • 450 views
ADD COMMENTlink modified 9 months ago by Devon Ryan89k • written 9 months ago by bitpir130
1
gravatar for Devon Ryan
9 months ago by
Devon Ryan89k
Freiburg, Germany
Devon Ryan89k wrote:

All mapped reads have a primary alignment, which is what you end up counting with -F 0x904. You're not just excluding "multimappers" (that's not a terribly useful term) with the addition of -q 10, but also more generically uncertain alignments.

If you really want to see how well your dataset aligns to the two genomes then you need to instead concatenate the genomes together and align against the resulting merged genome. Then split the results by genome.

ADD COMMENTlink written 9 months ago by Devon Ryan89k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 851 users visited in the last hour