Question: difference between mapq=0 and samtools flag 0x900
0
gravatar for bitpir
14 months ago by
bitpir140
bitpir140 wrote:

Hello,

I am comparing the # of uniquely mapped reads from the same fastq (illumina paired end) to 2 reference sequences using BWA-MEM. I expect mapping to one of the reference sequences (let's call it ref A) to be have many multi-mappers and mapping the other (ref B) to be more unique. Ref A has many duplicate sequences.

I'm using the same bwa-mem parameter for alignment:

bwa mem -M -t 16 -B 1 -T 40

I calculated uniquely mapped reads using samtools view:

unique_alignment=$(samtools view -c -F 0x904) # for excluding secondary, supplementary, unmapped reads
mapq30_unique=$(samtools view -c -q 10 -F 0x904)

The results are below:

    # mapped reads  # uniquely mapped reads % of total seq  # of unmapped   # of MAPQ=10    % MAPQ=10
refA    15758104    15008036    95.24011264 0   10384098    65.7639396211625
refB    16352354    15632108    95.595459834101 0   14264615    87.9141156068417

As you can see, there isn't any difference in % of uniquely mapped reads (~95% for both) but a difference is seen in MAPQ=10.

From what I read, MAPQ=0 means reads are mapped to many places. I was wondering why using samtools view alone fail to capture uniquely mapped reads, and additional MAPQ filtering is necessary to get the unique mapped reads?

Thanks for the help!

mapq samtools • 675 views
ADD COMMENTlink modified 14 months ago by Devon Ryan91k • written 14 months ago by bitpir140
1
gravatar for Devon Ryan
14 months ago by
Devon Ryan91k
Freiburg, Germany
Devon Ryan91k wrote:

All mapped reads have a primary alignment, which is what you end up counting with -F 0x904. You're not just excluding "multimappers" (that's not a terribly useful term) with the addition of -q 10, but also more generically uncertain alignments.

If you really want to see how well your dataset aligns to the two genomes then you need to instead concatenate the genomes together and align against the resulting merged genome. Then split the results by genome.

ADD COMMENTlink written 14 months ago by Devon Ryan91k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1914 users visited in the last hour