Which threshold to be put in the MAPQ when doing mpileup of BCF tools on a STAR-generated BAM file ?
1
1
Entering edit mode
2.8 years ago
mohsamir2016 ▴ 30

Dear All Sorry for bothering. I am doing variant calling using BCFtools mpileup then call and I would like to add a filteration step in the command for mpileup in BCF tool, but really confused about which threshold to be used for a min mapping quality. In my BAM file produced from STAR, most reads have 255 value in the MAPQ field. Now I am confused which number is correct for counting the n of unique mapped reads in STAR alignment BAM file ? I have done this by two ways, which ends up in different results:

  1. The number/% of unique mapped reads that one can get from the log.files output from STAR aligner
  2. The number that I obtain when running (Most of my BAM file have the 255 value in the MAPQ filed (column 5)

    samtools view -q 255 -c filteredfile.bam

So which one of these is the more accurate description of the unique mapped reads ? so that I can use 255 in the filter
so shall I set this to 255 ...

i am using the below command for mpileup in BCF tool

bcftools mpileup -Ou -o Variant/L10A2.bcf -f Ref/GCF_000002315.5_GRCg6a_genomic.fna L10A2_sortedbycordinate_fix_marked.bam -Q 20

Thanks

RNAseq STAR • 3.2k views
ADD COMMENT
0
Entering edit mode

(Most of my BAM file have the 255 value in the MAPQ filed

MAPQ=255 => the mapping quality is meaningless.

see https://samtools.github.io/hts-specs/SAMv1.pdf

MAPQ: MAPping Quality. It equals −10 log10 Pr{mapping position is wrong}, rounded to the nearest integer. A value 255 indicates that the mapping quality is not available.

ADD REPLY
0
Entering edit mode

Thanks Pierre, This is understandable,. This is meaningless because these are uniquely mapped reads so there is no chance that there MAPQ equal a number. Could you comment on the question: which number to be put in the calling for variant using the BCF tools Could you comment ATpoint

Thanks

ADD REPLY
0
Entering edit mode

This post is very explanatory. As a "raw" filter, in the past I have seen many people using "30". However, you can be more strict if you want, depending on your goal. For variant calling, I personally use a combination of filters including depth, number of reads supporting the alternate allele.. etc.

ADD REPLY
0
Entering edit mode

Thanks Iran, however I do not think 30 is the best because as I mentioned above when running STAR aligner, and output file that only contains mapped reads, around 80% of the reads will have a MAPQ of 255, which as Pierre said is meaningless. So If I put the threshold for 30, it make no sense there will be no selection ? I went into that post you made, and it tells very clear;y that 255 is the MAPQ of the uniquely mapped reads.

By the way, do you know if one can filter the reads in the VCF file for those that are called from reads with certain MAPQ ? I mean filtering the VCF later for the MAPQ rather than doing it during the mpileup step ?

Best

ADD REPLY
0
Entering edit mode

Well, the point is not to discard as many reads as possible, but to keep the ones that are more likely to be correctly mapped. Therefore, the more reads with 255, the better.

Usually in many VCF you can find an output annotation for each variant with the mean mapping quality of the reads in the given position.

ADD REPLY
0
Entering edit mode
7 weeks ago
Titus ▴ 910

Context

To clarify the MAPQ the 255 value doesn't have the same meaning in samtools and STAR :

In STAR manual :

The mapping quality MAPQ (column 5) is 255 for uniquely mapping reads, and int(-10*log10(1- 1/Nmap)) for multi-mapping reads.

In samtools manual :

MAPQ: MAPping Quality. It equals -10 log10 Pr{mapping position is wrong}, rounded to the nearest integer. A value 255 indicates that the mapping quality is not available.

Awnser to "Counting the n of unique mapped reads in STAR alignment BAM file"

If you look filteredfile.bamLog.final.out file obtained after STAR alignement the line "Uniquely mapped reads number" (x2 in case of paired end) is egal to :

samtools view -q 255 -c filteredfile.bam

Extra

-You can keep only unique mapped reads directly in STAR using the following option in STAR manual :

The default MAPQ=255 for the unique mappers maybe changed with --outSAMmapqUnique parameter (integer 0 to 255) to ensure compatibility with downstream tools such as GATK.

-About the -Q option of bcftools mpileup , the filter is different because it concern base quality :

-Q, --min-BQ INT skip bases with baseQ/BAQ smaller than INT [13]

  • Finaly if you want to use filter reads based on mapQ value with bcftools , you can use --outSAMmapqUnique option in manual during STAR alignement :

The default MAPQ=255 for the unique mappers maybe changed with --outSAMmapqUnique parameter (integer 0 to 255) to ensure compatibility with downstream tools such as GATK.

Adding --outSAMmapqUnique in STAR alignement then you can use the -q option to keep ony unique mapped reads :

-q, --min-MQ INT skip alignments with mapQ smaller than INT [0]

Finaly to set the right -q int value it will depends on the value you set for --outSAMmapqUnique (for more explaination you have the discussion here) or by default 255 to keep unique mapped reads :

bcftools mpileup -Ou -o Variant/L10A2.bcf -f Ref/GCF_000002315.5_GRCg6a_genomic.fna L10A2_sortedbycordinate_fix_marked.bam -q 255

Best

ADD COMMENT

Login before adding your answer.

Traffic: 3112 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6