Which threshold to be put in the MAPQ when doing mpileup of BCF tools on a STAR-generated BAM file ?
0
1
Entering edit mode
11 weeks ago
mohsamir2016 ▴ 20

Dear All Sorry for bothering. I am doing variant calling using BCFtools mpileup then call and I would like to add a filteration step in the command for mpileup in BCF tool, but really confused about which threshold to be used for a min mapping quality. In my BAM file produced from STAR, most reads have 255 value in the MAPQ field. Now I am confused which number is correct for counting the n of unique mapped reads in STAR alignment BAM file ? I have done this by two ways, which ends up in different results:

1. The number/% of unique mapped reads that one can get from the log.files output from STAR aligner
2. The number that I obtain when running (Most of my BAM file have the 255 value in the MAPQ filed (column 5)

samtools view -q 255 -c filteredfile.bam

So which one of these is the more accurate description of the unique mapped reads ? so that I can use 255 in the filter
so shall I set this to 255 ...

i am using the below command for mpileup in BCF tool

bcftools mpileup -Ou -o Variant/L10A2.bcf -f Ref/GCF_000002315.5_GRCg6a_genomic.fna L10A2_sortedbycordinate_fix_marked.bam -Q 20


Thanks

RNAseq STAR • 510 views
0
Entering edit mode

(Most of my BAM file have the 255 value in the MAPQ filed

MAPQ=255 => the mapping quality is meaningless.

MAPQ: MAPping Quality. It equals −10 log10 Pr{mapping position is wrong}, rounded to the nearest integer. A value 255 indicates that the mapping quality is not available.

0
Entering edit mode

Thanks Pierre, This is understandable,. This is meaningless because these are uniquely mapped reads so there is no chance that there MAPQ equal a number. Could you comment on the question: which number to be put in the calling for variant using the BCF tools Could you comment ATpoint

Thanks

0
Entering edit mode

This post is very explanatory. As a "raw" filter, in the past I have seen many people using "30". However, you can be more strict if you want, depending on your goal. For variant calling, I personally use a combination of filters including depth, number of reads supporting the alternate allele.. etc.

0
Entering edit mode

Thanks Iran, however I do not think 30 is the best because as I mentioned above when running STAR aligner, and output file that only contains mapped reads, around 80% of the reads will have a MAPQ of 255, which as Pierre said is meaningless. So If I put the threshold for 30, it make no sense there will be no selection ? I went into that post you made, and it tells very clear;y that 255 is the MAPQ of the uniquely mapped reads.

By the way, do you know if one can filter the reads in the VCF file for those that are called from reads with certain MAPQ ? I mean filtering the VCF later for the MAPQ rather than doing it during the mpileup step ?

Best

0
Entering edit mode

Well, the point is not to discard as many reads as possible, but to keep the ones that are more likely to be correctly mapped. Therefore, the more reads with 255, the better.

Usually in many VCF you can find an output annotation for each variant with the mean mapping quality of the reads in the given position.