What is the MAPQ value of flag 512 in a BAM file?
2.1 years ago

Hello all,

I'm doing QC of BAM files and found two ways to remove reads with low mapping quality. One is to remove reads with Q<30:

samtools view -h -b -q 30 -U below_q30.bam aligned.bam

The other is to remove reads by flag 512:

samtools view -F 512 -b sorted.bam > filtered_sorted.bam

As I understand it, flag 512 marks reads that did not pass quality control during alignment by samtools. I'm wondering whether both steps are necessary. That is, what MAPQ value does flag 512 correspond to? If flag 512 corresponds to the same cutoff as MAPQ 30, then there's no need to do both steps, right?

Thanks!

Best, YJ

2.1 years ago
aw7 ▴ 270

Flag 512 is defined as "not passing filters, such as platform/vendor quality controls". It generally is set upstream of using samtools. What counts as "not passing" depends on who or what is doing your sequencing, assembly/alignment and whatever post processing happens.

So the MAPQ and the flag 512 values are independent of each other. If your aligner treats Q<30 as poor quality then the 512 flag may be set, but it might not be. It depends on the aligner.

If you want to use flag 512 properly you need to know how your data was made and what filters it went through.

To directly answer your first question, I would screen on both flag 512 and Q<30, because they mean different things. You can also do it in one command:

samtools view -@ 4 -b -h -q 30 -F 512 -o filtered_q30.bam alignment.bam
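To make the logic of that command explicit, here is a minimal Python sketch of the same two tests applied to a single SAM record. It assumes tab-separated SAM text (for example, the output of samtools view), where column 2 is FLAG and column 5 is MAPQ. The function name `keep_record` and the sample records are made up for illustration.

```python
QC_FAIL = 0x200  # decimal 512: "not passing filters, such as platform/vendor quality controls"


def keep_record(sam_line, min_mapq=30, exclude_flags=QC_FAIL):
    """Mirror `-q 30 -F 512`: keep a record only if MAPQ >= min_mapq
    and none of the bits in exclude_flags are set in FLAG."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: bitwise FLAG
    mapq = int(fields[4])  # column 5: mapping quality
    return mapq >= min_mapq and (flag & exclude_flags) == 0


# Hypothetical records showing that the two tests are independent.
high_mapq_qc_fail = "r1\t512\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
low_mapq_qc_pass = "r2\t0\tchr1\t200\t5\t4M\t*\t0\t0\tACGT\tIIII"
good = "r3\t0\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII"

for line in (high_mapq_qc_fail, low_mapq_qc_pass, good):
    print(line.split("\t")[0], keep_record(line))
```

A read can carry MAPQ 60 and still have the 512 bit set, or have a low MAPQ with the bit unset, which is why neither filter can stand in for the other.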

To answer your second question, all the flags are used by something but not everything uses all the flags. If your aligner only uses primary placements then the secondary and supplementary will not be set. If you never look for duplicates then the duplicate flag will never be set. There are no placeholder flags.
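One way to see which flags are actually in use in your own file is to count how often each bit is set. The sketch below is a rough illustration, assuming SAM text lines as input; the bit values and meanings follow the SAM specification, while the function name `tally_flags` and the example records are hypothetical.

```python
from collections import Counter

# Flag bits as defined in the SAM specification.
FLAG_NAMES = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def tally_flags(sam_lines):
    """Count, per flag bit, how many alignment records have it set."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        for bit, name in FLAG_NAMES.items():
            if flag & bit:
                counts[name] += 1
    return counts


# Hypothetical input: a header line plus three records (flags 0, 4 and 1040).
records = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t1040\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
counts = tally_flags(records)
for name in FLAG_NAMES.values():
    print(name, counts[name])
```

If a count comes back as zero, filtering on that flag will do nothing for that file. That usually means no upstream tool set the bit, not that the flag is meaningless. samtools flagstat reports several of these counts directly.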

2.1 years ago

The BAM flag for quality control is there as a placeholder, typically not used and is left unset.

It is just one of the many misdesigned features of BAM as a format.

Feel free to ignore that flag.


Hello Istvan Albert, thanks for the response! Do you know which flags really work and which are just placeholders? I found that removing unmapped reads with -F 4 works well, whereas -F 512 and -F 1024 do nothing. So I should remove low-quality reads with -q 30, right? But how should I remove duplicated reads? BTW, is it necessary to do as much filtering as the picture below shows? Thanks in advance.

[Two images were attached here in the original post.]


I would recommend replacing the term "low-quality" alignment with concepts that are more specific to alignments.

Low-quality is just a catch-all term that leaves you with little actionable information.

When you produce alignments you get a mapping quality, an alignment score, a number of mismatches and an alignment length. You also have the average read quality, and you can remove identical reads or identical alignments.

You can filter for any or all of the above if you wish and call that "low-quality".
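As a rough illustration of those separate measures, the Python sketch below pulls several of them out of one SAM record. It makes several assumptions: the aligner writes the optional NM (edit distance, used here as a proxy for mismatches) and AS (alignment score) tags, base qualities are Phred+33 encoded, and "aligned bases" is taken to mean bases in M, = or X CIGAR operations. The function name `alignment_metrics` and the example record are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_metrics(sam_line):
    """Collect per-alignment values that could each serve as a filter."""
    fields = sam_line.rstrip("\n").split("\t")

    # Optional fields start at column 12 and have the form TAG:TYPE:VALUE.
    tags = {}
    for item in fields[11:]:
        tag, _type, value = item.split(":", 2)
        tags[tag] = value

    # Bases in match/mismatch columns (M, =, X); clips and indels excluded.
    aligned_bases = sum(
        int(length) for length, op in CIGAR_OP.findall(fields[5]) if op in "M=X"
    )

    # Mean base quality, assuming Phred+33; "*" means qualities are absent.
    qual = fields[10]
    mean_quality = None
    if qual != "*":
        mean_quality = sum(ord(c) - 33 for c in qual) / len(qual)

    return {
        "mapq": int(fields[4]),
        "edit_distance": int(tags["NM"]) if "NM" in tags else None,
        "alignment_score": int(tags["AS"]) if "AS" in tags else None,
        "aligned_bases": aligned_bases,
        "mean_base_quality": mean_quality,
    }


# Hypothetical 50 bp read: 3 soft-clipped bases, then 47 aligned bases.
record = "\t".join(
    ["r1", "0", "chr1", "100", "60", "3S47M", "*", "0", "0",
     "A" * 50, "I" * 50, "NM:i:2", "AS:i:43"]
)
metrics = alignment_metrics(record)
print(metrics)
```

The sketch deliberately sets no thresholds. Which of these values to filter on, and at what cutoff, depends on the aligner and on the question being asked.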

But in the end, I would recommend some caution: it is not so clear when an alignment is "low-quality". In the vast majority of cases the reference genome is not the same as the true genome under study.

Filtering for low quality might just lead to filtering out the regions that make your genome unique and specific to the phenotype. More damage can be done by filtering data incorrectly than by not filtering at all.
