Say I have a BAM file that I want to filter by applying the following processing steps: remove reads aligned to blacklist regions, remove reads mapping to the mitochondrial genome, remove unmapped reads, remove PCR duplicates, and remove reads with a MAPQ score lower than 30.
Is there a 'correct' order to do these in, or does it not matter? I think problems could arise where certain filtering steps create orphans (reads whose mate has been removed), leaving SAM flags that no longer describe the pair correctly, and later software requires those flags to be accurate.
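To make the orphan concern concrete, here is a minimal pure-Python sketch of a pair-aware filter: every criterion is evaluated per read, but a pair is kept only if both mates pass, so no orphans are produced and the order in which the criteria are checked does not affect the result. The `Read` tuple, the function names, the `chrM` contig name, and the in-memory blacklist are all hypothetical stand-ins for illustration; real data would be read from the BAM with a proper library, and this sketch assumes exactly two primary records per read name (no secondary or supplementary alignments).

```python
from collections import defaultdict
from typing import NamedTuple

class Read(NamedTuple):
    """Hypothetical, simplified stand-in for one BAM record."""
    name: str
    flag: int
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    mapq: int

UNMAPPED = 0x4     # SAM flag bit: read is unmapped
DUPLICATE = 0x400  # SAM flag bit: PCR or optical duplicate

def overlaps_blacklist(read, blacklist):
    """True if the read overlaps any blacklisted interval on its contig."""
    return any(read.start < b_end and b_start < read.end
               for b_start, b_end in blacklist.get(read.chrom, []))

def read_passes(read, blacklist, mito="chrM", min_mapq=30):
    """Apply all five criteria to a single read; check order is irrelevant."""
    if read.flag & (UNMAPPED | DUPLICATE):
        return False
    if read.chrom == mito:  # assumption: mitochondrial contig is named chrM
        return False
    if read.mapq < min_mapq:
        return False
    return not overlaps_blacklist(read, blacklist)

def filter_pairs(reads, blacklist, **kwargs):
    """Keep a pair only when both mates pass, so no orphans are created."""
    by_name = defaultdict(list)
    for read in reads:
        by_name[read.name].append(read)
    kept = []
    for mates in by_name.values():
        if len(mates) == 2 and all(read_passes(m, blacklist, **kwargs) for m in mates):
            kept.extend(mates)
    return kept

reads = [
    Read("pairA", 99, "chr1", 100, 150, 60),
    Read("pairA", 147, "chr1", 300, 350, 60),
    Read("pairB", 99, "chr1", 1000, 1050, 60),
    Read("pairB", 147, "chr1", 1200, 1250, 12),  # low MAPQ mate
    Read("pairC", 99, "chrM", 10, 60, 60),       # mitochondrial pair
    Read("pairC", 147, "chrM", 200, 250, 60),
]
blacklist = {"chr1": [(5000, 6000)]}
kept = filter_pairs(reads, blacklist)
print([r.name for r in kept])
```

In this toy input, `pairB` is dropped as a whole even though only one mate has a low MAPQ, which is the behaviour that avoids leaving an orphan with stale pairing flags.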
Most variant callers don't consider reads that are flagged as PCR duplicates, secondary alignments, or unmapped (a no-brainer). So you need not worry about their presence in the BAM file, provided that these reads have been flagged correctly. A MAPQ cutoff of 30 can be used if you really want to be stringent but, as Brian suggested, it may introduce some bias. There will be lots of reads with MAPQ < 30 that still align uniquely to the reference genome. So I would not use MAPQ to filter reads, as the variant caller will be smart enough to use only uniquely aligned reads. But these decisions are subjective and differ between people.
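As a rough illustration of what "correctly flagged" means here, the sketch below tests the three SAM flag bits mentioned above (unmapped `0x4`, secondary `0x100`, duplicate `0x400`). The bit values come from the SAM specification; the function names and the choice of mask are assumptions for illustration, and the exact set of flags a given variant caller ignores should be checked in its own documentation.

```python
# SAM flag bits as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400

# Illustrative mask: reads with any of these bits set would be ignored.
SKIP_MASK = UNMAPPED | SECONDARY | DUPLICATE  # 0x504 == 1284

SKIP_NAMES = {
    UNMAPPED: "unmapped",
    SECONDARY: "secondary",
    DUPLICATE: "duplicate",
}

def is_skipped(flag: int, skip_mask: int = SKIP_MASK) -> bool:
    """True if any bit in skip_mask is set in the read's flag."""
    return bool(flag & skip_mask)

def skip_reasons(flag: int) -> list:
    """Names of the skip-relevant bits that are set in the flag."""
    return [name for bit, name in SKIP_NAMES.items() if flag & bit]

for flag in (99, 1123, 355, 77):
    print(flag, is_skipped(flag), skip_reasons(flag))
```

The same mask can be passed to `samtools view -F 1284` if you do want to remove such reads physically rather than rely on the flags.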