I'm working on Bash scripts for a ChIP-seq pipeline for my lab. Although the ENCODE guidelines suggest removing duplicates, some people here prefer to keep the duplicates and instead filter the reads by MAPQ value. I am writing a script to do that.
All the examples I have seen so far, both online and in other people's scripts, filter SAM files. For example, with awk or grep: since the MAPQ value is in the 5th column of a SAM file, extracting it is a simple text-processing problem. However, in the pipeline I'm working on, the inputs are sorted BAM files (another script in the lab does the mapping, sorting and conversion to BAM).
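To make the SAM-based approach concrete, this is roughly the kind of filter I mean. It is only a sketch: the function name `filter_sam_by_mapq`, the threshold of 30 and the file names are placeholders I made up for illustration, not values from our pipeline.

```bash
#!/usr/bin/env bash
# Sketch: keep SAM header lines (they start with "@") and alignments whose
# MAPQ (column 5) is at least a given threshold. Works on plain-text SAM only.
filter_sam_by_mapq() {
    local min_mapq="$1"   # hypothetical threshold, e.g. 30
    local in_sam="$2"     # path to an uncompressed SAM file
    awk -v q="$min_mapq" 'BEGIN { FS = OFS = "\t" } /^@/ || $5 >= q' "$in_sam"
}

# Example (placeholder file names):
#   filter_sam_by_mapq 30 sample.sam > sample.mapq30.sam
```

This relies on the input being tab-separated text, which is why it does not work directly on a BAM file.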
So I was wondering: is there a way to filter reads by specific MAPQ values directly from the sorted BAMs I receive, without having to ask for the SAM files? Thank you!