Is there a way to extract the reads with low mapping quality from BAM files? Samtools will remove reads with low mapping quality if we specify a threshold with the -q option. However, what I want is to save those reads to a separate file instead of discarding them.
Thanks very much for your information.
Could I ask you a further question? In the foo.low_qual.sam file, there won't be headers from the original foo.bam file. If so, could I still use samtools to convert foo.low_qual.sam into foo.low_qual.bam in order to save space? Also, if converting .sam to .bam works, could I use samtools to generate a pileup file from foo.low_qual.bam?
Thank you so much!
That was just an abbreviated example. You could include headers too:
samtools view -h foo.bam | awk '{if(NF<5) {print $0} else if($5<10) {print $0}}' | samtools view -Sbo foo.low_qual.bam -
I should note that that's approximate, I haven't ensured that it lacks typos.
The NF<5 criterion does not work to extract the headers. We know that each line of the SAM file header starts with the @ character. Can I use this information to extract the headers? If so, could you write down the command? I googled but haven't found useful information.
Odd, that should normally suffice. You can also just:
samtools view -h foo.bam | awk '{if($1 ~ /^@/) {print $0} else if($5<10) {print $0}}' | samtools view -Sbo foo.low_qual.bam -
A shorter awk filter that does the same thing: awk '($0 ~ /^@/ || int($5)<10)'
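To sanity-check the filter without a BAM file at hand, the awk step can be tried on a few hand-written SAM lines. This is only a sketch: the records below are made up, and the low_mapq and sample_sam function names and the threshold argument are assumptions for illustration, not part of samtools.

```shell
#!/bin/sh
# Keep header lines (starting with "@") and alignments whose MAPQ
# (column 5) is below a threshold; the default of 10 mirrors the thread.
low_mapq() {
    awk -v min="${1:-10}" '/^@/ || $5 < min'
}

# Made-up SAM records, tab-separated as in real SAM output.
sample_sam() {
    printf '@HD\tVN:1.0\tSO:coordinate\n'
    printf '@SQ\tSN:chr1\tLN:1000\n'
    printf 'r1\t0\tchr1\t10\t3\t4M\t*\t0\t0\tACGT\tIIII\n'
    printf 'r2\t0\tchr1\t20\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
    printf 'r3\t16\tchr1\t30\t0\t4M\t*\t0\t0\tACGT\tIIII\n'
}

# Both header lines plus r1 and r3 should pass; r2 (MAPQ 60) should not.
sample_sam | low_mapq 10
```

In the real pipeline the same filter would sit between samtools view -h foo.bam and samtools view -Sbo foo.low_qual.bam -, as in the command above.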
That actually works! Previously, I forgot to add the option -h.
Thanks a lot!
Hi Devon,
Could you explain what NF<5 means? Also, I think there is no difference between the two command lines you provided. Is this correct? Obviously, I'd like to adopt the faster one.
NF is "the number of fields". Normally headers only have a small number of fields per line.
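A quick way to see what NF reports is to print it for a few header lines. The lines below are made up, and the count_fields and sample_header names are only for illustration. The @PG line is a guess at why NF<5 might miss some headers: awk splits on any whitespace by default, and a recorded command line (CL:) can contain spaces.

```shell
#!/bin/sh
# Print the awk field count (NF) followed by the first field of each line.
count_fields() {
    awk '{print NF, $1}'
}

# Made-up header lines; the CL: value contains spaces, so awk's default
# whitespace splitting sees more fields than the tab-separated SAM columns.
sample_header() {
    printf '@HD\tVN:1.0\tSO:coordinate\n'
    printf '@SQ\tSN:chr1\tLN:1000\n'
    printf '@PG\tID:aln\tPN:aln\tCL:aln -t 4 ref.fa reads.fq\n'
}

sample_header | count_fields
```

Here the @HD and @SQ lines have 3 fields each, but the @PG line has 8, so an NF<5 test would fall through to the $5<10 check for it. Matching on the leading @ avoids depending on the field count at all.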