Could anyone explain the difference between the -s and -S options of "samtools rmdup"? In addition, is it standard practice to use -sS to remove duplicate reads?
I recently tried to remove duplicates from a BAM file. After running "samtools rmdup -sS in.nameSrt.bam out.bam", the BAM file shrank from 11G to 5.2G, and the log reported that 52.48% of the reads had been removed. I'm really worried about losing that much data.
By the way, one of my goals in the downstream analysis is to call genotypes and detect SNPs.
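To make the question concrete, here is a simplified toy model (not samtools' actual code) of why the choice of duplicate key matters. It assumes, as the samtools rmdup manual describes, that single-end mode compares each read's own start position and strand, while paired-end mode compares the outer coordinates of the whole pair, keeping the highest-MAPQ read in each group. The `Read` record and the `single_end_key`, `paired_end_key` and `dedup` helpers are hypothetical names for illustration only.

```python
from collections import namedtuple

# Hypothetical, simplified read record (not a samtools or pysam structure).
Read = namedtuple("Read", "name chrom pos5 strand mate_pos5 mapq")


def single_end_key(r):
    # Single-end view: only the read's own 5' position and strand are compared.
    return (r.chrom, r.pos5, r.strand)


def paired_end_key(r):
    # Paired-end view (simplified): the mate's position is part of the key too.
    return (r.chrom, r.pos5, r.strand, r.mate_pos5)


def dedup(reads, key):
    """Keep only the highest-MAPQ read for each duplicate key."""
    best = {}
    for r in reads:
        k = key(r)
        if k not in best or r.mapq > best[k].mapq:
            best[k] = r
    return sorted(best.values(), key=lambda r: r.name)


reads = [
    Read("a", "chr1", 100, "+", 300, 60),
    Read("b", "chr1", 100, "+", 450, 50),  # same start as "a", different mate
    Read("c", "chr1", 100, "+", 300, 40),  # same start and same mate as "a"
]

print([r.name for r in dedup(reads, single_end_key)])  # ['a']
print([r.name for r in dedup(reads, paired_end_key)])  # ['a', 'b']
```

In this toy example, treating pairs as single-end reads collapses "b" together with the true duplicate "c", so more reads are discarded than with a pair-aware key.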