I have a set of tumor and matched-normal samples. I marked duplicates with Picard to remove PCR duplicates, and about 10% of my reads are being filtered out as duplicates. Afterwards I use MuTect2 to call somatic variants against dbSNP and the COSMIC coding and noncoding mutation sets.
I suspect that these "duplicates" are not actually PCR artifacts, and I was wondering what may be going on. Could they be rRNA reads that were not removed during pre-processing QC?