Good afternoon, I'm working with GATK to call SNPs on some target capture sequencing data. Right now I'm creating the BAM files and I was wondering which standard quality measures I should apply to them. I'm aware that I can mark duplicates, but I don't know how this affects the downstream analyses. Are the duplicates removed, or are they just marked so that I have to do something else? Should I specify any quality score thresholds for my BAM files?
Thank you very much for your help!
Check the GATK Best Practices, where they address many of your questions.
Like in here? I don't fully get the GATK Best Practices.
They have a lot of different resources. You can also check their events pages, which may not sound relevant, but they post helpful presentations there.
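On the original duplicate question: as far as I know, Picard/GATK MarkDuplicates by default only marks duplicates by setting the 0x400 bit in the SAM flag and leaves the reads in the file, and GATK callers such as HaplotypeCaller filter out flagged reads by default, so no extra removal step should be needed. If you want to see how many reads were flagged, here is a minimal sketch in plain Python that reads SAM text lines (for example from `samtools view -h`). The function names and the two example reads are made up for illustration.

```python
# SAM flag bit for "PCR or optical duplicate", as defined in the SAM specification.
DUPLICATE_FLAG = 0x400


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit is set in a SAM flag."""
    return bool(flag & DUPLICATE_FLAG)


def count_duplicates(sam_lines):
    """Count total and duplicate-flagged records in an iterable of SAM text lines."""
    total = 0
    duplicates = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the bitwise FLAG
        total += 1
        if is_duplicate(flag):
            duplicates += 1
    return total, duplicates


# Hypothetical records for illustration only: flag 99 is a normal paired read,
# flag 1123 is the same flag with the duplicate bit added (99 + 1024).
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
    "read2\t1123\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
]

total, duplicates = count_duplicates(example)
print(f"{duplicates} of {total} reads are flagged as duplicates")
```

To run it on a real file you could pipe `samtools view -h your.bam` into a script that passes `sys.stdin` to `count_duplicates`. `samtools flagstat` reports the same duplicate count without any custom code.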