I have RNA-seq data sequenced on an Illumina platform.
I ran quality control with FastQC, and it did detect duplicate reads. Since I am going to use these data for SNP calling, I need to remove the duplicates.
Does anyone have experience with this? What is the best way to remove the duplicates: before mapping, or at the start of SNP calling with GATK?
Also, which software would you suggest for this purpose?
Thanks a lot in advance.