I have just received targeted deep sequencing FASTQ and BAM files that were generated on the Ion Torrent platform. I am familiar with the GATK Best Practices pipeline and have processed numerous whole exomes and genomes. I would like to know whether there are any key factors or pipeline differences I need to be aware of, given that this new batch comes from a different platform.
An exploratory look at one of my samples indicates a very high rate of duplicate reads (> 90%). I expected a high rate given the PCR-based nature of the experiment, but was still surprised by this figure.
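To make the figure concrete: a duplicate rate of this kind is the fraction of alignment records carrying the SAM duplicate flag (0x400). Below is a minimal sketch of that calculation, assuming the reads have already been flagged by a duplicate-marking tool and are available as SAM text lines. The function name and the toy records are hypothetical, and every record is counted, including any secondary or supplementary alignments.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit for "PCR or optical duplicate"


def duplicate_rate(sam_lines):
    """Return the fraction of alignment records flagged as duplicates."""
    total = 0
    duplicates = 0
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the FLAG
        total += 1
        if flag & DUPLICATE_FLAG:
            duplicates += 1
    return duplicates / total if total else 0.0


# Toy records (made up for illustration): two of the four are flagged.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t1040\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
print(duplicate_rate(example))  # 0.5
```

Note that in the toy data all four reads start at the same position, as reads from a single amplicon would, which is the situation where position-based duplicate marking tends to flag most of the data.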
Currently, my pipeline performs the following steps in sequence:
1) indexing the BAM file
2) reordering and sorting
3) fixing read mate information
4) de-duplication of reads [I wonder if this step should still be there]
5) generating realignment intervals, and realigning reads around indels
6) base quality score recalibration [I have doubts about this step, as Ion Torrent is not explicitly listed as a supported platform on the GATK page: http://goo.gl/DI93Ao]
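To show which parts of the pipeline are in question, the steps above can be sketched as an ordered list in which the two steps I am unsure about can be switched off. This is only an illustration of the ordering: the step labels and the function name are hypothetical descriptions, not actual tool commands or options.

```python
def build_steps(deduplicate=True, recalibrate=True):
    """Return the ordered pipeline step labels.

    The labels are descriptive placeholders, not real command names.
    """
    steps = [
        "index_bam",
        "reorder_and_sort",
        "fix_mate_information",
    ]
    if deduplicate:
        # Step 4: the step I am unsure about for amplicon data.
        steps.append("mark_duplicates")
    steps.append("realign_around_indels")
    if recalibrate:
        # Step 6: the step I am unsure about for Ion Torrent data.
        steps.append("base_quality_score_recalibration")
    return steps


# Current pipeline versus a variant without the two questioned steps.
print(build_steps())
print(build_steps(deduplicate=False, recalibrate=False))
```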
The purpose of my analysis is to validate an initial set of mutation calls and to examine the distribution of variant allele frequencies.
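For clarity on what I mean by variant allele frequency, here is a minimal sketch, assuming per-site counts of reads supporting the reference and the alternate allele are already available. The function name and the example counts are hypothetical, and reads supporting any other allele are ignored.

```python
def variant_allele_frequency(ref_count, alt_count):
    """Return alt / (ref + alt), or None when no reads cover the site."""
    depth = ref_count + alt_count
    if depth == 0:
        return None  # the frequency is undefined without coverage
    return alt_count / depth


# Made-up example: 75 reference reads and 25 alternate reads.
print(variant_allele_frequency(75, 25))  # 0.25
```

Because the result depends directly on the read counts, whether duplicates are removed before counting affects the frequencies, which is part of why I am asking about step 4.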
I apologize if this post is a duplicate of an existing thread that I could not locate.