I'm looking for a "standardized" workflow to align 150 bp Illumina sequencing data (FASTQ format) and call low-frequency variants. We are currently using Bowtie 1.0 (indels are not a concern) plus a custom algorithm that keeps only variants with a Phred score >20, an average neighborhood Phred >20, and presence in >1% of the aligned reads covering a given base, parsed from the output MAP file. I would like to replace this with something more modern/industry-standard.
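For reference, the current custom filter amounts to something like the awk pass below, run over a hypothetical per-position summary table (columns: position, ref base, alt base, alt read count, total depth, mean Phred at the base, mean neighborhood Phred). The table format is made up for illustration, not the actual MAP-derived output; the thresholds mirror the ones above.

```shell
# filter_variants keeps rows where both quality columns exceed 20 and the
# alt-read fraction exceeds 1%, matching the custom filter described above.
filter_variants() {
  awk -F'\t' '$6 > 20 && $7 > 20 && ($4 / $5) > 0.01'
}

# Toy input: only position 101 passes (3% frequency, both quals > 20);
# 102 fails the 1% frequency cutoff, 103 fails the Phred > 20 cutoff.
printf '%s\n' \
  $'101\tA\tG\t30\t1000\t35\t33' \
  $'102\tC\tT\t5\t1000\t35\t33' \
  $'103\tG\tA\t40\t1000\t18\t33' \
| filter_variants
# → prints only the 101 row
```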
The purpose is to identify point mutations in subcloning experiments for B cells; the reference is the parental clone's antibody sequence.
For the preprocessing steps, does using BWA and GATK make sense for this type of data, or is Bowtie2 (or something else) a better aligner for speed with such a small reference?
Otherwise, I scanned through the published GATK "Best Practices" and tried to extract the relevant steps. The pipeline below looks like it might be a reasonable preprocessing workflow, but any comments would be much appreciated:
- bwa mem to align the FASTQ files to ref.fa (SAM output)
- SortSam with Picard (coordinate-sort and convert to BAM)
- MarkDuplicates with Picard
- call variants with HaplotypeCaller (GATK)
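The steps above would look something like the sketch below on the command line. File names (reads_R1.fastq.gz, reads_R2.fastq.gz, ref.fa, sample1) are placeholders; DRY_RUN=1 (the default here) only prints each command so the sequence can be inspected without bwa, samtools, or gatk installed.

```shell
#!/usr/bin/env bash
set -u

REF=ref.fa
SAMPLE=sample1
DRY_RUN=${DRY_RUN:-1}

run() {
  # Echo the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# One-time reference indexing (bwa index, .fai, and the GATK sequence dictionary)
run bwa index "$REF"
run samtools faidx "$REF"
run gatk CreateSequenceDictionary -R "$REF"

# 1. bwa mem alignment to SAM; a read group (-R) is required by GATK downstream
run bwa mem -R "@RG\tID:$SAMPLE\tSM:$SAMPLE\tPL:ILLUMINA" \
  -o "$SAMPLE.sam" "$REF" reads_R1.fastq.gz reads_R2.fastq.gz

# 2. Coordinate-sort and convert SAM to BAM (Picard, here via the gatk wrapper)
run gatk SortSam -I "$SAMPLE.sam" -O "$SAMPLE.sorted.bam" --SORT_ORDER coordinate

# 3. Mark PCR/optical duplicates
run gatk MarkDuplicates -I "$SAMPLE.sorted.bam" -O "$SAMPLE.dedup.bam" \
  -M "$SAMPLE.dup_metrics.txt"

# 4. Call variants
run gatk HaplotypeCaller -R "$REF" -I "$SAMPLE.dedup.bam" -O "$SAMPLE.vcf.gz"
```

Running with DRY_RUN=0 would execute the commands for real; the indexing block only needs to be run once per reference.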