Question

Indel realignment: GATK vs. BCFtools

3.5 years ago

martytriggers • 0

Hello everyone,

I'm currently calling SNPs for multiple samples. I have finished the pre-processing: the reads were trimmed, mapped to the reference (BWA-MEM), given read groups, duplicate-marked, and merged into BAMs. I'm now at the variant-calling stage and had already run bcftools mpileup/call when I realised I may have missed an important step.

I was provided a script that realigns reads around indels with RealignerTargetCreator and IndelRealigner from GenomeAnalysisTK.jar (GATK v3.4) BEFORE calling SNPs with bcftools. The issue, apart from my not having run this step before calling, is that these tools belong to older GATK releases, which my cluster (HPC) doesn't have installed. I'm aware that the latest GATK HaplotypeCaller incorporates this step, but I still plan to use bcftools mpileup and call. I will also note that the reference genome I'm using is a "chromosome-level assembly"; I read somewhere on here that indel realignment is more important for lower-quality sequencing.

So my questions are:

1. If I'm using bwa mem > BAM > bcftools mpileup/call, is it still necessary to realign around indels? If yes, I can ask the HPC staff to install an older version of GATK, unless someone recommends other software, since I would still like to use mpileup/call (I will have to run it again - ergh, I hate wasting resources).
2. How does GATK's IndelRealigner differ from bcftools norm (normalising indels) or from bcftools filter -i 'INDEL=0'? I noticed the bcftools manual suggests running norm after mpileup, but I'm not sure what the difference is between normalising indels and realigning around them (as GATK does).
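
To make the second question concrete: as far as I understand it, normalisation works on the variant record itself (position, REF, ALT) and never touches the reads, whereas realignment changes how reads are aligned in the BAM before any variant is called. Below is a minimal Python sketch of indel left-alignment and trimming. It is my own illustration of the general idea, not code taken from bcftools; the function name `normalize_indel` and the toy reference sequence are made up for this example.

```python
def normalize_indel(ref_seq, pos, ref, alt):
    """Left-align and trim one biallelic variant.

    ref_seq : reference sequence as a string (hypothetical toy input)
    pos     : 1-based position of the first base of `ref` in `ref_seq`
    ref/alt : non-empty REF and ALT alleles, as in a VCF record
    Returns the (pos, ref, alt) of the left-aligned, trimmed record.
    """
    if not ref or not alt or ref == alt:
        raise ValueError("need two distinct, non-empty alleles")
    if ref_seq[pos - 1:pos - 1 + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference")

    # While both alleles end in the same base, drop that base. If dropping
    # it would empty an allele, pad both alleles with the reference base
    # immediately to the left, which shifts the record one position left.
    while ref[-1] == alt[-1]:
        if min(len(ref), len(alt)) == 1:
            if pos == 1:
                break  # nothing to the left to pad with
            pos -= 1
            pad = ref_seq[pos - 1]
            ref = pad + ref[:-1]
            alt = pad + alt[:-1]
        else:
            ref = ref[:-1]
            alt = alt[:-1]

    # Drop shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref = ref[1:]
        alt = alt[1:]
        pos += 1

    return pos, ref, alt


# A two-base deletion inside a CA repeat, written at position 8,
# moves to the leftmost equivalent position.
toy_reference = "GGGCACACACAGGG"
print(normalize_indel(toy_reference, 8, "CAC", "C"))
```

The same deletion can be written at several positions inside the repeat, and this only picks one canonical way of writing it. It does nothing about reads that were misaligned around the indel in the first place, which is the part I assume IndelRealigner was meant to handle.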

Thank you for your help in advance! Marty

sequence snp alignment • 1.6k views
