Hi everyone,
I have a question regarding the BWA alt-aware pipeline since I haven't seen much guideline online other than https://gatk.broadinstitute.org/hc/en-us/articles/360037498992--How-to-Map-reads-to-a-reference-with-alternate-contigs-like-GRCH38#2 (though it might a little bit dated).
Currently, my alt-unaware pipeline would be:
bwa mem | samtools fixmate | samtools sort | samtools markdup
Since my region of interest is the MHC, at first, I tried to pipe bwa-postalt.js (https://github.com/lh3/bwa/tree/master/bwakit) after samtools markdup
but it resulted in an error in subsequent samtools sort
step (the SAM file will be read-name-sorted after using the js script).
bwa mem | samtools fixmate | samtools sort | samtools markdup | k8 bwa-postalt.js | samtools sort
After a few tries, I decided to break the pipe into two commands and it ran without errors but it would create an intermediate sam file (the input and output of bwa-postalt.js is in SAM format).
bwa mem | samtools fixmate | samtools sort | samtools markdup
k8 bwa-postalt.js | samtools sort
Another pipeline I also tried was to do post-alt processing step after mapping with bwa mem
then progress with the rest of the pipeline (adaptation from https://pmc.ncbi.nlm.nih.gov/articles/PMC5522380/).
bwa mem | k8 bwa-postalt.js | samtools fixmate | samtools sort | samtools markdup
So, my question is which pipeline should be used in my analysis since the BAM files created by these two pipelines were not identical though the number of duplicated reads in markdup stats file from both pipelines were roughly the same.
Two-command pipeline:
EXCLUDED: 1174034
EXAMINED: 4792305
DUPLICATE PAIR: 2283688
One-command pipeline:
EXCLUDED: 38478395
EXAMINED: 4792304
DUPLICATE PAIR: 2283694
Since the VCF files generated through both pipelines with HaplotypeCaller produced different sets of variants, I'm having a hard time deciding which pipeline to follow through for my dataset. I'm currently inclining toward the one-command pipeline because it does not produce an intermediate file (and it feels more natural!), but any expert inputs on the matter is more than welcome.
Cheers.