Hi, I'm working with whole-exome sequencing (WES) data for somatic variant calling, and I tried to follow the approach described here: https://pubmed.ncbi.nlm.nih.gov/28420412/
Basically, my workflow is as follows:
- FASTQ preprocessing and alignment: two aligners (BWA-MEM, Bowtie2)
- BAM recalibration
- Variant calling: three callers (Mutect2, Strelka2, Lancet)
- Variant filtering: I keep only the variants whose FILTER is 'PASS'
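The PASS filtering step can be sketched in plain Python (in practice `bcftools view -f PASS` does this). This is a minimal sketch assuming an uncompressed, tab-delimited VCF; it also splits multi-allelic ALT columns into one key per allele (similar in spirit to `bcftools norm -m -any`), so callsets can later be compared allele by allele. The function name `pass_variants` is hypothetical.

```python
from typing import Iterable, List, Tuple

# A variant key: (chrom, 1-based VCF POS, REF, ALT).
VariantKey = Tuple[str, int, str, str]


def pass_variants(vcf_lines: Iterable[str]) -> List[VariantKey]:
    """Return one key per ALT allele for records whose FILTER is exactly PASS.

    Minimal sketch: ignores INFO, FORMAT and sample columns entirely.
    """
    keys: List[VariantKey] = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information lines, the header line and blanks
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
        if filt != "PASS":
            continue  # drop records that failed any caller filter
        for allele in alt.split(","):  # one key per ALT allele
            keys.append((chrom, int(pos), ref, allele))
    return keys
```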
As you can see, this gives at least 6 VCFs per sample (2 aligners × 3 callers). I'm not sure how I should handle strategies such as:
- Merging the VCFs across aligners, then intersecting across variant callers (keeping variants called by 2/3 or 3/3 callers)
- Merging the VCFs across aligners, then merging across variant callers
- Intersecting the VCFs across aligners, then merging across variant callers
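The three strategies above can be written as set operations on per-caller, per-aligner callsets. A minimal sketch, assuming each VCF has already been reduced to a set of normalized (chrom, pos, ref, alt) keys for one sample; the nested-dict layout and all function names are hypothetical. The "at least k callers" step is similar in spirit to `bcftools isec -n +2`, but this only compares keys and ignores genotype/FORMAT data.

```python
from collections import Counter
from itertools import chain
from typing import Dict, Set, Tuple

# One normalized variant: (chrom, 1-based pos, ref, alt).
Variant = Tuple[str, int, str, str]
# calls[caller][aligner] -> set of PASS variants for one sample (hypothetical layout).
Calls = Dict[str, Dict[str, Set[Variant]]]


def union_across_aligners(calls: Calls) -> Dict[str, Set[Variant]]:
    """Per caller: variants called from at least one alignment."""
    return {caller: set().union(*by_aligner.values()) for caller, by_aligner in calls.items()}


def intersect_across_aligners(calls: Calls) -> Dict[str, Set[Variant]]:
    """Per caller: variants called from every alignment (needs >= 1 aligner)."""
    return {caller: set.intersection(*by_aligner.values()) for caller, by_aligner in calls.items()}


def at_least_k_callers(per_caller: Dict[str, Set[Variant]], k: int) -> Set[Variant]:
    """Variants present in at least k of the per-caller sets (k=1 is a plain union)."""
    counts = Counter(chain.from_iterable(per_caller.values()))
    return {v for v, n in counts.items() if n >= k}
```

With this layout, the first strategy is `at_least_k_callers(union_across_aligners(calls), 2)` (or `3`), the second is the same call with `k=1`, and the third is `at_least_k_callers(intersect_across_aligners(calls), 1)`.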
I have already used common tools for similar tasks. For example, Strelka2 gave me separate SNV and indel VCFs, so I combined them with bcftools concat. Lancet gave me SNVs and indels together, but in one VCF per chromosome, so I combined those files with Picard MergeVcfs. I also used bcftools isec to check which variants are shared between the different callers.
Anyway, I'm worried that the same variant could end up being counted as several different variants during annotation, like the problems shown in Figure 2 of this article: https://pubmed.ncbi.nlm.nih.gov/30858580/
# Or is it better to annotate all 6 VCF files for each sample and then filter them afterwards?
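One common source of the duplication problem is that the same indel can be written in several equivalent ways (for example at different positions inside a repeat), and different aligners or callers may pick different ones. The usual remedy is to left-align and trim every VCF against the same reference before merging or annotating, e.g. with `bcftools norm -f reference.fa` (the FASTA path is a placeholder). The sketch below shows the idea in plain Python; it assumes the whole contig is available as a string, that REF matches the reference and differs from every ALT, and it is not a replacement for `bcftools norm` (no symbolic alleles, no REF checking, no multi-allelic splitting).

```python
from typing import List, Tuple


def normalize(contig: str, pos: int, ref: str, alts: List[str]) -> Tuple[int, str, List[str]]:
    """Left-align and trim one VCF record against `contig`.

    pos is the 1-based VCF POS, so contig[pos - 1] is the first base of REF.
    """
    alleles = [ref] + list(alts)
    # Step 1: while all alleles end in the same base, drop that base; if an
    # allele becomes empty, prepend the preceding reference base to all alleles.
    while len({a[-1] for a in alleles}) == 1:
        trimmed = [a[:-1] for a in alleles]
        if any(not a for a in trimmed):
            if pos == 1:
                break  # cannot extend past the start of the contig
            pos -= 1
            trimmed = [contig[pos - 1] + a for a in trimmed]
        alleles = trimmed
    # Step 2: drop shared leading bases while every allele keeps at least one base.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles[0], alleles[1:]
```

For example, on the toy contig `GCACACAT`, deleting one `CA` unit written at POS 5 (`ACA` > `A`) or at POS 3 (`ACA` > `A`) both normalize to POS 1, `GCA` > `G`, so the two representations collapse to the same key.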
Additionally, here is an excerpt from the NYGC exome analysis pipeline (v6):
> Next, the calls are merged by variant type (SNVs, Multi Nucleotide Variants (MNVs) and Indels). MuTect2 and Lancet call MNVs, however Strelka2 does not and it also does not provide any phasing information. So to merge such variants across callers, we first split the MNVs called by MuTect2 and Lancet to SNVs, and then merge the SNV callsets across the different callers. If the caller support for each SNV in an MNV is the same, we merge them back to MNVs. Otherwise those are represented as individual SNVs in the final callset. Lancet is the only tool that calls deletion-insertion (delins or COMPLEX) events. Other tools may represent the same event as separate indel and/or SNV variants. Such events are rare, especially in the exonic regions and difficult to merge. We therefore do not merge COMPLEX calls with SNVs and Indels calls from other callers.
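The SNV/MNV part of that description can be sketched as follows. This is one reading of the paragraph, not NYGC's code: MNVs are assumed to be equal-length substitutions (len(REF) == len(ALT) > 1), callsets are assumed already normalized and PASS-filtered, indels and COMPLEX calls are left out (the pipeline handles them separately), and when two MNVs overlap, the first one in sorted order wins. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, 1-based pos, ref, alt), normalized


def is_mnv(v: Variant) -> bool:
    """Equal-length substitution longer than one base."""
    return len(v[2]) == len(v[3]) > 1


def split_mnv(v: Variant) -> List[Variant]:
    """Split an MNV into SNVs at the positions where REF and ALT differ."""
    chrom, pos, ref, alt = v
    return [(chrom, pos + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def merge_snvs_and_mnvs(calls: Dict[str, Set[Variant]]) -> Dict[Variant, FrozenSet[str]]:
    """Merge SNV/MNV calls across callers; returns variant -> supporting callers."""
    support: Dict[Variant, Set[str]] = defaultdict(set)
    mnvs: Set[Variant] = set()
    for caller, variants in calls.items():
        for v in variants:
            if is_mnv(v):
                mnvs.add(v)
                for snv in split_mnv(v):  # 1) split MNVs into SNVs
                    support[snv].add(caller)
            elif len(v[2]) == len(v[3]) == 1:
                support[v].add(caller)  # plain SNV; indels are skipped here
    merged: Dict[Variant, FrozenSet[str]] = {}
    used: Set[Variant] = set()
    for mnv in sorted(mnvs):  # 2) re-merge MNVs whose SNVs share identical support
        parts = split_mnv(mnv)
        supports = {frozenset(support[p]) for p in parts}
        if len(supports) == 1 and used.isdisjoint(parts):
            merged[mnv] = supports.pop()
            used.update(parts)
    for snv, callers in support.items():  # 3) everything else stays an SNV
        if snv not in used:
            merged[snv] = frozenset(callers)
    return merged
```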
# Is there a way to do this while also taking the aligner into account in my workflow?
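One possible way to keep the aligner in the picture is to record, for every variant in a single union callset, which (aligner, caller) pairs produced it, and then express each strategy as a filter on that record. Because the provenance travels with each variant, the union could be annotated once and filtered afterwards, provided the annotator preserves existing INFO fields. A minimal sketch with hypothetical names, assuming normalized keys; the `info_tag` string is a hypothetical custom INFO value that would have to be declared in the VCF header.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set, Tuple

Variant = Tuple[str, int, str, str]  # normalized (chrom, pos, ref, alt)
Label = Tuple[str, str]              # (aligner, caller)


def build_support(callsets: Dict[Label, Set[Variant]]) -> Dict[Variant, FrozenSet[Label]]:
    """Union of all callsets, remembering which (aligner, caller) pairs called each variant."""
    support: Dict[Variant, Set[Label]] = defaultdict(set)
    for label, variants in callsets.items():
        for v in variants:
            support[v].add(label)
    return {v: frozenset(labels) for v, labels in support.items()}


def supporting_callers(labels: FrozenSet[Label], min_aligners: int) -> Set[str]:
    """Callers that made the call from at least `min_aligners` different alignments."""
    aligners_per_caller: Dict[str, Set[str]] = defaultdict(set)
    for aligner, caller in labels:
        aligners_per_caller[caller].add(aligner)
    return {c for c, als in aligners_per_caller.items() if len(als) >= min_aligners}


def info_tag(labels: FrozenSet[Label]) -> str:
    """Hypothetical INFO value listing the supporting pairs as 'aligner:caller'."""
    return ",".join(f"{a}:{c}" for a, c in sorted(labels))
```

With two aligners, `len(supporting_callers(labels, 1)) >= 2` corresponds to merging across aligners and then requiring 2/3 callers, while `len(supporting_callers(labels, 2)) >= 1` corresponds to intersecting across aligners and then merging callers. The same (aligner, caller) labels could also stand in for plain caller names in the NYGC-style "same caller support" test for MNVs.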
Thanks for taking the time to read this; I'd really appreciate any kind of help.