Hey!
I have 6 replicates from an experiment, and I run a variant calling algorithm for each of them, which results in a VCF file for each replicate.
I now try to combine the six resulted VCF files into a single VCF file containing only the intersected variants. Meaning, only variants appearing in every replicate should be considered. For that reason, I used the following command:
bcftools isec A.vcf.gz B.vcf.gz C.vcf.gz D.vcf.gz E.vcf.gz F.vcf.gz --nfiles=6 -c all --output-type v > out_file.txt
Which output a single file containing all the variants that overlap in their position and reference among all replicates. Unfortunately, using
-c all
Implicates the variants in the out_file.txt don't necessarily have the same ALT allele. I tried using "none", "some" and "indels", but it doesn't solve my problem, since I have several variants in the same positions that are not considered as the same variant. For example, I have the following variants in the same position and chromosome:
C -> CT in replicates A, B, D
C -> CTTT in replicates C, F
C -> CTTTT in replicate E
I expected to get C->CT in the out_file.txt as it is the overlapping variant but I don't get it for some reason. Should I use a different tool? different parameters?
Thanks in advance!
Thanks for your reply!
First, all of the VCF files are already decomposed. Second, I tried bedtools intersect before and it is not more sensitive. There is only one factor that determines the reported "intersected variant": which file was given first in the intersection. Meaning, running the command:
Will result:
But running the command:
Will result:
But I believe the correct result for the overlapping variant should be C->CT, without considering which file was given first.
BTW, I checked out intervene and it looks awesome!