This might be an XY question, so I'll explain my premise:
- I have 3 VCF files:
  - f1 is an annotated VCF covering 50 samples
  - f2 is an annotated VCF covering 5 samples, but only sites that are not in f1
  - f3 is an un-annotated VCF covering 5 samples across sites in f1 as well as sites not in f1
- All annotations are site-level
I now wish to combine these into one VCF file with all sites annotated and all sample-level information present.
When I merge f1 and f2, I get a VCF with all annotated sites and all samples, but for those sites overlapping with f1, the GT/AD/... fields of the 5 samples from f2 are empty, because that information is in f3. How do I merge these three datasets?
In essence, can I do an operation to update genotype fields in one VCF file based on a sample+site match in another VCF file? If they were 2 data.frames, the operation would be something like vcf1[site, sample] <- vcf2[site, sample].
The way I see it, I might have to subset f3 to f1-sites only, then bcftools merge <f1> <f3_subset> > <f1_f3_subset> - that way I do not add any site, only samples. Then I bcftools concat <f1_f3_subset> <f2> > <final_vcf>, so this time I add only sites, no samples. Any other solution will be appreciated.
That solution does not work, as bcftools concat cannot work on VCFs with different samples in them.
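
For reference, the first two steps of the workflow described above (subsetting f3 to the sites in f1, then merging in the new samples) could be sketched roughly as follows. This is only an illustration under some assumptions: the file names are placeholders, all inputs are taken to be bgzip-compressed and indexed, and the helper names run, subset_to_f1_sites and add_samples are made up for this sketch. Subsetting with -R matches on position only, so it may not be strict enough if the two files carry different alleles at the same position. The concat step is left out because, as noted, it fails when the sample sets differ.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Keep only the records of f3 that fall on positions present in f1.
# -R reads the regions from a file (here the f1 VCF itself) and needs f3 to be indexed.
subset_to_f1_sites() {
    local f1=$1 f3=$2 out=$3
    run bcftools view -R "$f1" -Oz -o "$out" "$f3"
    run bcftools index -t "$out"
}

# Merge the subset into f1: this adds sample columns, and no new sites
# as long as the subset really contains f1 positions only.
add_samples() {
    local f1=$1 f3_subset=$2 out=$3
    run bcftools merge -Oz -o "$out" "$f1" "$f3_subset"
    run bcftools index -t "$out"
}
```

Setting DRY_RUN=1 before calling, for example, subset_to_f1_sites f1.vcf.gz f3.vcf.gz f3_subset.vcf.gz prints the bcftools commands instead of running them, which makes it easy to check them before touching real data.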