Question: Merge VCFs with overlapping samples
Wan Shi Tong wrote, 28 days ago:

I have VCFs with some overlapping samples. Is there a tool that can merge them as shown below?

###VCF1:
CHR POS ID ALT REF QUAL INFO FILTER FORMAT Sample1 Sample2 Sample3  
SNP1...  
SNP2...  
SNP3...

###VCF2:
CHR POS ID ALT REF QUAL INFO FILTER FORMAT Sample2 Sample3 Sample4  
SNP2...  
SNP3...  
SNP4...

I WANT THIS...

###VCF1+VCF2:
CHR POS ID ALT REF QUAL INFO FILTER FORMAT Sample1 Sample2 Sample3 Sample4  
SNP1... (missing for Sample4)  
SNP2...  
SNP3...  
SNP4... (missing for Sample1)

I DO NOT WANT THIS...

###VCF1+VCF2:
CHR POS ID ALT REF QUAL INFO FILTER FORMAT Sample1 Sample2 Sample3 Sample2_2 Sample3_2 Sample4  
SNP1... (missing for Sample2_2, Sample3_2, and Sample4)  
SNP2...  
SNP3...  
SNP4... (missing for Sample1, Sample2, and Sample3)

In this example of what I do not want, Sample2 and Sample3 would only have SNP1, SNP2, and SNP3, while Sample2_2 and Sample3_2 would only have SNP2, SNP3, and SNP4.


Is there a tool that can merge VCFs and keep only one copy of each sample?


At face value, all you need is bcftools merge. Pay close attention to the -m parameter, too, since it controls how multiallelic records are merged. Missing genotypes will be represented as ./.

Reply written 28 days ago by Kevin Blighe

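bcftools merge expects sample names to be unique across the VCFs. We could use --force-samples, but then each repeated sample comes out as a separate, renamed column (bcftools prepends the file index, e.g. 2:Sample2), which the OP doesn't want.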

Reply written 26 days ago by zx8754

Yeah, that is exactly my problem. vcf-merge and bcftools merge do not combine the columns of a sample that appears in more than one file. Unfortunately, they create a new column for each repeated sample.

Reply written 25 days ago by Wan Shi Tong

It would be easier to split these back into single-sample VCFs, run bcftools concat --allow-overlaps --remove-duplicates to combine the files belonging to each sample into one VCF per sample, and then merge everything with bcftools merge. This will work; I have done it before in this type of situation.

Reply written 5 days ago (modified 5 days ago) by Kevin Blighe
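
The split, concat, and merge steps suggested above could be sketched roughly as follows. This is only an illustration, not a script anyone in the thread posted: the function name merge_overlapping_vcfs and the temporary file layout are made up, and it assumes bgzipped inputs with compatible headers, at least two distinct samples overall, and sample names that contain no whitespace or "/". Where a sample has a record at the same site in several inputs, only one copy is kept, so check the result if the genotype calls could differ between files.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): merge VCFs that share samples,
# keeping exactly one column per sample name.
# Usage: merge_overlapping_vcfs OUT.vcf.gz IN1.vcf.gz IN2.vcf.gz [...]
merge_overlapping_vcfs() (
    set -eu
    if [ "$#" -lt 3 ]; then
        echo "usage: merge_overlapping_vcfs OUT.vcf.gz IN1.vcf.gz IN2.vcf.gz [...]" >&2
        exit 2
    fi
    out=$1
    shift
    tmp=$(mktemp -d)
    trap 'rm -rf "$tmp"' EXIT
    mkdir "$tmp/parts" "$tmp/per_sample"

    # Step 1: split every input into one indexed single-sample VCF per sample.
    i=0
    for vcf in "$@"; do
        i=$((i + 1))
        samples=$(bcftools query -l "$vcf")
        for sample in $samples; do
            mkdir -p "$tmp/parts/$sample"
            bcftools view -s "$sample" -Oz -o "$tmp/parts/$sample/$i.vcf.gz" "$vcf"
            bcftools index "$tmp/parts/$sample/$i.vcf.gz"
        done
    done

    # Step 2: per sample, combine its parts into a single VCF.
    for dir in "$tmp"/parts/*/; do
        sample=$(basename "$dir")
        set -- "$dir"*.vcf.gz
        if [ "$#" -eq 1 ]; then
            # The sample occurs in one input only, so there is nothing to concatenate.
            cp "$1" "$tmp/per_sample/$sample.vcf.gz"
        else
            bcftools concat --allow-overlaps --remove-duplicates \
                -Oz -o "$tmp/per_sample/$sample.vcf.gz" "$@"
        fi
        bcftools index "$tmp/per_sample/$sample.vcf.gz"
    done

    # Step 3: merge the single-sample files; every sample name is now unique.
    bcftools merge -Oz -o "$out" "$tmp"/per_sample/*.vcf.gz
)
```

With the two files from the question saved as VCF1.vcf.gz and VCF2.vcf.gz (names assumed here), the call would be `merge_overlapping_vcfs merged.vcf.gz VCF1.vcf.gz VCF2.vcf.gz`. Sites that a sample never had in its source files end up as missing genotypes in the merged output.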
omg what am I doing... (Penn State Hershey College of Medicine) wrote, 5 days ago:

You need bcftools concat. I used the command below and got the result you described.

bcftools concat -a filtered_indels_annotated.vcf.gz filtered_snps_annotated.vcf.gz -Ov -o filtered_BC_merged.vcf

Some useful info here on the -a option: https://samtools.github.io/bcftools/bcftools.html#norm


For concat to work, the sample sets of all files need to overlap exactly. From the bcftools manual:

"All source files must have the same sample columns appearing in the same order."

Reply written 5 days ago by zx8754
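
One way to check that requirement before running concat is to compare the sample lists that bcftools query -l prints for each file. The helper below is a minimal sketch with a made-up name (same_sample_columns); it only compares sample names and their order, nothing else about the headers.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeed only if every VCF lists
# the same samples in the same order, which bcftools concat requires.
# Usage: same_sample_columns A.vcf.gz B.vcf.gz [...]
same_sample_columns() (
    set -eu
    if [ "$#" -lt 2 ]; then
        echo "usage: same_sample_columns A.vcf.gz B.vcf.gz [...]" >&2
        exit 2
    fi
    # Sample names of the first file, one per line, in column order.
    first=$(bcftools query -l "$1")
    shift
    for vcf in "$@"; do
        current=$(bcftools query -l "$vcf")
        if [ "$current" != "$first" ]; then
            echo "sample columns differ: $vcf" >&2
            exit 1
        fi
    done
)
```

For example, `same_sample_columns a.vcf.gz b.vcf.gz && bcftools concat -a a.vcf.gz b.vcf.gz -Oz -o combined.vcf.gz` (file names assumed) would only attempt the concat when the sample columns match. For the VCF1 and VCF2 in the question the check would fail, because their sample sets differ.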