comparing 3 VCF files for concordance and VENN plotting
1
0
Entering edit mode
6.3 years ago
drowl1 ▴ 30

Hello everyone!

I have 3 multi-sample VCF files that I want to compare for concordance such that I can use the statistics from the comparison to do a VENN diagram.I have used vcf-compare but the numbers I get from its output (vcf-compare a.vcf b.vcf c.vcf) do not agree with the pairwise comparison of my three vcf files using bcftools isec (ie,bcftools isec a.vcf b.vcf,bcftools isec a.vcf c.vcf,bcftools isec b.vcf c.vcf,).

Is there any other tool that can run this 3-vcf file comparison for me with understandable output statistics?

software error sequence next-gen sequencing • 4.4k views
ADD COMMENT
5
Entering edit mode
6.3 years ago

Your experience with those tools used for comparing variants is why I gave up using them many years ago. They rarely agree and many bugs have been found. 'Under the hood', they may be doing some form of automatic filtering that is not reported, such as not including variants with low quality, etc.


I would just do this manually by(for each file):

1, Normalising the variants in each file (left-aligning indels and splitting multi-allelic calls), and then creating a unique key to identify each variant, setting it in the ID field for each resulting file (BCF):

#1st pipe, splits multi-allelic calls into separate variant calls
#2nd pipe, left-aligns indels and issues warnings when the REF base in your VCF does not match the base in the supplied FASTA reference genome
#3rd pipe, sets VCF ID field to CHROM:POS:REF:ALT

bcftools norm -m-any MyVariants.vcf | bcftools norm --check-ref w -f human_g1k_v37.fasta | bcftools annotate -Ob -x 'ID' -I +'%CHROM:%POS:%REF:%ALT' > MyVariants.bcf ;
bcftools index MyVariants.bcf ;

2, extracting the variant key into 3 separate lists using cut, awk, sed, or grep:

bcftools view MyVariants.bcf | cut -f3 | grep -v "^##" | grep -v "^ID"

3, With your 3 lists of unique variants, use something like Venny to check level of overlap and unique variants in each

ADD COMMENT
3
Entering edit mode

why I gave up using them many years ago.

+1, and we don't know if people want to compare the genotypes, the CHROM/POS, CHROM/POS/REF, CHROM/POS/REF/ALT (and what about multiallelic alleles ) ?...

ADD REPLY
1
Entering edit mode

Yep - lol

Happy Christmas! / Joyeux Noël!

ADD REPLY
1
Entering edit mode

Thank you Kevin!Now I know.Cheerful Christmas and Happy holidays!

ADD REPLY

Login before adding your answer.

Traffic: 1884 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6