Question: comparing 3 VCF files for concordance and VENN plotting
0
gravatar for drowl1
18 months ago by
drowl110
drowl110 wrote:

Hello everyone!

I have 3 multi-sample VCF files that I want to compare for concordance such that I can use the statistics from the comparison to do a VENN diagram.I have used vcf-compare but the numbers I get from its output (vcf-compare a.vcf b.vcf c.vcf) do not agree with the pairwise comparison of my three vcf files using bcftools isec (ie,bcftools isec a.vcf b.vcf,bcftools isec a.vcf c.vcf,bcftools isec b.vcf c.vcf,).

Is there any other tool that can run this 3-vcf file comparison for me with understandable output statistics?

ADD COMMENTlink modified 18 months ago by Kevin Blighe44k • written 18 months ago by drowl110
2
gravatar for Kevin Blighe
18 months ago by
Kevin Blighe44k
Kevin Blighe44k wrote:

Your experience with those tools used for comparing variants is why I gave up using them many years ago. They rarely agree and many bugs have been found. 'Under the hood', they may be doing some form of automatic filtering that is not reported, such as not including variants with low quality, etc.


I would just do this manually by(for each file):

1, Normalising the variants in each file (left-aligning indels and splitting multi-allelic calls), and then creating a unique key to identify each variant, setting it in the ID field for each resulting file (BCF):

#1st pipe, splits multi-allelic calls into separate variant calls
#2nd pipe, left-aligns indels and issues warnings when the REF base in your VCF does not match the base in the supplied FASTA reference genome
#3rd pipe, sets VCF ID field to CHROM:POS:REF:ALT

bcftools norm -m-any MyVariants.vcf | bcftools norm --check-ref w -f human_g1k_v37.fasta | bcftools annotate -Ob -x 'ID' -I +'%CHROM:%POS:%REF:%ALT' > MyVariants.bcf ;
bcftools index MyVariants.bcf ;

2, extracting the variant key into 3 separate lists using cut, awk, sed, or grep:

bcftools view MyVariants.bcf | cut -f3 | grep -v "^##" | grep -v "^ID"

3, With your 3 lists of unique variants, use something like Venny to check level of overlap and unique variants in each

ADD COMMENTlink modified 10 months ago • written 18 months ago by Kevin Blighe44k
2

why I gave up using them many years ago.

+1, and we don't know if people want to compare the genotypes, the CHROM/POS, CHROM/POS/REF, CHROM/POS/REF/ALT (and what about multiallelic alleles ) ?...

ADD REPLYlink written 18 months ago by Pierre Lindenbaum121k

Yep - lol

Happy Christmas! / Joyeux Noël!

ADD REPLYlink written 18 months ago by Kevin Blighe44k

Thank you Kevin!Now I know.Cheerful Christmas and Happy holidays!

ADD REPLYlink written 18 months ago by drowl110
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1659 users visited in the last hour