Question: comparing 3 VCF files for concordance and VENN plotting
0
gravatar for drowl1
14 months ago by
drowl110
drowl110 wrote:

Hello everyone!

I have 3 multi-sample VCF files that I want to compare for concordance such that I can use the statistics from the comparison to do a VENN diagram.I have used vcf-compare but the numbers I get from its output (vcf-compare a.vcf b.vcf c.vcf) do not agree with the pairwise comparison of my three vcf files using bcftools isec (ie,bcftools isec a.vcf b.vcf,bcftools isec a.vcf c.vcf,bcftools isec b.vcf c.vcf,).

Is there any other tool that can run this 3-vcf file comparison for me with understandable output statistics?

ADD COMMENTlink modified 14 months ago by Kevin Blighe37k • written 14 months ago by drowl110
2
gravatar for Kevin Blighe
14 months ago by
Kevin Blighe37k
Republic of Ireland
Kevin Blighe37k wrote:

Your experience with those tools used for comparing variants is why I gave up using them many years ago. They rarely agree and many bugs have been found. 'Under the hood', they may be doing some form of automatic filtering that is not reported, such as not including variants with low quality, etc.


I would just do this manually by(for each file):

1, Normalising the variants in each file (left-aligning indels and splitting multi-allelic calls), and then creating a unique key to identify each variant, setting it in the ID field for each resulting file (BCF):

#1st pipe, splits multi-allelic calls into separate variant calls
#2nd pipe, left-aligns indels and issues warnings when the REF base in your VCF does not match the base in the supplied FASTA reference genome
#3rd pipe, sets VCF ID field to CHROM:POS:REF:ALT

bcftools norm -m-any MyVariants.vcf | bcftools norm --check-ref w -f human_g1k_v37.fasta | bcftools annotate -Ob -x 'ID' -I +'%CHROM:%POS:%REF:%ALT' > MyVariants.bcf ;
bcftools index MyVariants.bcf ;

2, extracting the variant key into 3 separate lists using cut, awk, sed, or grep:

bcftools view MyVariants.bcf | cut -f3 | grep -v "^##" | grep -v "^ID"

3, With your 3 lists of unique variants, use something like Venny to check level of overlap and unique variants in each

ADD COMMENTlink modified 5 months ago • written 14 months ago by Kevin Blighe37k
2

why I gave up using them many years ago.

+1, and we don't know if people want to compare the genotypes, the CHROM/POS, CHROM/POS/REF, CHROM/POS/REF/ALT (and what about multiallelic alleles ) ?...

ADD REPLYlink written 14 months ago by Pierre Lindenbaum116k

Yep - lol

Happy Christmas! / Joyeux Noël!

ADD REPLYlink written 14 months ago by Kevin Blighe37k

Thank you Kevin!Now I know.Cheerful Christmas and Happy holidays!

ADD REPLYlink written 14 months ago by drowl110
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1888 users visited in the last hour