Question: Taking the difference of two VCFs (or removing singletons)
4.7 years ago by
United Kingdom
hermathena40 wrote:

Dear All,

Is there a way to take the difference of two VCF files? GATK can be used to take a union or an intersection, but I need the difference. There are two applications:

1. Remove singletons. I have a VCF of all the SNPs and a VCF of the private ones. I need a VCF with the non-private SNPs.

2. Get the SNPs in non-CDS sequence. I can make a VCF of the exome SNPs by filtering against a GFF file. I would also like to get the SNPs from non-CDS regions, which could be the difference of all SNPs and exome SNPs.

Any ideas, please?


Many thanks,

Krzysztof Kozak


University of Cambridge

modified 4.6 years ago • written 4.7 years ago by hermathena40


Thank you all for the suggestions, this looks promising!



written 4.6 years ago by hermathena40
4.7 years ago by
Devon Ryan90k
Freiburg, Germany
Devon Ryan90k wrote:

You can use either vcftools or bcftools. In bcftools, use the isec command with the -C (complement) option; the vcftools counterpart is vcf-isec, where the complement option is the lowercase -c. Note that the comparison is position-based rather than exact-variant-based.
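To make the position-based versus exact-variant distinction concrete, here is a minimal Python sketch of a VCF difference. It is an illustration only, not how vcftools or bcftools work internally. The function names (`variant_key`, `vcf_difference`) and the toy records are made up for this example, and it assumes uncompressed, tab-separated VCF text small enough to hold in memory.

```python
def variant_key(line, exact=False):
    """Return (CHROM, POS), plus (REF, ALT) when exact=True, for one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    key = (fields[0], int(fields[1]))
    if exact:
        key += (fields[3], fields[4])
    return key


def vcf_difference(all_lines, private_lines, exact=False):
    """Return header lines and records of all_lines that have no match in private_lines."""
    exclude = {
        variant_key(line, exact)
        for line in private_lines
        if line.strip() and not line.startswith("#")
    }
    kept = []
    for line in all_lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#") or variant_key(line, exact) not in exclude:
            kept.append(line)
    return kept


# Hypothetical toy data: three SNPs in the full set, two in the "private" set.
HEADER = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
ALL = HEADER + [
    "chr1\t100\t.\tA\tG\t.\t.\t.\n",
    "chr1\t200\t.\tC\tT\t.\t.\t.\n",
    "chr2\t100\t.\tG\tA\t.\t.\t.\n",
]
PRIVATE = HEADER + [
    "chr1\t200\t.\tC\tT\t.\t.\t.\n",
    "chr2\t100\t.\tG\tC\t.\t.\t.\n",  # same position as in ALL, different ALT
]

by_position = vcf_difference(ALL, PRIVATE)            # drops chr2:100 as well
by_allele = vcf_difference(ALL, PRIVATE, exact=True)  # keeps chr2:100 (ALT differs)
```

With position-based matching, the chr2:100 record is removed even though its ALT allele differs between the two files; with exact matching it is kept. Which behaviour a given tool applies can depend on its options, so it is worth checking the manual before relying on either.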

4.7 years ago by
France, Paris
Kizuna760 wrote:

Regarding point 1.

I think you can do it with R.

Try reading your two VCF files into data frames (DF1 with all variants, DF2 with the private variants), then keep only the rows of DF1 whose chromosomal position does not appear in DF2. This assumes the position column identifies a variant uniquely, for example by combining chromosome and position.

This is an example:

nonsingletons.DF1 <- DF1[!(DF1$chromosomic.position %in% DF2$chromosomic.position), ]
modified 18 months ago by RamRS21k • written 4.7 years ago by Kizuna760

The VariantAnnotation package contains a VRanges class that extends GRanges and would be convenient in this instance.

modified 4.7 years ago • written 4.7 years ago by Devon Ryan90k
4.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum120k wrote:

I wrote a tool to include/exclude the variants in a VCF file:
