Taking the difference of two VCFs (or removing singletons)
3
1
Entering edit mode
9.6 years ago
hermathena ▴ 40

Dear All,

Is there a way to take a difference of two VCF files? GATK can be used to take a Union or an Intersection, but I need the difference. There are two applications:

  1. remove singletons. I have a VCF of all the SNPs and a VCF of the private ones. I need a VCF with the non-private SNPs.
  2. get the non-CDS sequence. I can make a VCF of the exome SNPs by filtering against a gff file. I would also like to get the SNPs from non-CDS regions - which could be the difference of all SNPs and exome SNPs.

Any ideas, please?

Many thanks,

Krzysztof Kozak
Zoology
University of Cambridge

genome sequencing singleton SNP filter • 3.7k views
ADD COMMENT
0
Entering edit mode

Hello,

Thank you all for the suggestions, this looks promising!

Best,

Chris

ADD REPLY
2
Entering edit mode
9.6 years ago

You can use either vcftools or bcftools. You'll just use the isec command with the -C (complement) option. Note that this is position based rather than exact variant based.

ADD COMMENT
1
Entering edit mode
9.6 years ago
Kizuna ▴ 870

Regarding point 1.

I think you can do it with R.

try to transform your 2 vcf files into Dataframes (DF1 and DF2) and then subset the content of your chromosomic position of the DF2 containing the private variants from the one having all variants (DF1)

this is an example:

DF1<-read.delim("....\allSNPs.vcf",header=T,sep="")
DF2<-read.delim("....\private.SNPs.vcf",header=T,sep="")
singletons.DF1<-DF1[!(DF1$chromosomic.position %in% DF2$chromosomic.positon),]
ADD COMMENT
0
Entering edit mode

The VariantAnnotation package contains a VRanges class that extends GRanges and would be convenient in this instance.

ADD REPLY
0
Entering edit mode
9.6 years ago

I wrote a tool to include/exclude the variants in a VCF file: https://github.com/lindenb/jvarkit/wiki/VcfIn

ADD COMMENT

Login before adding your answer.

Traffic: 2859 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6