Question: Taking the difference of two VCFs (or removing singletons)
5.5 years ago by
hermathena40
United Kingdom
hermathena40 wrote:

Dear All,

Is there a way to take a difference of two VCF files? GATK can be used to take a Union or an Intersection, but I need the difference. There are two applications:

1. remove singletons. I have a VCF of all the SNPs and a VCF of the private ones. I need a VCF with the non-private SNPs.

2. get the non-CDS sequence. I can make a VCF of the exome SNPs by filtering against a gff file. I would also like to get the SNPs from non-CDS regions - which could be the difference of all SNPs and exome SNPs.

Any ideas, please?

 

Many thanks,

Krzysztof Kozak

Zoology

University of Cambridge

modified 5.5 years ago • written 5.5 years ago by hermathena40

Hello,

Thank you all for the suggestions, this looks promising!

Best,

Chris

written 5.5 years ago by hermathena40
5.5 years ago by
Devon Ryan94k
Freiburg, Germany
Devon Ryan94k wrote:

You can use either vcftools or bcftools. You'll just use the isec command with the -C (complement) option. Note that this is position based rather than exact variant based.
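
To make the position-based versus exact-variant distinction concrete, here is a minimal Python sketch of a VCF difference done both ways. It is only an illustration and not how vcftools or bcftools implement isec; the function names (read_variants, vcf_difference) and the toy records are hypothetical.

```python
def read_variants(lines):
    """Yield (chrom, pos, ref, alt, raw_line) for each VCF data line, skipping '#' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield fields[0], int(fields[1]), fields[3], fields[4], line


def vcf_difference(all_lines, remove_lines, exact=False):
    """Return the data lines of all_lines that have no match in remove_lines.

    exact=False matches on (CHROM, POS) only; exact=True also requires
    identical REF and ALT strings.
    """
    def key(variant):
        return variant[:4] if exact else variant[:2]

    to_remove = {key(v) for v in read_variants(remove_lines)}
    return [v[4] for v in read_variants(all_lines) if key(v) not in to_remove]


# Hypothetical toy records (tab-separated, first five VCF columns only).
all_snps = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT",
    "chr2\t100\t.\tG\tA",
]
private_snps = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t200\t.\tC\tG",  # same position as in all_snps, different ALT
    "chr2\t100\t.\tG\tA",
]

by_position = vcf_difference(all_snps, private_snps)
by_variant = vcf_difference(all_snps, private_snps, exact=True)
print(len(by_position), len(by_variant))
```

With these toy records the position-based difference drops chr1:200 even though the ALT alleles differ, while the exact-variant difference keeps it.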

written 5.5 years ago by Devon Ryan94k
5.5 years ago by
Kizuna790
France, Paris
Kizuna790 wrote:

Regarding point 1.

I think you can do it with R.

Try reading your two VCF files into data frames (DF1 and DF2), then drop from the one holding all variants (DF1) every row whose chromosomal position also appears in the one holding the private variants (DF2).

This is an example:

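# Read the VCF records; lines starting with "#" are skipped, so the columns are V1 = CHROM, V2 = POS
DF1 <- read.delim("..../allSNPs.vcf", header = FALSE, comment.char = "#", stringsAsFactors = FALSE)
DF2 <- read.delim("..../private.SNPs.vcf", header = FALSE, comment.char = "#", stringsAsFactors = FALSE)
# Keep the rows of DF1 whose chromosome and position are absent from DF2 (the non-private SNPs)
nonprivate.DF1 <- DF1[!(paste(DF1$V1, DF1$V2) %in% paste(DF2$V1, DF2$V2)), ]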
modified 2.4 years ago by RamRS26k • written 5.5 years ago by Kizuna790

The VariantAnnotation package contains a VRanges class that extends GRanges and would be convenient in this instance.

modified 5.5 years ago • written 5.5 years ago by Devon Ryan94k
5.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum127k wrote:

I wrote a tool to include/exclude the variants in a VCF file: https://github.com/lindenb/jvarkit/wiki/VcfIn

written 5.5 years ago by Pierre Lindenbaum127k