Question: How To Remove Common Variants Present In Two VCF Files?
Jordan (1.1k), Pittsburgh, wrote 5.7 years ago:

Hi,

I would like to remove the variants that are common to two VCF files. For example, given FileA.vcf and FileB.vcf, I would like to remove the variants that appear in both files and keep only the ones that are unique to FileA.vcf.

I have written some code for this, but I would like to confirm that it is correct. Is there an existing tool that already does this?

Thanks!

Tags: vcf, variant
rob234king (580), UK/Harpenden/Rothamsted Research, wrote 5.7 years ago:

You can do this with VCFtools (isec) in two commands. I think I have an example of something similar on my website; search Google for: cubelp2 bioinformatics. Also check the manual on the VCFtools website.


Yes, the tool "out there" is VCFtools.

Reply written 5.7 years ago by Neilfws (48k)
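
As a cross-check for a hand-written script like the one mentioned in the question, the "unique to FileA" operation can be sketched in plain Python. This is not how VCFtools works internally; it is a minimal sketch that assumes uncompressed, tab-delimited VCF text and treats two records as the same variant when CHROM, POS, REF and ALT match exactly. The function names `variant_key` and `unique_to_first` are hypothetical.

```python
def variant_key(line):
    """Return (CHROM, POS, REF, ALT) for one tab-delimited VCF data line."""
    fields = line.rstrip("\n").split("\t")
    return (fields[0], fields[1], fields[3], fields[4])


def unique_to_first(lines_a, lines_b):
    """Return header lines of A plus the records of A that are absent from B.

    Assumption: variants are matched on exact CHROM/POS/REF/ALT strings, so
    multi-allelic records and differently normalized indels are NOT reconciled.
    """
    # Collect the keys of every data record in B.
    keys_b = {
        variant_key(line)
        for line in lines_b
        if line.strip() and not line.startswith("#")
    }
    kept = []
    for line in lines_a:
        if not line.strip():
            continue  # skip blank lines
        # Keep A's header lines, and data lines whose key never occurs in B.
        if line.startswith("#") or variant_key(line) not in keys_b:
            kept.append(line)
    return kept
```

Passing two open file handles, e.g. `unique_to_first(open("FileA.vcf"), open("FileB.vcf"))`, returns the lines to write to the output file. Comparing that output with the result from a dedicated tool is one way to confirm that both agree on what counts as "common".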
ben.bob (30) wrote 5.4 years ago:

vcftools is perfect for this task, but if you are interested in analyzing more than two samples and doing more complex analyses (e.g. case-only homozygotes with acceptable quality), you can import your data into variant tools (varianttools.sourceforge.net) and analyze it there (vtools import, vtools update --from_stat, vtools compare, and vtools export).

alexej.knaus (120), Berlin, wrote 5.7 years ago:

You could also try GeneTalk: sign up for an account and upload your data. You can then create a collection from your data and filter it with several tools.

Visit www.gene-talk.de


You should at least add that (as far as I understand from your website) this only works for human data.

Reply written 5.7 years ago by skymningen (330)

Ah yeah, that's correct! Only VCF files referenced to hg19 will work with GeneTalk...

Reply written 5.7 years ago by alexej.knaus (120)
Jorge Amigo (11k), Santiago de Compostela, Spain, wrote 5.7 years ago:

I would also use vcftools for a one-to-one comparison, but just to offer an idea, let me describe a script we've developed internally. You could index the variant columns shared by all files (chr, pos, rs, ref, var) by sample, loop through all the VCF files you have, and finally report every variant found (the common columns) plus a sample list and/or a sample count. In a single pass you record all variants and their occurrences, which lets you filter for variants found in only one sample, or in all of them, regardless of how many samples you have (2, 20, ...).
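
The indexing idea described above can be sketched roughly as follows. This is not the internal script the answer refers to, only a minimal illustration that assumes uncompressed, tab-delimited VCF text with the first five columns in the standard order (CHROM, POS, ID, REF, ALT). The names `index_variants` and `variants_seen_in` are hypothetical.

```python
from collections import defaultdict


def index_variants(vcf_lines_by_sample):
    """Map (chr, pos, rs, ref, var) -> set of sample names carrying it.

    `vcf_lines_by_sample` maps a sample name to an iterable of VCF lines
    (for example an open file handle).
    Assumption: the rs ID is part of the key, as in the description, so the
    same variant annotated "." in one file and with an rs ID in another is
    counted as two different entries.
    """
    index = defaultdict(set)
    for sample, lines in vcf_lines_by_sample.items():
        for line in lines:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            chrom, pos, rsid, ref, alt = line.rstrip("\n").split("\t")[:5]
            index[(chrom, pos, rsid, ref, alt)].add(sample)
    return index


def variants_seen_in(index, n_samples):
    """Return the keys of variants found in exactly `n_samples` samples."""
    return sorted(
        key for key, samples in index.items() if len(samples) == n_samples
    )
```

With two files, `variants_seen_in(index, 1)` gives the variants private to either file, and the stored sample set tells you which file each one came from; `variants_seen_in(index, 2)` gives the shared ones. The same calls work unchanged for 20 files.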

swbarnes2 (5.5k), United States, wrote 5.4 years ago:

BEDTools can also give you the intersection, or the non-intersection, of two VCF files.
