How To Remove Common Variants Present In Two Vcf Files?
5
1
10.6 years ago
Jordan ★ 1.3k

Hi,

I would like to remove the variants that are common to two VCF files. For example, given FileA.vcf and FileB.vcf, I would like to remove the variants the two files share and keep only those unique to FileA.vcf.

I have written some code for this, but I would like to confirm that it is correct. Is there an existing tool that already does this?

Thanks!

vcf variant • 7.7k views
7
10.6 years ago
rob234king ▴ 610

You can do this with VCFtools in two commands (isec). I think I have a similar example on my website; search Google for "cubelp2 bioinformatics". Also check the manual on the VCFtools website.

1

Yes, the tool "out there" is VCFtools.

1
10.3 years ago
ben.bob ▴ 30

VCFtools is perfect for this task, but if you want to analyze more than two samples and do more complex analysis (e.g. case-only homozygotes with acceptable quality), you can import your data into variant tools (varianttools.sourceforge.net) and analyze it there (vtools import, vtools update --from_stat, vtools compare, and vtools export).

0
10.6 years ago
alexej.knaus ▴ 130

You could also try GeneTalk: sign up for an account and upload your data. You can then create a collection from your data and filter it with several tools.

Visit http://www.gene-talk.de

0

You should at least add that (as far as I understand from your website) this only works for human data.

0

Ah yeah, that's correct! Only VCF files referenced to hg19 will work with GeneTalk...

0
10.6 years ago

I would also use vcftools for a one-on-one comparison, but just to point out an idea, let me describe a script we've developed internally. You could index the columns common to all variants (chr, pos, rs, ref, var) by sample, loop through all the VCF files you have, and finally report every variant found (the common columns) plus a sample list and/or a sample count. In a single step you record all variants and their occurrences, which lets you filter for variants found in only one sample or in all of them, regardless of how many samples you have to deal with (2, 20, ...).
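The internal script itself is not shown here, so the following is only a minimal Python sketch of the indexing idea, under a few assumptions of mine: each file stands in for one sample, variants are keyed on (chr, pos, ref, var) without the rs ID (IDs are often missing or differ between files), and multi-allelic records are split into one key per alternate allele. The function names `read_variants`, `index_variants` and `unique_to` are hypothetical. No normalization is done, so the same indel written in two different ways will not match.

```python
import gzip
from collections import defaultdict


def read_variants(path):
    """Yield (chrom, pos, ref, alt) keys from a plain or gzipped VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip meta lines, the column header and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, pos, _rs_id, ref, alts = fields[:5]
            # Assumption: treat each ALT allele as a separate variant.
            for alt in alts.split(","):
                yield (chrom, int(pos), ref, alt)


def index_variants(paths):
    """Map every variant key to the set of files it was seen in."""
    index = defaultdict(set)
    for path in paths:
        for key in read_variants(path):
            index[key].add(path)
    return index


def unique_to(index, path):
    """Return the sorted variant keys seen in `path` and in no other file."""
    return sorted(key for key, files in index.items() if files == {path})
```

For the original two-file question, `unique_to(index_variants(["FileA.vcf", "FileB.vcf"]), "FileA.vcf")` would list the variants found only in FileA.vcf, and `len(index[key])` gives the sample count for any variant, so the same index also answers "found in all files" when there are more than two.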

0
10.3 years ago

BEDTools can also give you the intersection, or the non-intersection, of two VCF files.
