Question: Obtain one vcf file of shared SNPs from input files with different samples using vcf-isec (vcftools)
3.8 years ago, weedy23 wrote:

I am new to Linux and programming and am trying to use vcftools. I have 3 vcf files; each one is a different population (i.e. with no shared individuals between the files). I am trying to use vcf-isec to merge the 3 files and end up with one vcf file that contains only the SNPs that are present in all 3 files. I have tried the following code:

vcf-isec -n =3 file1.vcf.gz file2.vcf.gz file3.vcf.gz -f -c > CombinedPops.vcf

and without -c :

vcf-isec -n =3 file1.vcf.gz file2.vcf.gz file3.vcf.gz -f > CombinedPops.vcf

but I keep ending up with one file with only the individuals from the first input file. It also gives me a warning that "the number of sample columns is different", but I read in another post that -f forces vcf-isec to output the file regardless. Could this warning be why I can't get a file with ALL the individuals listed? Can vcf-isec even do this?

Although I have read the vcf-isec documentation, I am still not sure exactly what the difference between the -c and -o options is, which may be part of my problem.

Any help is greatly appreciated!


3.8 years ago, venu wrote:

From the vcftools documentation:

vcf-isec -n +3 A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz

This gives a VCF file containing the variants present in all of the input VCF files (shared by all 3 files). The -f flag should be included to force the program to continue past the errors about differing column names. On the other hand, if you want to merge the 3 VCF files into a single VCF file, use vcf-merge.

I don't think this vcftools program can separate SNPs from indels, but you can use vcf-annotate:

zcat file.vcf.gz | vcf-annotate --fill-type | bgzip -c > out.vcf.gz

This adds a variant TYPE annotation to the INFO column of your VCF file. You can then create a new VCF file containing only the SNPs. Finally, I can't find any -o flag for these programs.


Hi venu, thanks for your help. The vcf-isec code you wrote is basically what I did, except that I specified exactly 3 files rather than 3 or more, and wrote uncompressed output instead. However, this gave me a file containing only the individuals from the first file (although it did give me only the loci found in all three files). I have looked at vcf-merge, but I think this produces a file with ALL loci? I only want the ones common to all the specified files. I don't have indels, my data is simple SNP data, but I will look further at vcf-annotate. The -o flag is mentioned here:, under "Read More".

written 3.8 years ago by weedy23

My bad, I have edited my answer. So you need the SNPs shared by all 3 files (same chr#, position, base change, etc.), but not in the way vcf-isec does it?

written 3.8 years ago by venu

Yep exactly.

written 3.8 years ago by weedy23
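
The notion of a "shared SNP" agreed on above (same chromosome, position, and base change) can be sketched as a small shell helper. This is only an illustration, not part of vcftools: the function name shared_sites is hypothetical, the inputs are assumed to be uncompressed VCF files (decompress .vcf.gz files first, e.g. with zcat), and a site is assumed to be identified by its CHROM, POS, REF and ALT columns. It lists the shared sites only; it does not combine the samples.

```shell
# shared_sites FILE1.vcf FILE2.vcf ...
# Hypothetical helper (not a vcftools program): prints CHROM, POS, REF and ALT,
# tab-separated, for every site that occurs in all of the given VCF files.
shared_sites() {
    awk -F'\t' -v nfiles="$#" '
        /^#/ { next }                            # skip header lines
        {
            key = $1 "\t" $2 "\t" $4 "\t" $5     # CHROM, POS, REF, ALT
            if (!((FILENAME, key) in seen)) {    # count each site once per file
                seen[FILENAME, key] = 1
                count[key]++
            }
        }
        END {
            for (key in count)
                if (count[key] == nfiles) print key   # present in every file
        }
    ' "$@" | sort -k1,1 -k2,2n                   # order by chromosome, then position
}
```

For example, shared_sites file1.vcf file2.vcf file3.vcf > shared_sites.txt would write one line per site common to all three files, which could then serve as a list of loci to keep.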

Hi weedy23, did you finally manage to solve the issue? I am having the same problem as you...

written 3.6 years ago by eze.anokian10

Hi, sorry I only just saw your comment. I ended up using vcf-merge instead. However, it included ALL loci in the output file, not just loci present in all the files. So I had to go through the output file and delete all the loci that were missing for one or more of the populations. Not ideal but it didn't take too long in the end if you sort the file. Good luck!

modified 3.5 years ago • written 3.5 years ago by weedy23
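
The manual clean-up described above could be approximated with a short awk filter. This is a rough sketch under stated assumptions rather than the method actually used: the function name keep_complete_loci is hypothetical, the input is assumed to be an uncompressed vcf-merge output read from standard input, GT is assumed to be the first FORMAT subfield, and a sample absent from a locus is assumed to carry a missing genotype written as ".", "./." or ".|.". It drops any record with at least one missing genotype, which is stricter than dropping only loci that are missing for a whole population.

```shell
# keep_complete_loci < merged.vcf > shared.vcf
# Hypothetical helper (not a vcftools program): keeps header lines and only
# those records in which no sample has a missing genotype.
keep_complete_loci() {
    awk -F'\t' '
        /^#/ { print; next }                 # pass header lines through unchanged
        {
            for (i = 10; i <= NF; i++) {     # sample columns start at column 10
                split($i, sub_fields, ":")   # GT is assumed to be the first subfield
                gt = sub_fields[1]
                if (gt == "." || gt == "./." || gt == ".|.")
                    next                     # missing genotype: drop this record
            }
            print                            # every sample has a genotype call
        }
    '
}
```

If individual samples have missing calls at loci that are otherwise present in their population, this filter will remove those loci as well, so the result should be checked against the expected number of shared sites.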

How can I extract the variants specific to A? Is the following command correct?

vcf-isec -c A.vcf.gz B.vcf.gz C.vcf.gz > specific_for_A.vcf

written 2.3 years ago by reza210