Question: Extracting SNPs from multiple VCF files that are present in a proportion of samples
14 months ago, ankit hinsu wrote:

Hi,

I have VCF files for 25 samples (all of them produced with freebayes against the same reference). I want to extract the SNPs that are present in at least 80% of the samples (i.e. present in any 20 samples). Please help me with this.

I have tried "bcftools isec". It outputs SNPs that are present in at least 20 samples, which is what I want. However, whichever sample comes first in the file list is used as the reference. Because of this, only SNPs that are present in my first sample together with any other 19 samples are output, which is what I don't want. It should output SNPs present in any 20 samples.

Hope I have explained my problem clearly.

Ankit.

snp variant vcf • 546 views
14 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

Using vcffilterjdk: http://lindenb.github.io/jvarkit/VcfFilterJdk.html

java -jar jvarkit-git/dist/vcffilterjdk.jar -e 'return variant.getGenotypes().stream().filter(G->!(G.isNoCall() || G.isHomRef())).count()>=20;' input.vcf
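
For readers who want to see what the expression checks, here is a rough Python sketch of the same idea applied to one data line of a multi-sample VCF. The function names `count_carriers` and `passes` are made up for illustration and are not part of jvarkit. A sample is counted when at least one called allele is non-reference, which is one reading of the intent; htsjdk may treat edge cases such as partially missing genotypes differently.

```python
import re


def count_carriers(vcf_line):
    """Count samples whose GT has at least one called non-reference allele.

    Assumes a tab-separated VCF data line with a FORMAT column (index 8)
    that contains a GT key, followed by one column per sample.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # raises ValueError if GT is absent
    carriers = 0
    for sample in fields[9:]:
        parts = sample.split(":")
        gt = parts[gt_index] if gt_index < len(parts) else "."
        alleles = re.split(r"[/|]", gt)  # handles unphased (/) and phased (|)
        # "." is a no-call allele, "0" is the reference allele
        if any(a not in (".", "0") for a in alleles):
            carriers += 1
    return carriers


def passes(vcf_line, min_carriers=20):
    """Keep the site when enough samples carry a non-reference allele."""
    return count_carriers(vcf_line) >= min_carriers
```

With 25 samples in the file, `passes(line)` with the default `min_carriers=20` corresponds to the 80% threshold from the question.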

Thanks for the reply.

I am guessing I need to merge all the VCF files first and then use this.

Reply written 14 months ago by ankit hinsu
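
If merging is inconvenient, the counting can also be sketched directly over the separate per-sample files. The following is a minimal Python sketch, not a tested pipeline: `open_vcf`, `variant_keys` and `shared_snps` are hypothetical helper names. It assumes that every record in a per-sample file means that sample carries the variant, and that all files represent variants the same way (same normalization). No file is treated as the reference, so the order of the file list does not matter.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def variant_keys(path):
    """Return the set of (CHROM, POS, REF, ALT) SNP keys found in one VCF."""
    keys = set()
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#"):  # skip meta and header lines
                continue
            chrom, pos, _id, ref, alt = line.split("\t", 5)[:5]
            for allele in alt.split(","):  # multi-allelic sites list ALTs with commas
                # Keep single-base substitutions only (assumption: SNP = 1 bp REF and ALT)
                if len(ref) == 1 and len(allele) == 1 and allele not in (".", "*"):
                    keys.add((chrom, int(pos), ref, allele))
    return keys


def shared_snps(paths, min_files):
    """Return the sorted SNP keys that occur in at least `min_files` of the files."""
    counts = Counter()
    for path in paths:
        # Each file contributes a set, so a SNP is counted once per file
        counts.update(variant_keys(path))
    return sorted(key for key, n in counts.items() if n >= min_files)
```

With 25 files, `shared_snps(paths, min_files=20)` would return the SNPs seen in any 20 or more of them.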