5.9 years ago
VBer
Hello.
I made a subset of 50 samples from a 192-sample VCF file. Some of the SNPs in the new subset VCF are not variant in any of the 50 samples (i.e. every sample is 0/0). I would like to remove them, preferably using an existing tool.
I tried bcftools -e 'GT[0-49]="RR"' but that removes SNPs when even one sample is 0/0.
Thanks.
Take a look at my answer, here: A: How to get sample names and genotype for SNP in multi-sample VCF file
This will help you to identify sites that are completely homozygous reference (the nHomRef column). You could then take those IDs and use them to filter the original data. Otherwise, there is likely some query that you can do with BCFtools or SnpSift.
Have a look at this post.
Hey Aisha, thank you. Yes, I did see that post earlier. I tried the bcftools suggestion and it didn't work. I had trouble installing Pierre's vcffilterjdk. I have yet to try SnpSift.
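For anyone who cannot get the tools above working, the filtering idea discussed in this thread (drop a site when every sample is homozygous reference) can be sketched in plain Python. This is only an illustration, not what any of the tools mentioned do internally. It assumes an uncompressed, tab-separated VCF with GT as the first FORMAT key, and it treats missing calls (./.) as non-variant, which is a choice you may want to change. The names has_alt_allele, drop_all_hom_ref and EXAMPLE_VCF are made up for this example.

```python
# Hypothetical helper names; a minimal sketch, not a replacement for bcftools/SnpSift.

NON_ALT = {"0", "."}  # reference allele or missing call


def has_alt_allele(sample_field):
    """Return True if the sample's GT has at least one non-reference allele.

    Assumes GT is the first colon-separated key of the sample field.
    """
    gt = sample_field.split(":", 1)[0]
    alleles = gt.replace("|", "/").split("/")
    return any(allele not in NON_ALT for allele in alleles)


def drop_all_hom_ref(lines):
    """Yield header lines, plus records where at least one sample carries an ALT allele."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        samples = fields[9:]  # sample columns start after the FORMAT column
        if any(has_alt_allele(s) for s in samples):
            yield line


# Tiny made-up VCF: the first record is 0/0 (or missing) in every sample.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "1\t100\trsA\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/0\t./.\n"
    "1\t200\trsB\tC\tT\t.\tPASS\t.\tGT:DP\t0/0:10\t0/1:12\t0/0:9\n"
)

kept = list(drop_all_hom_ref(EXAMPLE_VCF.splitlines(keepends=True)))
kept_ids = [line.split("\t")[2] for line in kept if not line.startswith("#")]
print(kept_ids)
```

To run it on a real file, pass an open file handle to drop_all_hom_ref and write the yielded lines to a new file. Among existing tools, the --min-ac option of bcftools view looks like a candidate for the same job, but check it against the bcftools documentation for your version.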