2.9 years ago
puddingmeow516
Hi,
I have a VCF called by GATK with 150 samples and around 20k SNPs in total.
I want to filter the VCF to keep only 20 of the 150 samples, and only the SNPs that belong to those 20 samples. In other words, I want to extract the SNPs of 20 samples from a VCF called on 150 samples. I tried VCFtools --keep as well as bcftools view -S. Both exclude the other 130 samples from the VCF, but the number of SNPs remains exactly the same...
Is there any solution to exclude samples as well as the SNPs called from them?
Many thanks!
If there are multiple sites with missing/unknown GT (./.) for the samples that remain, you may want to add an expression to the `bcftools view -S` command, or pipe the output of `view -S` to a `view -e 'COUNT(GT="mis")=20'` to exclude sites where all 20 GTs are missing.
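The two-step pipe described above could look roughly like the sketch below. The function name, the file names, and the output format (`-Oz`, bgzipped VCF) are my own placeholders and not from the original post; the sample list is assumed to be a text file with one sample ID per line.

```shell
# Sketch only: subset to the listed samples, then drop sites where
# every remaining genotype is missing (./.).
#   $1 = input VCF (all 150 samples)   $2 = file of sample IDs to keep
#   $3 = number of kept samples        $4 = output file (bgzipped VCF)
subset_drop_all_missing() {
    # Step 1: keep only the listed samples.
    # Step 2: read that stream from stdin ("-") and exclude sites where
    #         the count of missing genotypes equals the number of samples.
    bcftools view -S "$2" "$1" |
        bcftools view -e "COUNT(GT=\"mis\")=$3" -Oz -o "$4" -
}

# Example call (hypothetical paths):
# subset_drop_all_missing all150.vcf.gz keep20.txt 20 subset20.vcf.gz
```

This only removes sites that are entirely missing in the 20 kept samples. If you also want to drop sites that are no longer variant in those samples, `bcftools view` has a `-c/--min-ac` option (for example `-c 1`) that might be worth trying on the subsetting step, though I have not checked it against your data.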