How to get only polymorphic genotypes from a combined VCF file, filtering out homozygous, heterozygous and missing genotypes
analyst, 4 months ago

Hi everyone,

I have a combined VCF file containing multiple samples. I need to remove homozygous, heterozygous and missing genotypes and keep only polymorphic genotypes.

Could you suggest which tool I should use to get the desired output?

Thanks a lot!

Tags: polymorphic, combinedVCF, genotypes

You can parse the genotype (GT) field in the VCF file; see the VCF 4.1 specification here.

It would be as easy as keeping only the lines where more than 3 different numbers appear in the GT field. You can easily do this in R with the vcfR package. There is also this post from a few years ago that offers other options, including one using bcftools.
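
As a rough illustration of parsing the GT field directly, here is a minimal shell/awk sketch. It assumes an uncompressed VCF on standard input and interprets "polymorphic" as a site where at least two different called genotypes occur across samples (the threshold is an argument, defaulting to 2). Missing calls (anything containing ".") are ignored, phased calls are treated like unphased ones, and 1/0 is treated as 0/1. The function name keep_polymorphic is hypothetical, not part of any existing tool.

```shell
# keep_polymorphic [MIN_DISTINCT]: hypothetical helper (sketch only).
# Prints header lines unchanged, then keeps sites whose samples carry at
# least MIN_DISTINCT (default 2) different called genotypes.
keep_polymorphic() {
    awk -v min="${1:-2}" '
        BEGIN { FS = "\t" }
        /^#/ { print; next }    # pass meta-information and header lines through
        {
            n = 0
            split("", seen)     # clear the genotype set for this site
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")           # GT is the first FORMAT subfield
                gt = f[1]
                gsub(/[|]/, "/", gt)        # treat phased calls like unphased ones
                if (gt ~ /[.]/) continue    # skip missing or partly missing calls
                # order diploid alleles so that 1/0 and 0/1 count as one genotype
                if (split(gt, a, "/") == 2 && a[1] + 0 > a[2] + 0)
                    gt = a[2] "/" a[1]
                if (!(gt in seen)) { seen[gt] = 1; n++ }
            }
            if (n >= min + 0) print
        }'
}
```

After sourcing the function, usage might look like `bcftools view input.vcf.gz | keep_polymorphic > polymorphic.vcf`, since bcftools view writes uncompressed VCF by default.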

Thanks, dthorbor!

I think this command from the post only removes homozygous alleles, not heterozygous ones. What should I add to the command below to remove heterozygous and missing alleles as well?

bcftools view -e 'COUNT(GT="AA")=N_SAMPLES || COUNT(GT="RR")=N_SAMPLES' input.vcf

Thanks a lot!
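
To also drop sites where every sample is heterozygous or every sample is missing, the bcftools manual lists het and mis among the GT keywords usable in filtering expressions, so appending || COUNT(GT="het")=N_SAMPLES || COUNT(GT="mis")=N_SAMPLES to the -e expression above is likely what you want (worth checking against your bcftools version). The awk sketch below roughly mirrors that logic without bcftools, mainly to make the classification explicit: each call is put into one of four classes (hom-ref, hom-alt, het, missing), and a site is dropped when all samples fall into the same class. drop_uniform_sites is a hypothetical name, and multiallelic calls such as 1/2 are counted as het here, which may not match bcftools' exact definitions.

```shell
# drop_uniform_sites: hypothetical helper (sketch only).
# Drops a site when every sample is hom-ref (RR), every sample is hom-alt
# (AA), every sample is het, or every sample is missing.
drop_uniform_sites() {
    awk '
        BEGIN { FS = "\t" }
        /^#/ { print; next }            # keep header lines
        {
            nsamp = NF - 9
            split("", count)            # per-site tally of genotype classes
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")       # GT is the first FORMAT subfield
                gt = f[1]
                gsub(/[|]/, "/", gt)
                if (gt ~ /[.]/) {
                    cls = "mis"         # missing (or partly missing) call
                } else {
                    na = split(gt, a, "/")
                    cls = (a[1] == "0") ? "RR" : "AA"
                    for (j = 2; j <= na; j++)
                        if (a[j] != a[1]) cls = "het"   # alleles differ
                }
                count[cls]++
            }
            for (c in count)
                if (count[c] == nsamp) next   # all samples in one class: drop site
            print
        }'
}
```

Note that a site with some hom-ref calls and some missing calls is kept by this sketch; whether that counts as "polymorphic" for your purposes is a judgment call.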

I'm not sure any tool does exactly what you want, as your use case sounds fairly niche. As stated, it would be easy to parse a VCF yourself if you have a basic grasp of the genotype field and can use R (or the PyVCF package if you use Python).

The previous post also has a command that keeps sites with at least one non-reference allele. Can I make the criterion more stringent, keeping only sites with at least 40% polymorphic alleles, by replacing 1 with 40?

bcftools view -c 1 input.vcf.gz -o output.vcf.gz -Oz
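
On the -c question: in bcftools view, -c/--min-ac sets a minimum count of non-reference alleles, so -c 40 would require at least 40 non-reference alleles rather than 40%. A frequency threshold corresponds to -q/--min-af (for example -q 0.4), which the manual describes in terms of INFO/AC / INFO/AN. Assuming "40% polymorphic alleles" means a non-reference allele frequency of at least 0.4, the awk sketch below shows what that fraction means at the genotype level: per site, it divides non-reference alleles by all called alleles (missing alleles excluded, as in AN) and keeps sites at or above a threshold. min_nonref_fraction is a hypothetical name; it recounts from GT and ignores any existing INFO/AC or INFO/AN values.

```shell
# min_nonref_fraction [MIN]: hypothetical helper (sketch only).
# Keeps sites where non-reference alleles / called alleles >= MIN (default 0.4).
min_nonref_fraction() {
    awk -v min="${1:-0.4}" '
        BEGIN { FS = "\t" }
        /^#/ { print; next }            # keep header lines
        {
            called = 0; nonref = 0
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")       # GT is the first FORMAT subfield
                gt = f[1]
                gsub(/[|]/, "/", gt)
                na = split(gt, a, "/")
                for (j = 1; j <= na; j++) {
                    if (a[j] == ".") continue   # missing allele: not counted
                    called++                    # analogous to AN
                    if (a[j] != "0") nonref++   # analogous to non-reference AC
                }
            }
            if (called > 0 && nonref / called >= min + 0) print
        }'
}
```

For example, a site with genotypes 0/1, 1/1 and 0/0 has 3 non-reference alleles out of 6 called (0.5) and passes the default threshold, while 0/1, ./. and 0/0 gives 1 out of 4 (0.25) and does not.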
