How do I filter a multi-individual BCF file by genotype probability (GP)?
4 weeks ago
devenvyas ▴ 680

I have BCFs with over a hundred individuals. I want to filter the files so that any genotype call with max(GP) < 0.9 is removed. I don't want the whole site removed; I just want that individual's genotype data removed for that site. I can't figure out how to do this without doing it on each BCF individually.

Any suggestions?


I assume that by removing genotype data you mean setting it to missing. bcftools filter can do that. Try this command (ideally with the latest version from GitHub):

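bcftools filter -i 'FMT/GP>=0.9' --set-GTs . -Ob -o <output BCF file> <input BCF file>

This keeps every genotype with a GP value of at least 0.9 and converts the others to missing, without dropping the site. The rule is applied to each individual separately, so one command handles all samples in the file. The >= comparison means only calls with max(GP) < 0.9 are masked, and -Ob writes the output as BCF instead of the default uncompressed VCF. You could also filter on genotype quality (GQ), which is Phred-scaled.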
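To make the intended input and output concrete, here is a toy sketch of the per-genotype masking in plain awk. It is not how bcftools works internally and is not a replacement for it. It assumes a text VCF whose FORMAT column is exactly GT:GP, and the function name mask_low_gp, the sample names S1 and S2, and the record itself are all made up for illustration.

```shell
# Toy stand-in for the masking rule: blank out GT where max(GP) < threshold.
# Assumes FORMAT is exactly GT:GP and GP holds comma-separated probabilities.
mask_low_gp() {
  awk -v thr="$1" 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }              # pass header lines through unchanged
    {
      for (i = 10; i <= NF; i++) {    # sample columns start at field 10
        split($i, f, ":")             # f[1] = GT, f[2] = GP
        n = split(f[2], gp, ",")
        max = 0
        for (j = 1; j <= n; j++) if (gp[j] + 0 > max) max = gp[j] + 0
        if (max < thr) $i = "./." ":" f[2]   # keep GP, mask only the call
      }
      print
    }'
}

# Two hypothetical samples at one site: S1 is confident, S2 is not.
toy_vcf=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  '#CHROM' POS ID REF ALT QUAL FILTER INFO FORMAT S1 S2 \
  chr1 100 . A G . PASS . GT:GP 0/1:0.02,0.97,0.01 0/0:0.60,0.35,0.05)

masked=$(printf '%s\n' "$toy_vcf" | mask_low_gp 0.9)
printf '%s\n' "$masked"
```

The site stays in the output. S1 keeps its 0/1 call because its largest GP is 0.97, while S2 becomes ./. because its largest GP is only 0.60.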


"I don't want the whole site removed, I just want that individual genotype data removed for that site"

This is not clear to me. Could you give us a short example of the input and the expected output?

4 weeks ago
4galaxy77 ▴ 680

bcftools +setGT test.vcf -- -t q -n . -e'FORMAT/GP>=0.90'

This should do what you need.
