VCF filtering
6 weeks ago
drowl1 ▴ 30

Hi everyone,

I have a multisample VCF file (30 samples) with haploid calls, and I want to filter out all sites where this condition holds true:

'sum of ref (GT=0) and missing/uncalled (GT=".") genotypes in all samples is 30'

I have tried this in bcftools, but it doesn't seem to work:

bcftools view -e '(sum(GT[*]=".") + sum(GT[*]="0")) == 30' samples.vcf > filtered_samples.vcf

Please advise on how to correctly do it with bcftools or with any other approach.

Thanks!

vcf SNPfiltering genotype • 346 views

I think slivar is pretty neat.


How about:

bcftools view --min-ac 1  in.vcf 
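
For context, as I understand it `--min-ac 1` keeps a site only if at least one non-reference allele is called, which amounts to dropping sites where every sample is REF or missing. A minimal Python sketch of that counting logic, assuming haploid GT strings such as "0", "1" or "."; the function name `nonref_allele_count` is made up for illustration and is not part of bcftools:

```python
def nonref_allele_count(genotypes):
    """Count called non-reference alleles among haploid GT strings."""
    count = 0
    for gt in genotypes:
        # "." is a missing call and "0" is REF; anything else is an ALT allele.
        if gt not in (".", "0"):
            count += 1
    return count


# A site that is REF or missing in every sample has a count of 0,
# so a minimum allele count of 1 would drop it.
all_ref_or_missing = ["0", ".", "0", "0"]
mixed = ["0", "1", ".", "0"]
print(nonref_allele_count(all_ref_or_missing))
print(nonref_allele_count(mixed))
```

Note that this only covers the REF + missing case; sites that are ALT + missing in every sample would still pass.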

Hi Pierre,

Thanks for your suggestion. I initially excluded all the homozygous REF sites (where GT = 0 across the 30 samples) as well as the homozygous ALT sites (where GT = 1 across all 30 samples) in two steps using bcftools.

I basically want to be left with heterozygous sites only, so I also want to follow that up by filtering out sites where the genotypes are REF or uncalled/missing across all samples (i.e. GT = "0" + GT = "." == 30) and sites where the genotypes are ALT or uncalled/missing across all samples (i.e. GT = "1" + GT = "." == 30).

It looks like the regex and arithmetic functions in bcftools do not work across samples so I'm stuck. Would you know how to work around this?


Why are you talking about homozygous and heterozygous sites if those are "haploid calls"?


Apologies for that typo. I'd like to get rid of sites that are ALT or uncalled/missing across all samples, as well as those that are REF or uncalled/missing across all samples, so that every remaining site has both REF and ALT genotypes among the samples (possibly alongside missing "." calls).
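
One way to sketch the filter described above without bcftools is a small Python pass over the VCF text. This is only an illustration under some assumptions: haploid GT values, tab-separated uncompressed VCF lines, and a FORMAT column that contains GT. The helper names `site_has_ref_and_alt` and `filter_vcf_lines` are hypothetical.

```python
def site_has_ref_and_alt(vcf_line):
    """Return True if the called haploid genotypes include both REF and ALT."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; find where GT sits within each sample field.
    gt_index = fields[8].split(":").index("GT")
    calls = set()
    for sample in fields[9:]:
        parts = sample.split(":")
        gt = parts[gt_index] if gt_index < len(parts) else "."
        if gt != ".":  # skip missing/uncalled genotypes
            calls.add(gt)
    has_ref = "0" in calls
    has_alt = any(gt != "0" for gt in calls)
    return has_ref and has_alt


def filter_vcf_lines(lines):
    """Yield header lines unchanged and only the sites with both REF and ALT calls."""
    for line in lines:
        if line.startswith("#") or site_has_ref_and_alt(line):
            yield line


# Toy records with three samples each (made-up data, for illustration only).
fixed = ["chr1", "100", ".", "A", "G", ".", ".", ".", "GT"]
demo = [
    "##fileformat=VCFv4.2",
    "\t".join(fixed + ["0", ".", "0"]),  # REF + missing only: dropped
    "\t".join(fixed + ["1", ".", "1"]),  # ALT + missing only: dropped
    "\t".join(fixed + ["0", "1", "."]),  # REF and ALT both called: kept
]
for kept in filter_vcf_lines(demo):
    print(kept)
```

For a real file, the same generator could be fed an open file handle (for example `filter_vcf_lines(open("samples.vcf"))`) and the output written line by line. Sites that are all REF or all ALT with no missing calls are dropped by the same test, so the earlier two-step filtering would not be needed.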
