How to remove homozygous reference genotypes from multi-sample vcf file based on a threshold
13 months ago
kk.mahsa ▴ 140

I have two questions about homozygous reference genotypes in a multi-sample VCF file.

  1. When I see a 0/0 genotype for one or several samples in a multi-sample VCF file, does that mean no variant was identified for those samples?

  2. How can I remove a CNV (output by CNVcaller in VCF format) if 50% or more of the samples have genotype 0/0 at that position?

CNV VCF Genotype • 1.2k views
13 months ago
cmdcolin ★ 3.8k

bcftools may be able to help https://samtools.github.io/bcftools/bcftools.html

It has the F_PASS() filter function, which gives the fraction of samples that pass an expression, and you can filter on different genotype types. Their docs list what you can match on:

sample genotype: reference (haploid or diploid), alternate (hom or het, haploid or diploid), missing genotype, homozygous, heterozygous, haploid, ref-ref hom, alt-alt hom, ref-alt het, alt-alt het, haploid ref, haploid alt (case-insensitive)

GT="ref"
GT="alt"
GT="mis"
GT="hom"
GT="het"
GT="hap"
GT="RR"
GT="AA"
GT="RA" or GT="AR"
GT="Aa" or GT="aA"
GT="R"
GT="A"

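Possibly this command can filter on that fraction. It keeps only sites where fewer than 50% of the samples have a reference genotype, so sites where 50% or more are 0/0 get dropped: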

bcftools view -i 'F_PASS(GT="ref")<0.5'  in.vcf
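
To make the threshold logic concrete, here is a minimal Python sketch of what I understand that expression to compute. It is not bcftools itself: the function names `hom_ref_fraction` and `keep_site` are made up for illustration, and it assumes the fraction is taken over all samples at the site, missing genotypes included.

```python
import re

def hom_ref_fraction(genotypes):
    """Fraction of samples whose genotype is all-reference (0/0, 0|0 or haploid 0)."""
    if not genotypes:
        return 0.0

    def is_ref(gt):
        # Split diploid (phased or unphased) calls into allele indices.
        return all(allele == "0" for allele in re.split(r"[/|]", gt))

    return sum(is_ref(gt) for gt in genotypes) / len(genotypes)

def keep_site(genotypes, threshold=0.5):
    # Keep the site only when strictly fewer than `threshold` of the samples
    # are reference, so a site at exactly 50% is dropped.
    return hom_ref_fraction(genotypes) < threshold

print(keep_site(["0/0", "0/0", "0/1", "1/1"]))  # exactly 50% reference
print(keep_site(["0/0", "0/1", "0/1", "1/1"]))  # 25% reference
```

The first call prints False and the second prints True, which matches the "remove if 50% or more are 0/0" rule in the question.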


when I see 0/0 genotype for one or several samples in a multi-sample vcf file, does that mean there is no variant identified for that sample/samples?

0/0 means both alleles match the reference, so yes, no variant was identified for those samples. 0/1 means one allele matches the reference and the other matches the first entry in ALT. 0/2 means one allele matches the reference and the other matches the second entry in ALT.
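
The index-to-allele mapping can be sketched in a few lines of Python. This is only an illustration: `describe_genotype` is a hypothetical helper, and the REF/ALT values in the calls are invented examples, not taken from any real file.

```python
import re

def describe_genotype(gt, ref, alts):
    """Translate a GT string such as '0/1' into the alleles it points to."""
    calls = []
    for index in re.split(r"[/|]", gt):
        if index == ".":
            calls.append("missing")
        elif index == "0":
            calls.append(ref)                   # 0 always means REF
        else:
            calls.append(alts[int(index) - 1])  # 1 is the first ALT, 2 the second, ...
    return calls

print(describe_genotype("0/0", "A", ["G", "T"]))  # ['A', 'A']
print(describe_genotype("0/1", "A", ["G", "T"]))  # ['A', 'G']
print(describe_genotype("0/2", "A", ["G", "T"]))  # ['A', 'T']
```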


Format of my VCF file:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  180 181 182 183 184 185 186 187 188 189
Chr1    801 NC_009849.1:801-16000   A   CNH .   .   END=16000;SVTYPE=DUP;SILHOUETTESCORE=nan;CALINSKIHARABAZESCORE=nan;LOGLIKELIHOOD=2.594363252834099  GT:CP   1/1:214.78  1/1:107.98  1/1:450.5   1/1:99.74   1/1:339.06  1/1:193.58  1/1:56.78   1/1:337.92  1/1:373.86  1/1:411.36

Chr1    337601  NC_044511.1:337601-339600   A   CN0 .   .   END=339600;SVTYPE=DEL;SILHOUETTESCORE=0.20963770717029737;CALINSKIHARABAZESCORE=539.874999999999;LOGLIKELIHOOD=3.26433913622163 GT:CP   1/1:0.5 0/1:1.24    0/1:0.96    0/1:0.66    0/0:1.56    0/1:1.44    0/1:1.4 0/0:1.8 0/0:2.28    0/0:2.36

is there anything about this you want me to check?


Yes, dear cmdcolin. Does the command you suggested work on this file, or does it need to be modified? Sorry if my questions are elementary.

Thanks


As far as I can tell, it should be fine. I am not a bcftools expert, but I'd just try it out and experiment; the filtering expressions are quite powerful.
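
One way to sanity-check the expected result before running bcftools is to count the 0/0 genotypes by hand. A rough Python sketch on the two records pasted above, assuming the GT value is located through the FORMAT column (GT:CP here); it only mimics the fraction test and says nothing about whether bcftools accepts the file's header. The helper names are hypothetical.

```python
import re

# The two records from the post above, whitespace-separated as pasted.
RECORDS = [
    "Chr1 801 NC_009849.1:801-16000 A CNH . . "
    "END=16000;SVTYPE=DUP;SILHOUETTESCORE=nan;CALINSKIHARABAZESCORE=nan;"
    "LOGLIKELIHOOD=2.594363252834099 "
    "GT:CP 1/1:214.78 1/1:107.98 1/1:450.5 1/1:99.74 1/1:339.06 "
    "1/1:193.58 1/1:56.78 1/1:337.92 1/1:373.86 1/1:411.36",
    "Chr1 337601 NC_044511.1:337601-339600 A CN0 . . "
    "END=339600;SVTYPE=DEL;SILHOUETTESCORE=0.20963770717029737;"
    "CALINSKIHARABAZESCORE=539.874999999999;LOGLIKELIHOOD=3.26433913622163 "
    "GT:CP 1/1:0.5 0/1:1.24 0/1:0.96 0/1:0.66 0/0:1.56 "
    "0/1:1.44 0/1:1.4 0/0:1.8 0/0:2.28 0/0:2.36",
]

def sample_genotypes(record):
    """Return the GT value of every sample in one VCF data line."""
    fields = record.split()
    gt_index = fields[8].split(":").index("GT")  # FORMAT is the 9th column
    return [sample.split(":")[gt_index] for sample in fields[9:]]

def is_hom_ref(gt):
    """True when every allele index in the genotype is 0."""
    return all(allele == "0" for allele in re.split(r"[/|]", gt))

for record in RECORDS:
    gts = sample_genotypes(record)
    n_ref = sum(is_hom_ref(gt) for gt in gts)
    verdict = "keep" if n_ref / len(gts) < 0.5 else "drop"
    print(record.split()[1], f"{n_ref}/{len(gts)} hom-ref", verdict)
```

For these two records the counts are 0 of 10 and 4 of 10, so both sit below the 50% cutoff and would be kept.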
