Getting frequency of sites fixed within the sample (i.e. divergence sites) from a VCF file
3.7 years ago
JGuVa

Hi there,

I am trying to extract fixed sites within the sample from a VCF file. By fixed sites, I mean sites that differ from the reference genome and are fixed within the sample, i.e. every individual carries the ALT allele.

    REF    ALT ind_1   ind_2  ind_3
1    A      C    1/1    1/1     1/1
2    G      T    1/1    0/1     0/0
3    C      G    1/1    1/1     1/0
4    G      C    0/1    1/1     1/0
5    A      G    1/1    1/1     1/1


For instance, this is a simplified version of a VCF file. In this case, sites 1 and 5 belong to this category of sites that contribute to divergence. Is there any tool in vcftools or an R package that I can use for this purpose?
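The criterion in the example can be stated as: a site is fixed for the ALT allele when every individual's genotype is homozygous ALT (`1/1`). A minimal Python sketch applying that rule to the simplified table above, assuming biallelic sites; the table is hard-coded and the names `TABLE`, `is_hom_alt` and `fixed_sites` are hypothetical, so this is not a VCF parser:

```python
# Simplified genotype table from the question (hypothetical name):
# site number -> genotypes of ind_1, ind_2, ind_3
TABLE = {
    1: ["1/1", "1/1", "1/1"],
    2: ["1/1", "0/1", "0/0"],
    3: ["1/1", "1/1", "1/0"],
    4: ["0/1", "1/1", "1/0"],
    5: ["1/1", "1/1", "1/1"],
}


def is_hom_alt(gt: str) -> bool:
    """True if every allele in an unphased ('/') or phased ('|') genotype is ALT ('1').

    Assumes biallelic sites, so the only ALT allele index is 1.
    """
    alleles = gt.replace("|", "/").split("/")
    return all(a == "1" for a in alleles)


def fixed_sites(table: dict[int, list[str]]) -> list[int]:
    """Return the sites where every individual is homozygous for the ALT allele."""
    return [site for site, gts in table.items() if all(is_hom_alt(g) for g in gts)]


print(fixed_sites(TABLE))  # [1, 5]
```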

next-gen SNP sequence

Not clear to me. Do you want the variants where all the genotypes are homozygous for the ALT allele?


Yes, exactly, that is what I need.


In addition to Pierre's question: is your data in this simplified format or a normal VCF?


My file is a normal VCF, I presented it like that just for the sake of the explanation.

3.7 years ago

Using vcffilterjdk (http://lindenb.github.io/jvarkit/VcfFilterJdk.html):

    java -jar dist/vcffilterjdk.jar -e 'return variant.getGenotypes().stream().allMatch(G->G.isHomVar());' in.vcf

3.7 years ago

Using bcftools:

    $ bcftools view -i 'COUNT(GT="AA")=N_SAMPLES' input.vcf

or

    $ bcftools view -e 'GT[*]!="AA"' input.vcf
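If neither bcftools nor jvarkit is available, the same idea (keep a record only when every sample is homozygous for an ALT allele) can be sketched in plain Python. This is a rough stand-in for the bcftools expressions above, not an exact reimplementation of their semantics. It assumes GT is the first FORMAT key (as the VCF spec requires when GT is present), treats missing genotypes (`./.`) as not fixed, and also counts the selected sites, since the title asks about their frequency. The script name `fixed_sites.py` and the function names are hypothetical:

```python
import gzip
import sys
from typing import Iterable, Iterator


def gt_is_hom_alt(gt: str) -> bool:
    """True for genotypes made of one repeated ALT allele, e.g. '1/1', '2|2' or haploid '1'.

    REF ('0') and missing ('.') alleles fail, as do mixed genotypes such as '1/2'.
    """
    alleles = gt.replace("|", "/").split("/")
    return alleles[0] not in (".", "0") and all(a == alleles[0] for a in alleles)


def fixed_alt_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF data lines where every sample's GT is homozygous ALT."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or fields[8].split(":")[0] != "GT":
            continue  # no samples, or no genotypes on this record
        samples = fields[9:]
        # GT is the first colon-separated value of each sample column
        if all(gt_is_hom_alt(s.split(":")[0]) for s in samples):
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    n_fixed = 0
    with opener(path, "rt") as fh:
        for record in fixed_alt_records(fh):
            sys.stdout.write(record)
            n_fixed += 1
    print(f"fixed ALT sites: {n_fixed}", file=sys.stderr)
```

Usage would be something like `python fixed_sites.py input.vcf.gz > fixed.txt`; note that only the selected data lines are written, not the VCF header, so for a valid VCF output the bcftools commands above remain the better choice.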


fin swimmer