Getting the frequency of sites fixed within the sample (i.e. divergence sites) from a VCF file
3.7 years ago
JGuVa

Hi there,

I am trying to extract the sites that are fixed within my sample from a VCF file. By fixed sites, I mean sites that differ from the reference genome and are fixed within the sample, i.e. every individual is homozygous for the ALT allele.

    site  REF  ALT  ind_1  ind_2  ind_3
    1     A    C    1/1    1/1    1/1
    2     G    T    1/1    0/1    0/0
    3     C    G    1/1    1/1    1/0
    4     G    C    0/1    1/1    1/0
    5     A    G    1/1    1/1    1/1

For instance, in this simplified version of a VCF file, sites 1 and 5 belong to the category of sites that contribute to divergence. Is there a tool in vcftools, or an R package, that I can use for this purpose?

Thanks in advance.

It's not clear to me. Do you want the variants where all the genotypes are homozygous for the ALT allele?

Yes, exactly, that is what I need.

In addition to Pierre's question: is your data in this simplified format or in a normal VCF?

My file is a normal VCF; I presented it like that only for the sake of explanation.

3.7 years ago

Using vcffilterjdk (http://lindenb.github.io/jvarkit/VcfFilterJdk.html):

    java -jar dist/vcffilterjdk.jar -e 'return variant.getGenotypes().stream().allMatch(G->G.isHomVar());' in.vcf
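Without Java or jvarkit, a similar per-genotype test can be sketched in plain awk. This is a minimal sketch, not part of jvarkit: it assumes an uncompressed VCF on stdin, GT as the first FORMAT sub-field (the VCF specification requires this whenever GT is present) and diploid calls, and `fixed_sites` is a hypothetical helper name. As I understand htsjdk's `isHomVar()`, it accepts any non-reference homozygous call, whereas this sketch additionally requires all samples to share the same ALT allele, so a multi-allelic site with 1/1 in one sample and 2/2 in another is not reported.

```shell
# Hypothetical helper: print VCF records where every sample is homozygous
# for the same ALT allele (e.g. 1/1 or 1|1 in all samples).
# Assumptions: uncompressed VCF on stdin, GT is the first FORMAT sub-field,
# diploid calls; missing calls (./.) count as not fixed.
fixed_sites() {
  awk -F'\t' '
    /^#/ { print; next }                # keep header lines so the output is still a VCF
    NF < 10 { next }                    # skip sites-only records (no sample columns)
    {
      allele = ""
      for (i = 10; i <= NF; i++) {      # sample columns start at column 10
        split($i, f, ":")               # f[1] is the GT sub-field
        n = split(f[1], g, "[/|]")      # "/" unphased, "|" phased
        # reject non-diploid, heterozygous, hom-ref, missing, or a different ALT
        if (n != 2 || g[1] != g[2] || g[1] == "0" || g[1] == "." ||
            (allele != "" && g[1] != allele)) { allele = ""; break }
        allele = g[1]
      }
      if (allele != "") print           # every sample carries the same hom-ALT call
    }
  '
}
```

Usage: `fixed_sites < input.vcf > fixed.vcf`, or `gzip -dc input.vcf.gz | fixed_sites > fixed.vcf` for a bgzipped file.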
3.7 years ago

Using bcftools:

    $ bcftools view -i 'COUNT(GT="AA")=N_SAMPLES' input.vcf

or

    $ bcftools view -e 'GT[*]!="AA"' input.vcf

fin swimmer
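The commands above print the fixed sites themselves, while the title also asks for their frequency. With bcftools, one way is to count records with `bcftools view -H` (which suppresses the header) piped into `wc -l`, once with the `-i 'COUNT(GT="AA")=N_SAMPLES'` filter and once without, and divide the two counts. The awk sketch below computes a comparable ratio without bcftools. It is a minimal sketch under stated assumptions: uncompressed VCF on stdin, GT as the first FORMAT sub-field, diploid calls, missing calls treated as not fixed, and all samples required to share one ALT allele; `fixed_fraction` is a hypothetical helper name.

```shell
# Hypothetical helper: count records where every sample is homozygous for
# the same ALT allele and report them as a fraction of all records that
# have sample columns.
# Assumptions: uncompressed VCF on stdin, GT is the first FORMAT sub-field,
# diploid calls; missing calls (./.) count as not fixed.
fixed_fraction() {
  awk -F'\t' '
    BEGIN { fixed = 0; total = 0 }
    /^#/ || NF < 10 { next }            # skip header lines and sites-only records
    {
      total++
      allele = ""
      for (i = 10; i <= NF; i++) {      # sample columns start at column 10
        split($i, f, ":")               # f[1] is the GT sub-field
        n = split(f[1], g, "[/|]")      # "/" unphased, "|" phased
        # reject non-diploid, heterozygous, hom-ref, missing, or a different ALT
        if (n != 2 || g[1] != g[2] || g[1] == "0" || g[1] == "." ||
            (allele != "" && g[1] != allele)) { allele = ""; break }
        allele = g[1]
      }
      if (allele != "") fixed++
    }
    END { if (total > 0) printf "%d/%d = %.3f\n", fixed, total, fixed / total }
  '
}
```

For the five sites in the question, encoded as a VCF, `fixed_fraction < input.vcf` would print `2/5 = 0.400` (sites 1 and 5 are fixed).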
