How to filter VCF by the variant allele depth of a specific sample
3
0
Entering edit mode
8 days ago

Hello,

I'm fairly new to working with VCF files and linux, so apologies if this is a stupid question, but I am currently trying to filter a VCF by what variants a specific sample has a variant allele depth above 3 for. I've been able to use bcftools to filter variants by if any samples have a variant allele depth above three using the following command:

bcftools filter -o Outputad2.vcf -s 'lowAD' -i 'AD[0:1-] > 2' test.vcf.gz

But I don't know how to specify that to a specific sample.

Thank you

bcftools VCF • 209 views
ADD COMMENT
1
Entering edit mode
6 days ago

Thanks for the help guys, but I ended up figuring it out by using a gatk jexl expression

ADD COMMENT
0
Entering edit mode
7 days ago
sbstevenlee ▴ 140

If I understood your question correctly, you want to select only those variants where a specific sample has ALT AD of 3 or higher. If that's the case, here's a Python solution using the pyvcf submodule I wrote:

>>> from fuc import pyvcf
>>> data = {
...     'CHROM': ['chr1', 'chr1', 'chr1'],
...     'POS': [100, 101, 102],
...     'ID': ['.', '.', '.'],
...     'REF': ['G', 'T', 'A'],
...     'ALT': ['A', 'C', 'C'],
...     'QUAL': ['.', '.', '.'],
...     'FILTER': ['.', '.', '.'],
...     'INFO': ['.', '.', '.'],
...     'FORMAT': ['GT:AD', 'GT:AD', 'GT:AD'],
...     'A': ['0/1:15,8', '1/1:0,17', '0/0:15,0'],
...     'B': ['0/1:12,2', '0/1:23,3', '0/1:10,9']
... }
>>> vf = pyvcf.VcfFrame.from_dict([], data)
>>> # vf = pyvcf.VcfFrame.from_file('input.vcf')
>>> vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT         A         B
0  chr1  100  .   G   A    .      .    .  GT:AD  0/1:15,8  0/1:12,2
1  chr1  101  .   T   C    .      .    .  GT:AD  1/1:0,17  0/1:23,3
2  chr1  102  .   A   C    .      .    .  GT:AD  0/0:15,0  0/1:10,9
>>> i = vf.extract('AD', func=lambda x: float(x.split(',')[1]))['B'] >= 3
>>> vf.df = vf.df[i]
>>> vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT         A         B
0  chr1  101  .   T   C    .      .    .  GT:AD  1/1:0,17  0/1:23,3
1  chr1  102  .   A   C    .      .    .  GT:AD  0/0:15,0  0/1:10,9
>>> # vf.to_file('output.vcf')
ADD COMMENT
0
Entering edit mode
7 days ago

Using vcffilterjdk: http://lindenb.github.io/jvarkit/VcfFilterJdk.html

 java -jar dist/vcffilterjdk.jar -F LOW_AD_SAMPLE_S2 -e 'final int[] ad= variant.getGenotype("S2").getAD(); return ad!=null && ad.length>1 && Arrays.stream(ad).skip(1L).allMatch(AD->AD>2);' in.vcf.gz
ADD COMMENT

Login before adding your answer.

Traffic: 1776 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6