SnpSift Regular Expression Assist
10.4 years ago
rob234king ▴ 610

I've successfully annotated a VCF file with SnpEff and would now like to filter the resulting VCF file with SnpSift.

The VCF file is in the following format:

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO                            FORMAT                                                    Sample1                                        Sample2                                            Sample3
A       8725    .    C    T    .        PASS    ADP=99;WT=2;HET=1;HOM=0;NC=0    GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/0:145:77:77:77:0:0%:1E0:37:0:41:36:0:0    0/1:0:58:58:55:3:5.17%:9.8E-1:36:33:25:30:3:0    0/0:285:163:162:161:1:0.62%:9.8E-1:36:39:78:83:1:0

I can filter on the main columns, e.g. the INFO column, but I don't know how to filter on the per-sample columns (Sample1, Sample2, Sample3). For instance, I'd like to keep only VCF entries where one sample has a FREQ of at least 5% and the other two samples are below 2%.

So far I have the command below and would like to add the conditions described above. Any ideas how I can do this?

cat A.vcf | java -jar SnpSift.jar filter " ((NC = 0) & ( REF = 'C' ) & ( ALT = 'T')) " > A_filtered.vcf

10.4 years ago
Pavel Senin ★ 1.9k

Would something like this work? (I've only edited the brackets and can't really test it.)

cat input.vcf | java -jar SnpSift.jar filter \
 "((NC = 0) & ( REF = 'C' ) & ( ALT = 'T')) & ( \
  ( (FREQ[1]>5) & ((FREQ[2]<2) & (FREQ[3]<2 )) ) | \
  ( (FREQ[2]>5) & ((FREQ[1]<2) & (FREQ[3]<2 )) ) | \
  ( (FREQ[3]>5) & ((FREQ[1]<2) & (FREQ[2]<2 )) ) )" > output.vcf

Thanks, but that doesn't work: SnpSift doesn't recognise FREQ as a field, or Sample1 either. I tried adding Sample1 to the VCF header so I could at least search for a string, but it is still not recognised. I thought the program would look at the header to determine which fields are available. I may have to stick with a Perl script, but I was hoping to do this with existing tools and make it simpler for others.


What if, instead of FREQ[1], you try GEN[0].FREQ>5 and so on? That form works for me when I use it for DP in the INFO field of my file; unfortunately I do not have a FREQ field. I tried it with your VCF example, but it throws an exception saying "Genotype numer '0' does not exists".
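
If the script fallback mentioned above turns out to be necessary, the per-sample logic is short. Below is a minimal Python sketch (not SnpSift, and not run against the real file) that reads the FORMAT column to find the position of FREQ, strips the trailing % from each sample's value, and keeps a record only when exactly one sample is at or above 5% and every other sample is below 2%. The function names `sample_freqs`, `keep` and `filter_vcf` are made up for illustration, and the sketch assumes tab-separated columns with FORMAT in column 9 and the samples from column 10 onward, as in the example record.

```python
def sample_freqs(vcf_line):
    """Return each sample's FREQ as a float percentage (None if missing).

    Assumes a tab-separated VCF data line whose FORMAT column lists FREQ
    and whose FREQ values look like '5.17%'. Returns None if the record
    has no FREQ key at all.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")            # FORMAT names the per-sample fields
    if "FREQ" not in keys:
        return None
    idx = keys.index("FREQ")
    freqs = []
    for sample in cols[9:]:              # one column per sample
        values = sample.split(":")
        raw = values[idx] if idx < len(values) else "."
        freqs.append(None if raw == "." else float(raw.rstrip("%")))
    return freqs


def keep(vcf_line, high=5.0, low=2.0):
    """True if exactly one sample has FREQ >= high and all others are < low."""
    freqs = sample_freqs(vcf_line)
    if freqs is None or None in freqs:
        return False
    n_high = sum(f >= high for f in freqs)
    n_low = sum(f < low for f in freqs)
    return n_high == 1 and n_low == len(freqs) - 1


def filter_vcf(lines):
    """Yield header lines unchanged and only the data lines that pass keep()."""
    for line in lines:
        if line.startswith("#") or keep(line):
            yield line
```

To filter a file, something like `for line in filter_vcf(open("A.vcf")): print(line, end="")` would write the header and the passing records to standard output. Samples are indexed from 0 here, so Sample1 is `freqs[0]`; the REF, ALT and NC conditions from the original command would still need to be applied separately.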
