Question: SnpSift Regular Expression Assist
rob234king (UK/Harpenden/Rothamsted Research) wrote, 3.9 years ago:

I've successfully annotated a VCF file using SnpEff and would now like to filter the resulting VCF file using SnpSift.

The VCF file is in the following format:

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO                            FORMAT                                                    Sample1                                        Sample2                                            Sample3
A       8725    .    C    T    .        PASS    ADP=99;WT=2;HET=1;HOM=0;NC=0    GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/0:145:77:77:77:0:0%:1E0:37:0:41:36:0:0    0/1:0:58:58:55:3:5.17%:9.8E-1:36:33:25:30:3:0    0/0:285:163:162:161:1:0.62%:9.8E-1:36:39:78:83:1:0

I can filter on the main columns, e.g. the INFO column, but I don't know how to filter on the per-sample columns (Sample1, Sample2, Sample3): for instance, keeping only VCF entries where one sample has a FREQ of at least 5% and the other two samples are below 2%.

So far I have the command below, and I would like to add the conditions described above. Any ideas how I can do this?

cat A.vcf | java -jar SnpSift.jar filter " ((NC = 0) & ( REF = 'C' ) & ( ALT = 'T')) " > A_filtered.vcf
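
For reference, the per-sample columns in the example record can be decoded by pairing the colon-separated FORMAT keys with each sample's values. The following is a minimal Python sketch, independent of SnpSift. It assumes tab-separated columns and FREQ values written with a trailing percent sign, as in the record above; the function name `sample_freqs` is hypothetical, and missing values (`.`) are not handled.

```python
def sample_freqs(vcf_line):
    """Return the FREQ value (in percent) of every sample in one VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")              # FORMAT column: GT:GQ:...:FREQ:...
    freqs = []
    for sample in fields[9:]:                # one column per sample
        values = dict(zip(keys, sample.split(":")))
        freqs.append(float(values["FREQ"].rstrip("%")))
    return freqs


# The example record from the question, rebuilt with tab separators.
record = "\t".join([
    "A", "8725", ".", "C", "T", ".", "PASS",
    "ADP=99;WT=2;HET=1;HOM=0;NC=0",
    "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR",
    "0/0:145:77:77:77:0:0%:1E0:37:0:41:36:0:0",
    "0/1:0:58:58:55:3:5.17%:9.8E-1:36:33:25:30:3:0",
    "0/0:285:163:162:161:1:0.62%:9.8E-1:36:39:78:83:1:0",
])
print(sample_freqs(record))  # [0.0, 5.17, 0.62]
```

Because FREQ is stored as text such as `5.17%`, it has to be stripped of the percent sign before any numeric comparison, which may also matter for tools that compare the field as a number.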

Pavel Senin (Los Alamos, NM) wrote, 3.9 years ago:

Would something like this work? (I have only edited the brackets and can't really test it.)

cat input.vcf | java -jar SnpSift.jar filter \
 "((NC = 0) & ( REF = 'C' ) & ( ALT = 'T')) & ( \
  ( (FREQ[1]>5) & ((FREQ[2]<2) & (FREQ[3]<2 )) ) | \
  ( (FREQ[2]>5) & ((FREQ[1]<2) & (FREQ[3]<2 )) ) | \
  ( (FREQ[3]>5) & ((FREQ[1]<2) & (FREQ[2]<2 )) ) )" > output.vcf

Thanks, but it doesn't work: SnpSift doesn't recognise FREQ as a field, nor Sample1. I tried adding Sample1 to the VCF header so that I could at least search for a string, but it is still not recognised. I thought the program would read the header to determine which fields are available. I may have to stick with a Perl script, but I was hoping to do this with existing tools and make it simpler for others.

Reply written 3.9 years ago by rob234king
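
If a standalone script does turn out to be necessary, the filter itself is short. The sketch below uses Python rather than Perl and is only one possible reading of the criteria in the question: NC = 0, REF = C, ALT = T, exactly one sample with FREQ of at least 5%, and every other sample below 2%. The function names `keep_record` and `filter_vcf` are hypothetical, the demo records use a shortened FORMAT column (`GT:FREQ`) for brevity, and tab-separated input is assumed.

```python
def keep_record(line, high=5.0, low=2.0):
    """True if a VCF record passes the INFO/REF/ALT and per-sample FREQ criteria."""
    f = line.rstrip("\n").split("\t")
    # INFO column as a dict; flag entries without '=' are skipped
    info = dict(kv.split("=", 1) for kv in f[7].split(";") if "=" in kv)
    if not (info.get("NC") == "0" and f[3] == "C" and f[4] == "T"):
        return False
    keys = f[8].split(":")
    freqs = [
        float(dict(zip(keys, s.split(":")))["FREQ"].rstrip("%"))
        for s in f[9:]
    ]
    n_high = sum(x >= high for x in freqs)
    n_low = sum(x < low for x in freqs)
    # exactly one sample at or above `high`, every other sample below `low`
    return n_high == 1 and n_low == len(freqs) - 1


def filter_vcf(lines):
    """Yield header lines unchanged and only the records that pass."""
    for line in lines:
        if line.startswith("#") or keep_record(line):
            yield line


# Small in-memory demo with a simplified FORMAT column.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample1\tSample2\tSample3\n"
rec_pass = "A\t8725\t.\tC\tT\t.\tPASS\tADP=99;NC=0\tGT:FREQ\t0/0:0%\t0/1:5.17%\t0/0:0.62%\n"
rec_fail = "A\t9000\t.\tC\tT\t.\tPASS\tADP=99;NC=0\tGT:FREQ\t0/1:6%\t0/1:5.17%\t0/0:0.62%\n"
kept = list(filter_vcf([header, rec_pass, rec_fail]))
print(len(kept))  # 2: the header and the first record
```

To run it on a real file, `filter_vcf` can be given an open file handle and its output written with `sys.stdout.writelines`. Note that this sketch uses "at least 5%" as stated in the question, whereas the expression suggested above uses a strict `>5`.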

What if, instead of FREQ[1], you try GEN[0].FREQ>5 and so on? That syntax works for me when I use it for DP in the INFO field of my file; unfortunately I do not have a FREQ field. I also tried it with your VCF example, but it throws an exception saying that genotype number '0' does not exist.

Reply written 3.9 years ago by Pavel Senin