Question: SNPEff (v.3.6c) summary statistics differ from SNPSift results
kevin.palis (Philippines) wrote, 3.7 years ago:

I'm using SNPEff to get some statistics. I usually run it with -t (multi-threaded), so I don't get the summary file. This time I decided to check whether my own counts match those in the summary HTML. Apparently, they don't.

For example, running:

cat <annotated_vcf> | java -jar /storage3/users/kpalis/snpEff/SnpSift.jar filter "( EFF[*].EFFECT = 'NON_SYNONYMOUS_CODING' )"  | wc -l

outputs 3,607,407

while in the summary file it says that NON_SYNONYMOUS CODING = 3,899,651

I've triple-checked my annotated vcf and it looks okay (same number of lines as the source vcf).

Any idea why these results differ? From what I can see, only the INTERGENIC count matches between SnpSift and the summary HTML.

Thanks!

PS: I'm quite new to bioinformatics and come from a computer science background, so I may well be missing something here. Any help is greatly appreciated.

 

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 3.7 years ago:
> outputs 3,607,407
> while in the summary file it says that NON_SYNONYMOUS CODING = 3,899,651

You compared a number of lines with a number of consequences. One variant can overlap many transcripts, so a single VCF line can carry many effects, and the summary counts each of them.
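To make the difference concrete, here is a minimal sketch on a made-up three-record VCF. The records, the abbreviated EFF fields and the transcript names T1 to T3 are all invented for illustration; real SnpEff annotations have more sub-fields. It counts the variant lines that contain the effect, then the individual occurrences of the effect.

```shell
# Toy annotated VCF: made-up records with abbreviated EFF fields.
# The first variant overlaps two hypothetical transcripts (T1, T2),
# so one line carries two NON_SYNONYMOUS_CODING effects.
vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t100\t.\tA\tG\t.\t.\tEFF=NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|T1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|T2)\n'
  printf '1\t200\t.\tC\tT\t.\t.\tEFF=NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|T3)\n'
  printf '1\t300\t.\tG\tA\t.\t.\tEFF=INTERGENIC(MODIFIER)\n'
} > "$vcf"

# Count variant lines with at least one such effect (headers skipped).
variant_lines=$(grep -v '^#' "$vcf" | grep -c 'NON_SYNONYMOUS_CODING(')

# Count every occurrence of the effect, one per affected transcript.
effects=$(grep -v '^#' "$vcf" | grep -o 'NON_SYNONYMOUS_CODING(' | wc -l | tr -d ' ')

echo "variant lines: $variant_lines"   # 2
echo "effects:       $effects"         # 3

rm -f "$vcf"
```

The second number is the kind of count the summary reports, so it can only be equal to or larger than the per-line count. Note also that, as far as I know, SnpSift filter passes the VCF header through, so a plain `wc -l` on its output would include header lines as well.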

 

 
