I'm using SnpEff to get some statistics. I usually run it with the -t (multi-threaded) option, so I don't get the summary file. This time I decided to check whether my results match those in the summary HTML. Apparently, they don't.
For example, running:
cat <annotated_vcf> | java -jar /storage3/users/kpalis/snpEff/SnpSift.jar filter "( EFF[*].EFFECT = 'NON_SYNONYMOUS_CODING' )" | wc -l
outputs 3,607,407
while the summary file reports NON_SYNONYMOUS_CODING = 3,899,651
I've triple-checked my annotated vcf and it looks okay (same number of lines as the source vcf).
Any idea why these results are different? From what I can see, only the INTERGENIC count from SnpSift matches the one in the summary HTML.
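One thing I can think of checking is whether the two numbers count different things: my pipeline counts output lines (variants), while the summary might count individual effect entries, and a single variant can carry several. This is only a guess on my part, not something I've confirmed. Below is a minimal sketch of how I'd compare the two. It assumes the older EFF=EFFECT(details),EFFECT(details) layout in the INFO column; count_effect and the toy records are made up for illustration and are not part of SnpEff or SnpSift.

```python
def count_effect(vcf_lines, effect="NON_SYNONYMOUS_CODING"):
    """Return (variant_lines, effect_entries) for one effect name.

    Assumes the INFO column (8th tab-separated field) holds an
    EFF=NAME(details),NAME(details),... entry. This layout is an
    assumption about the annotated file, not a guaranteed format.
    """
    variant_lines = 0   # data lines with at least one matching effect
    effect_entries = 0  # total matching effect entries over all lines
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header lines are skipped (plain `wc -l` would count them)
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF data line
        eff_value = None
        for item in fields[7].split(";"):
            if item.startswith("EFF="):
                eff_value = item[len("EFF="):]
                break
        if eff_value is None:
            continue  # no EFF annotation on this line
        # The effect name is whatever precedes the first "(" of each entry.
        matches = sum(
            1 for entry in eff_value.split(",")
            if entry.split("(", 1)[0] == effect
        )
        if matches:
            variant_lines += 1
        effect_entries += matches
    return variant_lines, effect_entries


# Toy records, invented purely to show the two counts diverging.
toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\t.\tEFF=NON_SYNONYMOUS_CODING(x),NON_SYNONYMOUS_CODING(y)",
    "1\t200\t.\tC\tT\t.\t.\tDP=10;EFF=SYNONYMOUS_CODING(x),NON_SYNONYMOUS_CODING(y)",
    "1\t300\t.\tG\tA\t.\t.\tEFF=INTERGENIC(x)",
]

print(count_effect(toy_vcf))  # prints (2, 3)
```

On the real file I would pass an open file handle instead of the list, e.g. count_effect(open(path)) with path pointing to the annotated VCF. If the second number comes out close to the summary figure, that would at least suggest the summary counts effects rather than variants.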
Thanks!
PS: I'm quite new to Bioinformatics and coming from a Computer Science background, so it's likely that I might be missing something here. Any help is greatly appreciated.