Number of functions is greater than the number of variants in snpEff's output
4.5 years ago
reza ▴ 270

Hi everyone,

I annotated my VCF file (produced by samtools) using snpEff, and the output confused me. In the HTML output, the "number of SNPs" is 2.5 million while the "number of effects" is 3.8 million. For indels the gap is even larger: the number of indels is 350,256 while the "number of effects" is 9 million. What happened? Is this a normal result? If I want the number of variants and the "number of effects" to be equal, what should I do?

Hi,

What do you mean by "number of functions"?

It is part of the snpEff results in HTML format.

Please enlighten us and answer Titus's question: please show us an example of such a "number of functions". What does it mean? GO terms?

I am so sorry, it is "number of effects", not "number of functions". The results look like this:

Number of lines (input file) 2,559,765

Number of variants (before filter) 2,560,952

Number of not variants (i.e. reference equals alternative) 0

Number of variants processed (i.e. after filter and non-variants) 2,560,952

Number of known variants (i.e. non-empty ID) 0 ( 0% )

Number of multi-allelic VCF entries (i.e. more than two alleles) 1,187

Number of effects 3,891,852
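As a quick sanity check on the figures above, the arithmetic can be sketched in Python. The numbers are copied from the summary; the variable names are only for illustration, and the interpretation in the comments is an assumption, not something the report states.

```python
# Figures copied from the snpEff summary above.
input_lines = 2_559_765
variants = 2_560_952
multi_allelic_entries = 1_187
effects = 3_891_852

# Assumption: each multi-allelic line with two alternative alleles is counted
# as one extra variant, which would explain variants > input lines.
extra_variants = variants - input_lines

# Average number of reported effects per variant.
effects_per_variant = effects / variants

print(extra_variants)                  # 1187
print(round(effects_per_variant, 2))   # 1.52
```

The surplus of variants over input lines equals the multi-allelic count, and on average each variant receives about 1.5 effects.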

If I understand correctly, the question is why there are more effects than variants, isn't it? The thing is that a single gene can have multiple transcripts, so you get one effect for every transcript affected by the variant.
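To see this per variant rather than in the summary, one could count the annotations attached to each record. The following is a minimal sketch, assuming the annotated VCF uses the `ANN` INFO field with comma-separated entries (older snpEff versions write `EFF` instead). The function name, the record, and the gene and transcript names are made up for illustration.

```python
def count_annotations(vcf_line):
    """Return the number of comma-separated ANN entries in one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # INFO is the 8th column of a VCF record
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            return len(entry[len("ANN="):].split(","))
    return 0  # record carries no ANN field


# Made-up record: one variant hitting a hypothetical gene with two transcripts.
example = "\t".join([
    "chr1", "1000", ".", "A", "G", "50", "PASS",
    "DP=20;ANN=G|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1|protein_coding,"
    "G|intron_variant|MODIFIER|GENE1|GENE1|transcript|TX2|protein_coding",
])

print(count_annotations(example))  # 2
```

One variant, two transcripts, two effects: summing this count over all records is what pushes the number of effects above the number of variants.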

Yes, that is my question. Is there any way to make the number of variants and the number of effects equal?

If I remember correctly, there is no option for that kind of output. You could do it if you have a list of transcripts (see this page: http://snpeff.sourceforge.net/SnpEff_manual.html ). The only condition is that none of those transcripts overlap at your variant positions. Another way is to use VEP, which is quite similar to snpEff ( http://www.ensembl.org/info/docs/tools/vep/index.html ) and which outputs the variant annotation on a separate line for each transcript.

4.3 years ago
reza ▴ 270

I used the longest transcript per gene to build the database in snpEff, but my problem is still not solved (the number of effects is still greater than the number of variants). This problem has puzzled me greatly. Can someone help me solve it?

3.3 years ago

Hello guys! Was this problem resolved? I am getting the same issue now.

That is not a problem. That is annotation. For example, suppose the DMD gene has 10 transcripts resulting in 10 isoforms. Any sequence variation in the DMD gene will then have 10 calculated functional consequences in total, one per isoform. Effect calculations consider all the transcripts of the gene.

3.3 years ago

Hello,

I guess this cannot be solved easily.

As already mentioned, you have multiple transcripts per gene. SnpEff annotates each of them unless you use the -canon option. Then SnpEff uses only the canonical transcript.
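For reference, here is a minimal sketch of how a run with `-canon` could be launched from Python. The jar path, genome name, and file names are placeholders, and the helper names are hypothetical; only the `-canon` flag itself comes from the text above.

```python
import subprocess


def snpeff_canon_command(genome, vcf_path, jar_path="snpEff.jar"):
    """Build the argument list for a snpEff run restricted to canonical transcripts."""
    return ["java", "-jar", jar_path, "-canon", genome, vcf_path]


def run_snpeff_canon(genome, vcf_path, out_path, jar_path="snpEff.jar"):
    """Run snpEff and write the annotated VCF (printed to stdout) to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(snpeff_canon_command(genome, vcf_path, jar_path),
                       stdout=out, check=True)


# Placeholder names; nothing is executed here, the command is only printed.
print(" ".join(snpeff_canon_command("my_genome", "input.vcf")))
```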

But that's still not enough. You have regions where different genes overlap. What should SnpEff do there? Which gene should it choose for annotation? And it's very likely that the effect in one gene is different from the effect in the other.

fin swimmer
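If one effect per variant is really needed, one possible workaround is to post-filter the annotations. This is a sketch under stated assumptions, not a snpEff feature: it assumes the `ANN` field layout in which the first `|`-separated field is the alternative allele and the third is the putative impact (HIGH, MODERATE, LOW, MODIFIER), and it simply keeps the most severe entry for each allele. The function name and the example annotations are hypothetical.

```python
# Lower rank means more severe; unknown impacts sort last.
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def pick_one_per_allele(ann_value):
    """Keep a single ANN entry per alternative allele: the most severe one.

    ann_value is the text after 'ANN=' in the INFO column.
    On ties, the entry that appears first is kept.
    """
    best = {}  # allele -> (rank, entry); dicts preserve insertion order
    for entry in ann_value.split(","):
        parts = entry.split("|")
        allele = parts[0]
        impact = parts[2] if len(parts) > 2 else ""
        rank = IMPACT_RANK.get(impact, len(IMPACT_RANK))
        if allele not in best or rank < best[allele][0]:
            best[allele] = (rank, entry)
    return [entry for _, entry in best.values()]


# Made-up annotations: allele G overlaps two hypothetical genes, allele T one.
ann = ("G|intron_variant|MODIFIER|GENE1|GENE1|transcript|TX1|protein_coding,"
       "G|missense_variant|MODERATE|GENE2|GENE2|transcript|TX9|protein_coding,"
       "T|synonymous_variant|LOW|GENE1|GENE1|transcript|TX1|protein_coding")

for kept in pick_one_per_allele(ann):
    print(kept)
```

Note the trade-off: for allele G the GENE1 annotation is silently dropped, which is exactly the information loss that overlapping genes force on any "one effect per variant" rule.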