Question: Gene-based annotation: database or live computation?
3.1 years ago, sacha (1.9k) wrote:

There are several databases for annotating variants, such as dbNSFP, dbSNP and COSMIC. I was looking for a gene-based annotation database that tells me the effect of a variant, for example: intron, exon, splice_site_donor, missense... But I didn't find any database like that. Those fields depend on the gene/transcript database used, such as refGene, UCSC genes or ENCODE, and storing every possibility would produce a huge database.

So I assume annotators like UCSC, VEP or SnpEff compute those fields during the annotation process. Something like:

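    def consequence(variant, refgene):
        # Naive scan: test the variant against every gene in turn.
        # Assumes gene, gene.exons and gene.introns support the "in" test.
        for gene in refgene:
            if variant in gene:
                if variant in gene.exons:
                    return "exon"
                if variant in gene.introns:
                    return "intron"
        return None  # variant does not fall inside any gene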

So, what's the strategy for producing gene annotation with those fields: database or live computation?


I rather doubt there's a `for gene in refgene` sort of loop. More likely, the variant region is padded with a reasonable amount of flanking sequence, and that region is then queried in an interval tree or similar structure. The results can then be iterated over. Otherwise things would get really slow.
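
As a rough illustration of that idea (a sketch only, not how any particular annotator is implemented), the lookup could be built on a sorted array with a running maximum of interval ends, which here stands in for a real interval tree. The class `IntervalIndex`, the function `query_variant`, the feature labels and the coordinates are all hypothetical; intervals are assumed to be 0-based and half-open.

```python
from bisect import bisect_left


class IntervalIndex:
    """Static overlap index over half-open (start, end, label) intervals.

    A simple stand-in for an interval tree (assumption for illustration).
    """

    def __init__(self, intervals):
        self.items = sorted(intervals, key=lambda iv: (iv[0], iv[1]))
        self.starts = [start for start, _, _ in self.items]
        # max_end[i] = largest end among items[0..i]; lets a query stop early.
        self.max_end = []
        running = float("-inf")
        for _, end, _ in self.items:
            running = max(running, end)
            self.max_end.append(running)

    def overlapping(self, start, end):
        """Return labels of all intervals overlapping [start, end)."""
        hits = []
        # Last interval whose start lies before the query end.
        i = bisect_left(self.starts, end) - 1
        # Walk left until no earlier interval can reach the query start.
        while i >= 0 and self.max_end[i] > start:
            _, item_end, label = self.items[i]
            if item_end > start:
                hits.append(label)
            i -= 1
        return hits[::-1]


def query_variant(index, pos, flank=0):
    """Pad a 0-based variant position by `flank` bases, then query the index."""
    return index.overlapping(max(0, pos - flank), pos + 1 + flank)


# Hypothetical features on a single chromosome.
features = [
    (100, 200, "GENE_A:exon1"),
    (200, 500, "GENE_A:intron1"),
    (500, 650, "GENE_A:exon2"),
    (900, 1200, "GENE_B:exon1"),
]
index = IntervalIndex(features)
near_boundary = query_variant(index, 199, flank=2)
```

With a flank of 2, a variant at position 199 picks up both `GENE_A:exon1` and `GENE_A:intron1`, which is the kind of hit list one would then iterate over to look for splice-site effects. Each query does a binary search and then inspects only candidates near the variant instead of looping over every gene.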

Reply written 3.1 years ago by Devon Ryan (94k)

Thanks for your reply. That was just an example. My question is whether it uses a database or a computed method.

Reply written 3.1 years ago by sacha (1.9k)

At least for SnpEff, the methods section mentions the following:

This can be performed once the user has downloaded or built the database. The program loads the binary database and builds a data structure called “interval forest” in order to perform an efficient interval search. Input files are parsed and each variant queries the data structures to find intersecting genomic annotations. All intersecting genomic regions are reported and whenever these regions include an exon, the coding effect of the variant is calculated.

That indicates to me that it's doing the actual annotation live.
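
To make "live" concrete, here is a minimal sketch of the last step in that description: once an exon is hit, the coding effect is computed from the sequence rather than looked up. This is not SnpEff's code. The function names, the toy transcript and the feature layout are hypothetical, the transcript is assumed to be a single forward-strand exon that is entirely coding, only SNVs are handled, and a plain per-chromosome list scan stands in for the interval forest.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def coding_effect(cds, offset, alt):
    """Classify an SNV at a 0-based CDS offset by translating the affected codon."""
    codon_start = offset - offset % 3
    ref_codon = cds[codon_start:codon_start + 3]
    pos_in_codon = offset % 3
    alt_codon = ref_codon[:pos_in_codon] + alt + ref_codon[pos_in_codon + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"


def annotate(features_by_chrom, chrom, pos, alt):
    """Report every feature overlapping the variant; compute an effect for exons."""
    results = []
    # Linear scan for brevity; a real tool would query an interval structure here.
    for start, end, kind, cds in features_by_chrom.get(chrom, []):
        if start <= pos < end:
            if kind == "exon" and cds is not None:
                results.append(coding_effect(cds, pos - start, alt))
            else:
                results.append(kind)
    return results


# Toy annotation: one coding exon (ATG GCT TGG TAA) followed by an intron.
features_by_chrom = {
    "chr1": [
        (1000, 1012, "exon", "ATGGCTTGGTAA"),
        (1012, 1500, "intron", None),
    ]
}
effect = annotate(features_by_chrom, "chr1", 1003, "A")
```

Here the variant at position 1003 changes codon GCT to ACT, so `effect` is `["missense_variant"]`. Only the gene model is stored; the consequence itself is derived per variant at annotation time, which is why no table of every possible variant and transcript combination is needed.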

Reply written 3.1 years ago by Devon Ryan (94k)