Hi everyone,
I am working on some environmental data: a metagenomic dataset collected in the Arctic Ocean during winter.
From this dataset I extracted all sequences belonging to Bathycoccus prasinos, aligned them against the reference genome of this alga, and then created a VCF file.
Now I am trying to annotate all the variants contained in this VCF file. I tried SnpEff, but I was wondering whether any of you know which software is more suitable for non-human data. If anyone has an idea of which tool to use (and, if possible, why) for annotating variants in marine protist species, it would be really cool :)
Thanks all, and have a good day.
Nathalie.
Well, I generally use SnpEff for both human and non-human species. I don't know whether SnpEff has a prebuilt database for Bathycoccus prasinos; if it doesn't, you should build the database yourself. But don't panic: it is not complicated, and it is very well explained in the manual. I have no idea whether there is a specific tool for annotating marine protist species, but I don't think so.
Thanks a lot, I have already built my database for Bathycoccus. It is just that I have heard some whispers recently saying that this tool wasn't up to date. I agree that the part of the manual on building a database is well explained, but I think the different effect-type categories are not explained as well... I don't know what you think about that, but the categories "upstream" and "downstream" don't mean anything to me, and some others, like "none", are really not clear either...
I agree on that: the effect types and fields are a bit tricky at the beginning. Now I'm used to dealing with the SnpEff annotation format, but I remember when I started. About the UPSTREAM/DOWNSTREAM categories: as you probably know, a variant can affect multiple genes. For example, a variant can be located downstream of one gene and upstream of another, and it can affect both of them. If you are not interested in seeing these annotations, you can use the -no-downstream and -no-upstream options, or directly the -ud 0 option, which sets the upstream/downstream interval to zero bases.
Yes, I saw this option. Thanks a lot for your answers.
Nathalie.
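A minimal Python sketch of the UPSTREAM/DOWNSTREAM idea explained above, assuming simple 1-based, inclusive gene coordinates and a strand flag. The Gene class, the classify function and the example genes are hypothetical, and the 5,000-base default for ud is only an assumption meant to echo SnpEff's -ud interval; this is not SnpEff's actual code and it ignores transcripts, overlapping features and all other effect types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def classify(pos: int, gene: Gene, ud: int = 5000) -> Optional[str]:
    """Label a variant at `pos` relative to `gene` (hypothetical helper).

    Returns "GENIC" inside the gene, "UPSTREAM"/"DOWNSTREAM" when the variant
    lies within `ud` bases outside it, and None otherwise.
    """
    if gene.start <= pos <= gene.end:
        return "GENIC"
    if ud <= 0:
        # An interval of zero bases means no upstream/downstream calls at all.
        return None
    if pos < gene.start:
        distance = gene.start - pos
        # Lower coordinates are 5' (upstream) only for genes on the + strand.
        side = "UPSTREAM" if gene.strand == "+" else "DOWNSTREAM"
    else:
        distance = pos - gene.end
        side = "DOWNSTREAM" if gene.strand == "+" else "UPSTREAM"
    return side if distance <= ud else None


# Two hypothetical genes with a variant at position 3000 between them.
genes = [Gene("geneA", 1000, 2000, "+"), Gene("geneB", 4000, 5000, "+")]
for g in genes:
    print(g.name, classify(3000, g), classify(3000, g, ud=0))
```

Here the same position is DOWNSTREAM of geneA and UPSTREAM of geneB, which is why one variant can carry both annotations; with ud=0 both calls return None, mirroring what the -ud 0 option is described as doing.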
I'm not sure what you mean by "heard some whispers recently saying that this tool wasn't up to date", but I develop SnpEff, and the latest release came out just a couple of days ago (new versions have been released every couple of months since its inception).
The categories, which are documented in the SnpEff manual, are based on Sequence Ontology (SO) terms, and the manual provides links to them so you can look up each definition (for instance, "Upstream" is defined by the Sequence Ontology term "SO:0001631", upstream_gene_variant, which you can look up in the Sequence Ontology). Sequence Ontology terms are used by most annotation tools nowadays (they are not specific to SnpEff), so even if you use another annotation tool you'll find the same terminology (at least the most "up to date" annotation tools use SO).
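To see how these SO terms appear in practice, here is a minimal Python sketch that splits the ANN field SnpEff writes into the VCF INFO column. It assumes the standard ANN layout (pipe-separated sub-fields, comma-separated annotations, several SO terms for one annotation joined with '&'); the dict keys, the function names and the example INFO string are hypothetical and made up for illustration. For real files, a dedicated VCF parsing library would be more robust.

```python
from typing import Dict, List

# Labels for the 16 pipe-separated sub-fields of one ANN annotation, in order.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
    "HGVS.c", "HGVS.p", "cDNA.pos/length", "CDS.pos/length",
    "AA.pos/length", "Distance", "Errors/Warnings/Info",
]


def parse_ann(info: str) -> List[Dict[str, str]]:
    """Return one dict per annotation found in the ANN entry of an INFO column."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            ann_value = item[len("ANN="):]
            break
    else:
        return []  # no ANN entry for this variant
    records = []
    for entry in ann_value.split(","):  # one entry per allele/feature pair
        values = entry.split("|")
        values += [""] * (len(ANN_FIELDS) - len(values))  # pad short entries
        records.append(dict(zip(ANN_FIELDS, values)))
    return records


def effect_terms(record: Dict[str, str]) -> List[str]:
    """Split the Annotation sub-field into its SO terms (joined by '&')."""
    return record["Annotation"].split("&")


# Hypothetical INFO column with two annotations for the same variant.
info = (
    "DP=12;ANN="
    "A|missense_variant&splice_region_variant|MODERATE|geneX|geneX|transcript|tX"
    "|protein_coding|2/5|c.100G>A|p.Ala34Thr|100/900|100/900|34/299||,"
    "A|upstream_gene_variant|MODIFIER|geneY|geneY|transcript|tY"
    "|protein_coding||c.-250G>A|||||250|"
)

for rec in parse_ann(info):
    print(rec["Gene_Name"], rec["Annotation_Impact"], effect_terms(rec), rec["Distance"])
```

In this made-up example the upstream_gene_variant annotation carries a value in the Distance sub-field, the distance to the gene in bases, which is the information the upstream/downstream interval controls.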
Thanks a lot for all these comments.