Question: Predicting Gene Ontology function in OMA for large datasets
3 months ago, crl111222 wrote:


I need to predict Gene Ontology annotations for a dataset of approximately one and a half million sequences. This size is too big for the [...] Which would be the best way to use OMA for such a dataset?

omabrowser oma gene_onthology • 179 views

I am not sure about the OMA approach (so this potentially does not directly answer your question), but you could consider running them through InterProScan. It will also assign GO labels to the input proteins. Keep in mind, though, that running 1.5M proteins through InterProScan will also take a considerable amount of time.
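Once InterProScan has finished, the GO labels still have to be collected per protein. The following is a minimal sketch, not an official tool. It assumes tab-separated InterProScan output produced with GO term lookup enabled, with the protein accession in the first column and the GO annotations in the 14th column; the function name and the `go_column` parameter are made up for illustration and should be adjusted if your output is laid out differently.

```python
import re
from collections import defaultdict

# GO identifiers are "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")


def go_terms_per_protein(tsv_path, go_column=13):
    """Collect the set of GO identifiers found for each protein.

    go_column is the zero-based index of the GO column (assumed: 13,
    i.e. the 14th column of the TSV output).
    """
    terms = defaultdict(set)
    with open(tsv_path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= go_column:
                continue  # this row carries no GO column
            found = GO_ID.findall(fields[go_column])
            if found:
                terms[fields[0]].update(found)
    return dict(terms)
```

Matching identifiers with a regular expression rather than splitting on a separator keeps the sketch tolerant of suffixes that some versions append to each term. Proteins without any GO label are simply absent from the returned dictionary.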

written 3 months ago by lieven.sterck

Yes, hi Lieven, thanks for your answer. I am aware of InterProScan and am currently running my sequences there too. And yes, sadly you are right, it is taking some time.

written 3 months ago by crl111222
3 months ago, adrian.altenhoff wrote:

Hi @crl111222

After reading your question, we decided to increase the maximum size of the (gzip-compressed) FASTA file for OMA's function prediction to 50 MB for now, and we will increase it further in the future.

In case you observe any problem with it, please get in touch with us again.

Best wishes, Adrian
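For datasets that are still larger than the upload limit after compression, one option is to split the input into several gzip-compressed FASTA files and submit them separately. The following is a minimal sketch of that idea, not part of OMA. The function names, the output file naming, and the 45 MB default are assumptions chosen for illustration. The budget is applied to the uncompressed size of each chunk, which is deliberately conservative: FASTA text compresses well, so each compressed file should end up below the budget.

```python
import gzip
from pathlib import Path


def read_fasta_records(path):
    """Yield each FASTA record (header plus sequence lines) as bytes."""
    opener = gzip.open if str(path).endswith(".gz") else open
    record = []
    with opener(path, "rb") as handle:
        for line in handle:
            # A new header line closes the record collected so far.
            if line.startswith(b">") and record:
                yield b"".join(record)
                record = []
            record.append(line)
    if record:
        yield b"".join(record)


def split_fasta(path, out_dir, max_bytes=45 * 1024 * 1024):
    """Split a FASTA file into gzip-compressed chunks.

    max_bytes caps the UNCOMPRESSED size of each chunk. A single record
    larger than max_bytes still gets a chunk of its own.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    out = None
    used = 0
    for record in read_fasta_records(path):
        # Start a new chunk when the next record would exceed the budget.
        if out is None or (used > 0 and used + len(record) > max_bytes):
            if out is not None:
                out.close()
            chunk_path = out_dir / f"chunk_{len(chunk_paths) + 1:04d}.fa.gz"
            out = gzip.open(chunk_path, "wb")
            chunk_paths.append(chunk_path)
            used = 0
        out.write(record)
        used += len(record)
    if out is not None:
        out.close()
    return chunk_paths
```

Records are never split across files, so each chunk is a valid FASTA file on its own, and the predictions for the chunks can be concatenated afterwards.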


Thanks, Adrian, for your answer. I would like to ask something else: is there any way to use the standalone version for this task? I believe that the standalone version, for Gene Ontology propagation, needs to download some annotated genomes. I have found that if I do this, the results are not as good as those obtained from the online version. This is probably because the online version has a much bigger set of genomes available from which to compare and propagate the labels. Is there a way for the standalone version to perform as well as the online version for this purpose?

written 3 months ago by crl111222

Indeed, the function prediction tool on the website uses the annotations of all annotated protein sequences in OMA. If you use OMA standalone, you can only use the annotations from the exported genomes. However, if your query species is well covered by the set of exported species, OMA standalone should also work very well. The biggest difference is that OMA standalone then predicts functions from all annotated orthologous sequences, whereas the function prediction on the website takes the annotations from the closest sequence.
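To make the difference between the two strategies concrete, here is a toy sketch. It is not OMA's actual algorithm: the similarity scores, the plain union over orthologs, and the function names are all assumptions made for illustration only.

```python
def annotate_from_closest(hits):
    """Take the GO terms of the single most similar annotated sequence.

    hits is a list of (similarity, go_terms) pairs.
    """
    if not hits:
        return set()
    best = max(hits, key=lambda hit: hit[0])
    return set(best[1])


def annotate_from_all(hits):
    """Take the union of the GO terms of every annotated sequence."""
    terms = set()
    for _similarity, go_terms in hits:
        terms |= set(go_terms)
    return terms


# Hypothetical hits for one query protein.
hits = [
    (0.95, {"GO:0005515"}),
    (0.80, {"GO:0003677", "GO:0005515"}),
    (0.60, {"GO:0005524"}),
]
closest_only = annotate_from_closest(hits)  # terms of the 0.95 hit only
all_hits = annotate_from_all(hits)          # terms pooled over all three hits
```

In this toy setting the closest-sequence strategy yields fewer labels, while pooling over all hits yields more labels, some of them from more distant sequences.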

written 3 months ago by adrian.altenhoff

Hi again, Dr. Altenhoff. I have been trying to upload gzip files of around 28 MB in size; however, I keep getting an error that the file is too big (413 Request Entity Too Large).

written 3 months ago by crl111222

Hi, sorry about this. I forgot one place where the setting had to be changed. It should work now with files up to 50 MB. Best, Adrian

written 3 months ago by adrian.altenhoff

Sorry again. I was wondering whether something is not working on the website. Whenever I upload compressed files the status is "error", even with small compressed files. The message reads: "Your dataset is currently being prepared. Its status is "error". Depending on the size of the uploaded dataset, this may take another couple of minutes." The compressed files I am using have the extension .gz, which I believe is the correct one to upload.

written 3 months ago by crl111222

Oops, you were right. There was a change in the API of one of the functions we use, and it no longer supports gzipped files. This should be solved now. We are currently deploying the updated version, so you should finally be able to use gzipped files in a couple of minutes. Sorry for the problems this has caused.

written 3 months ago by adrian.altenhoff