Question: Predicting Gene Ontology function in OMA for large datasets
16 days ago, crl111222 wrote:

Hello,

I need to predict Gene Ontology annotations for a dataset of approximately one and a half million sequences. That is too big for the function prediction tool at https://omabrowser.org/oma/functions/. What would be the best way to use OMA for such a dataset?

modified 15 days ago by adrian.altenhoff • written 16 days ago by crl111222

Not sure about the OMA approach (so this may not directly answer your question), but you could consider running the sequences through InterProScan. It will also assign GO terms to the input proteins. Keep in mind, though, that running 1.5M proteins through InterProScan will also take a considerable amount of time.

written 15 days ago by lieven.sterck

Hi Lieven, thanks for your answer. I am aware of InterProScan and am currently running my sequences through it too. And yes, sadly you are right, it is taking some time.

written 15 days ago by crl111222
15 days ago, adrian.altenhoff (Switzerland) wrote:

Hi @crl111222

After reading your question, we decided to increase the maximum size of the (gzip-compressed) FASTA file for OMA's function prediction to 50 MB for now, and we will increase it further in the future.

If you observe any problems with it, please get in touch with us again.

Best wishes, Adrian

written 15 days ago by adrian.altenhoff
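
For a dataset that is still larger than the upload limit, one option is to split the FASTA file into several gzip-compressed chunks and submit them one by one. The following is only a minimal sketch under stated assumptions: the function names, the chunk file names and the idea of budgeting on uncompressed bytes are illustrative and not part of OMA. Budgeting on uncompressed text is a conservative proxy, since protein FASTA normally shrinks when gzipped, but the compressed size of each chunk should still be checked before uploading.

```python
import gzip
import os


def iter_fasta_records(handle):
    """Yield each FASTA record (header plus its sequence lines) as one string."""
    record = []
    for line in handle:
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)


def write_chunk(records, out_dir, index):
    """Write one list of records to a gzip-compressed FASTA file and return its path."""
    # The chunk naming scheme is hypothetical; any name ending in .gz would do.
    path = os.path.join(out_dir, f"chunk_{index:04d}.fasta.gz")
    with gzip.open(path, "wt") as out:
        out.writelines(records)
    return path


def split_fasta(fasta_path, out_dir, max_chunk_bytes):
    """Split a FASTA file into gzip chunks without ever cutting a record in two.

    max_chunk_bytes limits the UNCOMPRESSED text per chunk. A single record
    larger than the budget still gets a chunk of its own.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths, buffer, size = [], [], 0
    with open(fasta_path) as handle:
        for record in iter_fasta_records(handle):
            record_size = len(record.encode())
            if buffer and size + record_size > max_chunk_bytes:
                paths.append(write_chunk(buffer, out_dir, len(paths)))
                buffer, size = [], 0
            buffer.append(record)
            size += record_size
    if buffer:
        paths.append(write_chunk(buffer, out_dir, len(paths)))
    return paths
```

A call such as `split_fasta("proteins.fasta", "chunks", 100 * 1024 * 1024)` (file and directory names are placeholders) returns the list of chunk paths; `os.path.getsize(path)` on each one then shows whether it stays below the 50 MB limit mentioned above.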

Thanks, Adrian, for your answer. I would like to ask something else. Is there any way to use the standalone version for this task? I believe that the standalone version needs to download some annotated genomes for Gene Ontology propagation. I have found that if I do this, the results are not as good as those obtained from the online version, probably because the online version has a much bigger set of genomes available from which to compare and propagate the labels. Is there a way for the standalone version to perform as well as the online version for this purpose?

written 15 days ago by crl111222

Indeed, the function prediction tool on the website uses the annotations of all the annotated protein sequences in OMA. If you use OMA standalone, you will only be able to use the annotations from the exported genomes. However, if your query species is covered quite well by the set of exported species, OMA standalone should also work very well. The biggest difference will be that it then predicts functions from all annotated orthologous sequences, whereas the function prediction on the website predicts the annotations from the closest sequence.

written 15 days ago by adrian.altenhoff
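
The difference between the two strategies can be illustrated with a toy example. This is a deliberately simplified sketch, not OMA's implementation: the function names, the protein identifiers and the similarity scores are hypothetical, "all annotated orthologs" is modelled as a plain union of GO terms, and "closest sequence" is assumed to mean the highest-scoring sequence that carries at least one annotation.

```python
def predict_from_all_orthologs(orthologs, annotations):
    """Collect the GO terms of every annotated ortholog (simple union)."""
    terms = set()
    for protein_id in orthologs:
        terms |= annotations.get(protein_id, set())
    return terms


def predict_from_closest(scored_hits, annotations):
    """Return the GO terms of the single best-scoring annotated sequence.

    scored_hits maps a protein id to a similarity score (higher is closer).
    Hits without any annotation are skipped, which is an assumption here.
    """
    annotated = [(score, pid) for pid, score in scored_hits.items()
                 if annotations.get(pid)]
    if not annotated:
        return set()
    _, best = max(annotated)
    return set(annotations[best])


# Hypothetical annotation table: protein id -> set of GO terms.
annotations = {
    "protA": {"GO:0005515"},
    "protB": {"GO:0003677", "GO:0005524"},
    "protC": set(),
}
print(predict_from_all_orthologs(["protA", "protB", "protC"], annotations))
print(predict_from_closest({"protA": 0.7, "protB": 0.9, "protC": 0.99}, annotations))
```

In this toy data the union-based prediction returns three terms, while the closest-sequence prediction returns only the two terms of `protB`. That is one reason the two approaches can yield different annotation sets for the same query.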

Hi again, Dr. Altenhoff. I have been trying to upload gzip files of around 28 MB in size, but I keep getting an error saying that the file is too big (413 Request Entity Too Large).

written 9 days ago by crl111222

Hi, sorry about this. I missed one place where the setting needed to be changed. It should work now with files up to 50 MB. Best, Adrian

written 9 days ago by adrian.altenhoff

Sorry again. I was wondering whether something is not working on the website. Whenever I upload compressed files, the status is "error", even for small compressed files. The message reads: 'Your dataset is currently being prepared. Its status is "error". Depending on the size of the uploaded dataset, this may take another couple of minutes.' The compressed files I am using have the extension .gz, which I believe is the correct one to upload.

written 8 days ago by crl111222

Oops, you were right. There was a change in the API of one of the functions we use, and it no longer supports handling gzipped files. This should be solved now. We are currently deploying the updated version, so you should finally be able to use gzipped files in a couple of minutes. Sorry for the problems this has caused.

written 8 days ago by adrian.altenhoff
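
When an upload fails, it can help to rule out problems on the client side first. The sketch below is an illustrative local check, not something provided by OMA: the function name is hypothetical, and the 50 MB limit is assumed to mean 50 * 1024 * 1024 bytes, which the thread does not specify. It checks the file size, the gzip magic bytes, and whether the decompressed data starts with a FASTA header.

```python
import gzip
import os
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def check_gzip_fasta(path, max_bytes=50 * 1024 * 1024):
    """Return a list of problems found; an empty list means the file looks fine."""
    problems = []
    if os.path.getsize(path) > max_bytes:
        problems.append("file exceeds size limit")
    with open(path, "rb") as raw:
        if raw.read(2) != GZIP_MAGIC:
            problems.append("not a gzip file (bad magic bytes)")
            return problems
    try:
        # Reading the first line forces at least part of the stream to decompress.
        with gzip.open(path, "rb") as handle:
            first = handle.readline()
    except (OSError, EOFError, zlib.error) as exc:
        problems.append(f"cannot decompress: {exc}")
        return problems
    if not first.startswith(b">"):
        problems.append("first line is not a FASTA header")
    return problems
```

If this returns an empty list and the upload still ends in an error status, the cause is more likely on the server side, as turned out to be the case in this thread.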