I was wondering what the correct way is to get gene function from MAKER's outputs. I was hoping to upload the files to Blast2GO to assign functions from BLAST and InterProScan. Then, using Blast2GO, I wanted to create some visuals, such as pie charts of gene family abundances.
I tried doing this with both the .gff3 and protein.fasta outputs from MAKER, but I could not get it to work in Blast2GO (it is very likely that I just do not know what I am doing).
Another source I found suggested assigning putative functions using MAKER's post-processing steps (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Post_Processing_of_Annotations).
Once you do this, which file do you upload to Blast2GO and run InterProScan on? The GFF, the protein, or the transcript file?
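The core idea behind that kind of post-processing is to BLAST each predicted protein against a curated database (e.g. UniProt/Swiss-Prot) and take the best hit as the putative function. The sketch below is NOT MAKER's own script; it is a minimal, hypothetical illustration assuming BLAST+ tabular output (`-outfmt 6`, default 12 columns, where column 11 is the e-value and column 12 the bit score). The function name `best_hits`, the `1e-6` cutoff, and the example IDs are illustrative assumptions.

```python
import csv
import io


def best_hits(blast_tab: str, max_evalue: float = 1e-6) -> dict:
    """Keep the best hit per query from BLAST+ tabular (-outfmt 6) text.

    Hypothetical helper: 'best' = lowest e-value, ties broken by higher bit score.
    """
    best = {}
    reader = csv.reader(io.StringIO(blast_tab), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (e.g. from -outfmt 7)
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue = float(row[10])   # 11th column: e-value
        bitscore = float(row[11])  # 12th column: bit score
        if evalue > max_evalue:
            continue  # too weak to use as a putative function
        current = best.get(query)
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up example rows for two hypothetical MAKER proteins
example = (
    "gene-1\tsp|P12345|PROT_A\t45.0\t300\t150\t5\t1\t300\t1\t300\t1e-50\t250\n"
    "gene-1\tsp|Q99999|PROT_B\t30.0\t200\t130\t8\t1\t200\t1\t200\t1e-20\t90\n"
    "gene-2\tsp|P00001|PROT_C\t25.0\t80\t60\t4\t1\t80\t1\t80\t0.5\t30\n"
)
hits = best_hits(example)
print(hits)
```

Here `gene-2` is dropped because its only hit (e-value 0.5) fails the cutoff, so only `gene-1` gets a putative function.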
Any advice on this or other functional gene assignment will be greatly appreciated!!
What exactly is not working when doing Blast2GO and/or interproscan? What kind of errors do you get?
Can you please give us more detail, e.g. whether you are using Blast2GO or Blast2GO Pro? I suggest trying BLASTP on a few protein sequences (online or standalone) and uploading the resulting .xml file into Blast2GO Pro. If this works, then the problem most likely lies in your original BLAST step.
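For the standalone route, Blast2GO reads BLAST XML, which BLAST+ produces with `-outfmt 5`. The sketch below only builds the command line (it does not run it); the input name `protein.fasta`, a local BLAST database named `nr`, and the e-value, hit-count, and thread values are assumptions to adapt, not settings recommended by anyone in this thread.

```python
import shlex


def blastp_xml_command(query_fasta, db, out_xml, evalue=1e-3, max_targets=20, threads=4):
    """Build a BLAST+ blastp command whose XML output (-outfmt 5) Blast2GO can import.

    Hypothetical helper; pass the result to subprocess.run(cmd, check=True) to execute.
    """
    return [
        "blastp",
        "-query", query_fasta,      # MAKER protein FASTA
        "-db", db,                  # assumed local BLAST database name
        "-out", out_xml,
        "-outfmt", "5",             # 5 = BLAST XML
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_targets),
        "-num_threads", str(threads),
    ]


cmd = blastp_xml_command("protein.fasta", "nr", "protein_vs_nr.xml")
print(shlex.join(cmd))
```

To test with "a few protein sequences" first, point `query_fasta` at a small FASTA file containing only a handful of entries.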
I am using Blast2GO Pro. I ran BLASTP on my protein.fasta file from MAKER and then uploaded the .xml to Blast2GO. I mapped and annotated the .xml file, but the number of GO annotations was low (picture link below). Now I am wondering whether it would be better to BLAST the transcript.fasta file from MAKER and then upload that output to Blast2GO. What do you think? I appreciate your advice! https://s9.postimg.cc/3l1p3zxb3/blast2go_statistics_20180517_1018.png
No, using the transcript file will not make any difference (it would actually be worse).
The Blast2GO results are indeed rather low. What species are you working on? Perhaps your MAKER annotation is of low quality, resulting in the poor Blast2GO performance?
I am working with a deep-sea sediment fungus "closely" related to Penicillium chrysogenum, so it is essentially a non-model organism (at least from what I understand). It is possible that MAKER's results were of low quality because I did not have RNA-seq data for this isolate.
For MAKER, I used BUSCO to create an Augustus species profile and incorporated that into MAKER. I also used GeneMark and incorporated that into MAKER.
Do you think I should download protein and RNA data from closely related species and incorporate them into MAKER to improve the GO annotation in Blast2GO?
Again, thanks so much!
Yes, using RNA-seq data will certainly improve the annotation result, but you will need RNA-seq from your own species, not from something closely related.
I'm puzzled how you used BUSCO to create a training set for Augustus (if I understand correctly), otherwise your approach looks sound at first sight.
Here is the pipeline I was using, in case you are curious: CGP Pipeline on Github. I was following the annotation pipeline designed for cases without RNA-seq data. The last step uses the MAKER outputs to train Augustus, but I have been having issues with Augustus (see my question on Biostars: Augustus Error; I have also emailed the Augustus developer twice but have not received a reply). So I was trying to see what I could get from the MAKER protein and GFF files alone, which is how I ended up here with these issues. I wonder whether adding a protein database to MAKER would help?
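Protein and cross-species transcript evidence is supplied to MAKER through its control file `maker_opts.ctl` (generated by `maker -CTL`): `protein=` takes protein FASTA, `est=` is meant for the organism's own ESTs/assembled RNA-seq, and `altest=` for EST/cDNA data from a related organism, which fits the point above that only your own species' RNA-seq belongs under `est=`. The sketch below edits those `key=value #comment` lines programmatically; the helper name `set_ctl_option`, the FASTA file names, and the abbreviated comment text are hypothetical, so check the keys against your own generated file.

```python
def set_ctl_option(ctl_text: str, key: str, value: str) -> str:
    """Set 'key=value' in MAKER-style control-file text, keeping any trailing #comment.

    Hypothetical helper: only the first uncommented 'key=' line is changed;
    if the key is absent, a new line is appended.
    """
    out = []
    found = False
    for line in ctl_text.splitlines():
        stripped = line.lstrip()
        # Match 'protein=' exactly (not 'protein2genome=' or a commented-out line)
        if not found and stripped.startswith(key + "="):
            comment = ""
            if "#" in line:
                comment = " #" + line.split("#", 1)[1]
            out.append(f"{key}={value}{comment}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Abbreviated excerpt of a maker_opts.ctl evidence section (comments shortened)
ctl = (
    "est= #ESTs or assembled mRNA-seq from this organism\n"
    "altest= #EST/cDNA from an alternate organism\n"
    "protein= #protein FASTA\n"
    "protein2genome=0\n"
)
ctl = set_ctl_option(ctl, "protein", "related_fungi_proteins.fasta")
ctl = set_ctl_option(ctl, "altest", "related_fungi_transcripts.fasta")
print(ctl)
```

The `est=` line is deliberately left empty here, since no RNA-seq from the isolate itself is available.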
You have GO terms for ~50% of your genes, which I don't consider low; this is a widely accepted figure. I have worked on one of the Penicillium species and got very similar results to yours. However, there I used GeneMark-ES for gene prediction instead of MAKER, and if I remember correctly, the results were very comparable to published Penicillium genomes.
Perhaps I'm misinterpreting the graph, but doesn't it say there are only ~600 genes with a GO term (or does the mapping bar also count toward GO)?
I thought I should be concerned about the annotated GOs, which would be the blue bar. Out of the ~5,500 sequences that were mapped, only ~600 were annotated, which seems low to me.
I checked my Penicillium Blast2GO file properly, and the result was as follows:
out of a total of 11.k NR-annotated sequences, 5.2k genes were mapped and also GO-annotated.
So in your case, yes, the GO annotation is low. You could try re-running the GO annotation step on your data in Blast2GO and check whether it improves.
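Putting the counts quoted above side by side makes the gap concrete: the useful ratio is how many mapped sequences actually received GO annotations. This is just arithmetic on the approximate numbers from this thread, reading the Penicillium figure as ~5.2k sequences both mapped and annotated; `annotation_rate` is a hypothetical helper name.

```python
def annotation_rate(annotated: int, mapped: int) -> float:
    """Fraction of mapped sequences that also received GO annotations."""
    if mapped <= 0:
        raise ValueError("mapped must be positive")
    return annotated / mapped


# Approximate counts quoted in this thread
original_poster = annotation_rate(600, 5500)  # ~600 annotated out of ~5,500 mapped
penicillium = annotation_rate(5200, 5200)     # ~5.2k mapped and also annotated
print(f"{original_poster:.1%} vs {penicillium:.1%}")  # prints "10.9% vs 100.0%"
```

Roughly one in nine mapped sequences being annotated, versus nearly all of them in the reference run, points to the annotation step (or its thresholds) rather than the mapping step as the place to look.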