Question: Combining Maker and Blast2Go
msobol wrote (6 days ago):

Hi,

I was wondering what the correct way is to get gene function from MAKER's outputs. I was hoping to upload the files to Blast2GO to assign function from BLAST and InterProScan. Then, using Blast2GO, I wanted to create some visuals, such as pie charts of gene family abundances.

I tried doing this with both the .gff3 and protein.fasta outputs from MAKER, but I could not get it to work in Blast2GO (it is very likely that I just do not know what I am doing).

Another source I found said to assign putative function using MAKER's post-analysis steps (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Post_Processing_of_Annotations)

Once you do this, which file do you upload to Blast2GO and run through InterProScan: the GFF, the protein file, or the transcript file?

Any advice on this or other functional gene assignment will be greatly appreciated!!

Thanks, Morgan

Tags: fungi, annotation, genome

What exactly is not working when you run Blast2GO and/or InterProScan? What kind of errors do you get?

Reply written 6 days ago by lieven.sterck

Can you please give us more detail, such as whether you are using Blast2GO or Blast2GO Pro? I suggest running BlastP on a few protein sequences (online or standalone) and uploading the resulting .xml file to Blast2GO Pro. If this works, then the problem is surely at the BLAST level.

Reply written 5 days ago by toralmanvar
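
The small-scale test suggested above could be set up roughly as follows. This is only a sketch: the file names, the subset size of 20, and the e-value cutoff are placeholders chosen for illustration, not values from this thread. The `blastp` options shown (`-query`, `-db`, `-outfmt 5` for XML, `-evalue`, `-out`, `-remote`) are standard BLAST+ options; the command is only built here, not executed.

```python
from itertools import islice


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_subset(src, dest, n=20):
    """Write the first n records of src to dest; return how many were written."""
    records = list(islice(read_fasta(src), n))
    with open(dest, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n{seq}\n")
    return len(records)


def blastp_xml_command(query, db="nr", out="subset_blastp.xml", remote=False):
    """Build (but do not run) a BLAST+ blastp call that writes XML output."""
    cmd = ["blastp", "-query", str(query), "-db", db,
           "-outfmt", "5", "-evalue", "1e-5", "-out", str(out)]
    if remote:
        # -remote sends the search to NCBI instead of using a local database.
        cmd.append("-remote")
    return cmd
```

For example, `write_subset("maker_proteins.fasta", "subset.fasta")` followed by running the list returned from `blastp_xml_command("subset.fasta", remote=True)` with `subprocess.run` would produce an XML file that can be loaded into Blast2GO (the file names here are hypothetical).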

I am using Blast2GO Pro. I ran BlastP on my protein.fasta file from MAKER and then uploaded the .xml to Blast2GO. I mapped and annotated the .xml file, but my GO annotations were low (picture link below). Now I am wondering whether it would be better to blast the transcript.fasta file from MAKER and upload that output to Blast2GO instead. What do you think? I appreciate your advice! https://s9.postimg.cc/3l1p3zxb3/blast2go_statistics_20180517_1018.png

Reply written 5 days ago (modified 5 days ago) by msobol

No, using the transcript file will not make any difference (it would rather be worse, actually).

The Blast2GO results are indeed rather low. What kind of species are you working on? Perhaps the MAKER annotation itself is of low quality, which would explain the poor Blast2GO performance.

Reply written 5 days ago by lieven.sterck

I am working with a deep-sea sediment fungus "closely" related to Penicillium chrysogenum, so essentially it's a non-model organism (at least as far as I understand). It's possible that MAKER's results were low quality because I did not have RNA-seq data for this isolate.

For MAKER, I used BUSCO to create an Augustus species profile and incorporated that into MAKER. I also used GeneMark and incorporated that into MAKER.

Do you think I should download closely related protein and RNA data and incorporate that into MAKER to improve the gene ontology in Blast2GO?

Again, thanks so much!

Reply written 5 days ago by msobol

OK,

Yes, using RNA-seq data will certainly improve the annotation result, but you will need RNA-seq data from your own species, not from something closely related.

I'm puzzled by how you used BUSCO to create a training set for Augustus (if I understand correctly); otherwise your approach looks sound at first sight.

Reply written 5 days ago by lieven.sterck

Here is the pipeline I was using, in case you are curious: CGP Pipeline on GitHub. I was following the version of the annotation pipeline meant for genomes without RNA-seq data. The last step uses the MAKER outputs to train Augustus, but I have been having issues with Augustus (see my Biostars question "Augustus Error"; I have also emailed the Augustus developer twice but have not received a reply). So I was trying to see what I could find with just the MAKER protein and GFF files, which is how I ended up here. I wonder whether adding a protein database to MAKER would help?

Reply written 5 days ago (modified 5 days ago) by msobol

You have GOs for ~50% of your genes, which I don't find low. This is a very much accepted number. I have worked on one of the Penicillium species and got results very similar to yours. There, however, I did not use MAKER for gene prediction; I used GeneMark-ES instead. If I remember correctly, the results were very much comparable to published Penicillium genomes.

Reply written 4 days ago by toralmanvar

Perhaps I'm misinterpreting the graph, but does it not say there are only ~600 genes with a GO (or does the mapping bar also count as GO)?

Reply written 4 days ago by lieven.sterck

I thought I should be concerned about the annotated GOs, which would be the blue bar. Out of the ~5,500 that were mapped, only 600 were annotated, which seems low to me.

Reply written 4 days ago by msobol

I checked my Penicillium Blast2GO file properly, and the result was as follows:

  • Total genes: 11.2 k
  • Without hits: 70
  • With blast hits: 5.8 k
  • With mapping: 40
  • With GO annotation: 5.2 k

This means that, out of a total of 11.2 k NR-annotated genes, 5.2 k were both mapped and GO annotated.

So yes, in your case the GO annotation is low. You can try re-running the GO annotation on your data in Blast2GO and check whether it improves.

Reply written 3 days ago by toralmanvar
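
To make the comparison explicit, here is the arithmetic behind the two sets of numbers quoted in this thread, as a minimal sketch. The counts are the rounded figures given above (11.2 k total and 5.2 k GO-annotated for the reference Penicillium run; ~5,500 mapped and ~600 annotated for the MAKER proteins), so the percentages are approximate. The total gene count of the MAKER set was not stated, so the second ratio is taken relative to the mapped sequences only.

```python
def share(part, whole):
    """Return part/whole formatted as a percentage with one decimal place."""
    return f"{part / whole:.1%}"


# Reference Penicillium run: GO-annotated genes out of all genes.
reference_total = 11_200
reference_go_annotated = 5_200

# MAKER protein set: annotated sequences out of those that were mapped.
maker_mapped = 5_500
maker_go_annotated = 600

reference_share = share(reference_go_annotated, reference_total)
maker_share = share(maker_go_annotated, maker_mapped)

print("reference, GO-annotated / total:", reference_share)   # 46.4%
print("MAKER set, annotated / mapped:  ", maker_share)       # 10.9%
```

Even relative to the mapped sequences alone, roughly 11% of the MAKER set received a GO annotation, against roughly 46% of all genes in the reference run.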
lieven.sterck (Belgium, Ghent, VIB) wrote (6 days ago):
For both Blast2GO and InterProScan you need to provide a protein FASTA file as input.

Otherwise, those are indeed the obvious tools to use for functional gene annotation.

Answer written 6 days ago (modified 6 days ago) by lieven.sterck
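
As an illustration of feeding the protein FASTA to InterProScan, here is a hedged sketch. It assumes the protein file may contain `*` stop symbols, which InterProScan is commonly reported to reject; whether a given MAKER run writes them should be checked rather than taken from this example. Records with an internal stop are skipped and counted, since they may point to problematic gene models. File names are hypothetical; the `interproscan.sh` options used (`-i`, `-f`, `-goterms`, `-iprlookup`, `-o`) are standard ones, and the command is only built, not run.

```python
def clean_protein_fasta(src, dest):
    """Strip trailing '*' from each protein and skip records with internal stops.

    Returns (records_written, records_skipped).
    """
    records, header, chunks = [], None, []
    with open(src) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line, []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))

    written = skipped = 0
    with open(dest, "w") as out:
        for head, seq in records:
            seq = seq.rstrip("*")      # a trailing stop symbol is harmless to drop
            if "*" in seq:             # an internal stop suggests a suspect model
                skipped += 1
                continue
            out.write(f"{head}\n{seq}\n")
            written += 1
    return written, skipped


def interproscan_command(fasta, out="proteins.interpro.xml"):
    """Build (but do not run) an InterProScan call with GO term lookup enabled."""
    return ["interproscan.sh", "-i", str(fasta), "-f", "xml",
            "-goterms", "-iprlookup", "-o", str(out)]
```

The cleaned FASTA can be used as the BlastP query as well, and the InterProScan XML can then be imported into Blast2GO alongside the BLAST results so that its GO terms are merged into the annotation.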