I have manually built a custom database with the NCBI BLAST+ tools on a command-line CentOS server, and I currently have a blastx run going against it, with my transcriptome assembly as the query.
My database consists of NR, UniProtKB and Swiss-Prot. Once my blastx finishes and I have the hit information, E-values etc. (I've selected -outfmt 6), how do I go about obtaining GO classifications? My overall aim is to condense these GO terms with the Plant GO slim afterwards, but I'm stuck on how to go from a blastx tabular output to GO terms. Any help?
- I do not want to use Blast2GO because of time constraints and because you need to pay for it.
Additionally, I want to do an InterProScan (IPS) or Pfam classification. Should I do this before the GO classification and then combine the blastx results with the protein results?
Regarding your time issues with Blast2GO:
Have you tried running the blastx outside B2G with the output format set to XML and then importing the XML files? You can also import several XML files if you want to run blastx on subsets of your sequences in parallel.
Regarding the price issue:
I think the free version is already quite good.
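To make the subset idea above concrete, here is a minimal sketch of splitting a transcriptome FASTA into several smaller files so that each one can be run as its own blastx job. The function name `split_fasta`, the `chunk` output prefix and the file naming are my own illustrative choices, not part of BLAST+ or B2G.

```python
from pathlib import Path


def split_fasta(fasta_path, n_chunks, out_prefix="chunk"):
    """Distribute FASTA records round-robin over n_chunks files.

    Returns the list of output paths. The names (chunk_000.fasta, ...)
    are hypothetical; pick whatever suits your job scripts.
    """
    out_paths = [Path(f"{out_prefix}_{i:03d}.fasta") for i in range(n_chunks)]
    handles = [p.open("w") for p in out_paths]
    try:
        current = -1  # no record seen yet
        with open(fasta_path) as fh:
            for line in fh:
                if line.startswith(">"):
                    # A new header starts a new record: move to the next file.
                    current = (current + 1) % n_chunks
                if current >= 0:
                    # Header and sequence lines go to the same file.
                    handles[current].write(line)
    finally:
        for handle in handles:
            handle.close()
    return out_paths
```

Each resulting file could then be searched separately, e.g. with `-outfmt 5` if you want BLAST XML, and the per-chunk outputs imported or concatenated afterwards.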
Hmm, I was hoping to learn how to do it manually via the command line. I've requested another trial and could try that, but any more advice is appreciated. Feeding in XML files from my standalone BLAST is possible, but how long will it take to provide protein and GO classifications for 70K genes?
I ran InterProScan locally with only a subset of the IPS applications on 46,645 transcripts, and it took quite a while. You can see my approach in this paper, if you are interested.
That said, I have to admit that I last worked with B2G and IPS in 2012; the tools may have changed considerably, and there may also be a command-line version for this task nowadays.
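For the command-line route asked about in the question, one possible approach (a sketch, not something anyone in this thread has tested) is to take the best hit per transcript from the `-outfmt 6` table and look its accession up in a GO annotation file in GAF format, such as the ones distributed by the GOA project. The assumptions here are: the table has the 12 default `-outfmt 6` columns, UniProt subject IDs look like `sp|P12345|NAME_SPECIES` or `tr|...`, and the GAF file is keyed on UniProtKB accessions. All function names are my own.

```python
from collections import defaultdict


def uniprot_accession(sseqid):
    """Pull the accession out of 'sp|P12345|NAME' style IDs (assumed format)."""
    parts = sseqid.split("|")
    if len(parts) >= 3 and parts[0] in ("sp", "tr"):
        return parts[1]
    return sseqid  # e.g. NR hits; left unchanged


def best_hits(blast_path):
    """Return {query: accession} for the highest-bitscore hit per query."""
    best = {}
    with open(blast_path) as fh:
        for line in fh:
            cols = line.rstrip("\n").split("\t")
            if line.startswith("#") or len(cols) < 12:
                continue
            query, subject, bitscore = cols[0], cols[1], float(cols[11])
            if query not in best or bitscore > best[query][1]:
                best[query] = (uniprot_accession(subject), bitscore)
    return {q: acc for q, (acc, _) in best.items()}


def load_gaf(gaf_path, wanted):
    """Return {accession: set of GO IDs} for the accessions in `wanted`."""
    acc2go = defaultdict(set)
    with open(gaf_path) as fh:
        for line in fh:
            if line.startswith("!"):  # GAF header/comment lines
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 5 or cols[1] not in wanted:
                continue
            if "NOT" in cols[3].split("|"):  # skip negated annotations
                continue
            acc2go[cols[1]].add(cols[4])  # column 5 holds the GO ID
    return acc2go


def annotate(blast_path, gaf_path):
    """Return {query: sorted list of GO IDs transferred from its best hit}."""
    best = best_hits(blast_path)
    acc2go = load_gaf(gaf_path, set(best.values()))
    return {q: sorted(acc2go.get(acc, ())) for q, acc in best.items()}
```

Calling `annotate("blastx.tsv", "annotations.gaf")` (both file names are placeholders) gives one list of GO IDs per transcript, which is the kind of input a GO slim mapping step would take. Hits from NR carry non-UniProt accessions, so under these assumptions they come back with an empty list unless you add an ID-mapping step. Transferring GO terms from a single best hit is also a simplification; you may want an E-value cutoff or to pool several hits.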