Question: Basic question about SWISSPROT
9 days ago, luzglongoria wrote:

Hi there,

I have been working on my transcriptome assembly and now it is time to start doing functional annotation.

I have read that a way to do it is using SWISSPROT. The command that I have found is something like:

blastx -db ~/shared_ro/dbs/ \
-query Trinity.fasta -num_threads 2 \
 -max_target_seqs 1 -outfmt 6 -evalue 1e-5 \
> swissprot.blastx.outfmt6

However, probably because I am not an expert in this field, I don't know where to find the file. I guess it is the database that SWISSPROT uses (or maybe not). I also don't know whether I need to download it or whether it was already installed when I installed Trinity, since I assembled my transcripts with Trinity and I am following the commands in here:

Thank you so much in advance

written 9 days ago by luzglongoria

Another very useful metric in evaluating your assembly is to assess the number of fully reconstructed coding transcripts. This can be done by performing a BLASTX search of your assembled transcript sequences to a high quality database of protein sequences, such as provided by SWISSPROT. Searching a large protein database using BLASTX can take a while - longer than we want during this workshop, so instead, we'll search the mini-version of SWISSPROT that comes installed in our data/ directory

They downloaded sequences they found interesting from SWISSPROT, then ran makeblastdb. I think you can download the "mini-version" from their cloud or create your own database of interesting sequences.

modified 8 days ago • written 9 days ago by Bastien Hervé

Thank you so much, Bastien. I have run the previous commands:

TransDecoder.LongOrfs -t Trinity.fasta


TransDecoder.Predict -t Trinity.fas

So then I have files:


Can I use the file called "Trinity.fasta.transdecoder.pep" as a database of interesting sequences?

written 8 days ago by luzglongoria

You could, but you probably want to use that file as the query in a blastp search against SWISSPROT. The results of that search would be easier to parse.
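For the parsing step, a tabular (-outfmt 6) result can be reduced to one best hit per query with standard tools. This is only a sketch: it assumes the default 12-column layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), and the function name best_hits is made up for illustration.

```shell
# Keep the highest-bitscore line for each query in a BLAST -outfmt 6 table.
# Prints four tab-separated columns: qseqid, sseqid, evalue, bitscore.
best_hits() {
    tab=$(printf '\t')
    # Sort by query ID, then by bitscore (column 12), highest first
    sort -t "$tab" -k1,1 -k12,12nr "$1" |
        # The first line seen for each query is now its best hit
        awk -F '\t' '!seen[$1]++ { print $1 "\t" $2 "\t" $11 "\t" $12 }'
}
```

With the output file named in the question, this could be run as `best_hits swissprot.blastx.outfmt6 > best_hits.tsv` (the second file name is a placeholder).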

modified 8 days ago • written 8 days ago by genomax

OK, I see. But then, how can I create my own database of interesting sequences?


written 8 days ago by luzglongoria

Normally you would search against SWISSPROT using the proteins you predicted from the Trinity analysis to see what their putative functions are.

Do you just want to take a subset of proteins from SWISSPROT (the "interesting sequences")? Creating your own database would involve running makeblastdb from the BLAST+ package, as @Bastien already noted, on your multi-FASTA file. If you want to use the entire SWISSPROT database, you can get pre-made indexes from NCBI's FTP site.
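A minimal sketch of both steps, assuming BLAST+ is installed and on your PATH. The function name and the way the database name is derived are placeholders, not something from this thread; the search flags are copied from the blastx command in the question, with blastp swapped in because the query is a peptide file.

```shell
# Sketch: index a protein multi-FASTA, then search predicted peptides against it.
# Usage: build_and_search <protein_fasta> <query_peptides> <output_table>
build_and_search() {
    # Database name = FASTA path without its extension, plus "_db" (an arbitrary choice)
    db_prefix="${1%.*}_db"
    # Build the protein BLAST database from the multi-FASTA file
    makeblastdb -in "$1" -dbtype prot -out "$db_prefix" || return 1
    # Protein-vs-protein search, tabular output, one target per query
    blastp -db "$db_prefix" -query "$2" -num_threads 2 \
        -max_target_seqs 1 -outfmt 6 -evalue 1e-5 > "$3"
}
```

For example, `build_and_search uniprot_sprot.fasta Trinity.fasta.transdecoder.pep swissprot.blastp.outfmt6`, where uniprot_sprot.fasta is assumed to be an uncompressed Swiss-Prot FASTA download and the output name is made up.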

written 8 days ago by genomax


What I want is to search against SWISSPROT using the Trinity.fasta file.

The problem is that I am working with a Plasmodium species whose genome is not available anywhere. So I am not sure whether I need to download a subset of proteins from SWISSPROT (by selecting all the Plasmodium entries in the dataset) or create my own database with makeblastdb, as you both said.
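For reference, a genus-level subset can be pulled out of a Swiss-Prot FASTA file by filtering on the OS= (organism) field that UniProt writes into each header. This is a rough sketch: the function name is made up, and it assumes UniProt-style headers such as ">sp|... OS=Plasmodium falciparum ... OX=...".

```shell
# Print only the FASTA records whose header names the given genus in its OS= field.
# Usage: subset_by_genus <genus> <uniprot_fasta>
subset_by_genus() {
    # The trailing space stops "OS=Plasmodium" matching a longer genus name
    awk -v tag="OS=$1 " '
        /^>/ { keep = (index($0, tag) > 0) }  # decide once per header line
        keep                                  # print the header and its sequence lines
    ' "$2"
}
```

Something like `subset_by_genus Plasmodium uniprot_sprot.fasta > plasmodium_sprot.fasta` (both file names are assumptions) would give a multi-FASTA file that makeblastdb can index.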

written 8 days ago by luzglongoria