Question: Basic question about SWISSPROT
9 days ago, luzglongoria wrote:

Hi there,

I have been working on my transcriptome assembly and now it is time to start doing functional annotation.

I have read that a way to do it is using SWISSPROT. The command that I have found is something like:

blastx -db ~/shared_ro/dbs/sprot.mini.pep \
 -query Trinity.fasta -num_threads 2 \
 -max_target_seqs 1 -outfmt 6 -evalue 1e-5 \
 > swissprot.blastx.outfmt6

However, probably because I am not an expert in this field, I don't know where to find the sprot.mini.pep file. I guess it is the database that SWISSPROT uses (or maybe not), but I don't know whether I need to download it or whether it was already installed along with Trinity. I built my transcriptome with Trinity and I am following the commands here: https://github.com/trinityrnaseq/BerlinTrinityWorkshop2018/wiki/functional_annotation

Thank you so much in advance

Written 9 days ago by luzglongoria

"Another very useful metric in evaluating your assembly is to assess the number of fully reconstructed coding transcripts. This can be done by performing a BLASTX search of your assembled transcript sequences to a high quality database of protein sequences, such as provided by SWISSPROT. Searching a large protein database using BLASTX can take a while - longer than we want during this workshop, so instead, we'll search the mini-version of SWISSPROT that comes installed in our data/ directory."

They downloaded the sequences they found interesting from SWISSPROT and then ran makeblastdb on them. I think you can either download the "mini-version" from their cloud or create your own database of interesting sequences.
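
As a rough sketch of the "create your own database" route, the snippet below defines a helper that would fetch the complete SWISSPROT FASTA from UniProt and index it with makeblastdb. The URL, the `DB_DIR` location, and the function name `build_sprot_db` are assumptions for illustration; this is not how the workshop's sprot.mini.pep was made.

```shell
# Assumed download location for the full SWISSPROT FASTA; check the UniProt site if it has moved.
SPROT_URL="https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz"

# Hypothetical directory that will hold the database files.
DB_DIR="${DB_DIR:-$HOME/dbs}"

# Hypothetical helper: download, decompress, and index SWISSPROT as a protein BLAST database.
# The body runs in a subshell so the caller's working directory is left unchanged.
build_sprot_db() (
    set -e
    mkdir -p "$DB_DIR"
    cd "$DB_DIR"
    wget -q -O uniprot_sprot.fasta.gz "$SPROT_URL"
    gunzip -f uniprot_sprot.fasta.gz
    # -dbtype prot because the sequences are proteins; -out sets the database name prefix.
    makeblastdb -in uniprot_sprot.fasta -dbtype prot -out uniprot_sprot
)
```

Running `build_sprot_db` once should leave a database that can be passed to BLAST as `-db "$DB_DIR/uniprot_sprot"`, provided wget and BLAST+ are installed.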

Modified 8 days ago • written 9 days ago by Bastien Hervé

Thank you so much, Bastien. I have run the previous commands:

TransDecoder.LongOrfs -t Trinity.fasta

and

TransDecoder.Predict -t Trinity.fasta

So now I have these files:

Trinity.fasta.transdecoder.bed
Trinity.fasta.transdecoder.cds
Trinity.fasta.transdecoder.pep

Can I use the file called "Trinity.fasta.transdecoder.pep" as a database of interesting sequences?

Written 8 days ago by luzglongoria

You could, but you probably want to use that file as the query in a blastp search against SWISSPROT. That would be an easier search to parse through.
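
A minimal sketch of that blastp search, assuming a SWISSPROT protein database already exists. `SPROT_DB` is a hypothetical path prefix, and the remaining options are simply carried over from the blastx command in the question:

```shell
# Hypothetical path prefix of a SWISSPROT protein database built with makeblastdb.
SPROT_DB="${SPROT_DB:-$HOME/dbs/uniprot_sprot}"

# Peptides predicted by TransDecoder are the query; the output name is an arbitrary choice.
QUERY="Trinity.fasta.transdecoder.pep"
OUT="swissprot.blastp.outfmt6"

# Only run the search if blastp is installed and the query file exists and is not empty.
if command -v blastp >/dev/null 2>&1 && [ -s "$QUERY" ]; then
    blastp -db "$SPROT_DB" -query "$QUERY" -num_threads 2 \
        -max_target_seqs 1 -outfmt 6 -evalue 1e-5 > "$OUT"
else
    echo "blastp or $QUERY not found; nothing was run" >&2
fi
```

With `-outfmt 6` the result is a tab-separated table in which each row starts with the query peptide ID followed by the matching SWISSPROT entry.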

Modified 8 days ago • written 8 days ago by genomax

OK, I see. But then, how can I create my own database of interesting sequences?

Thanks

Written 8 days ago by luzglongoria

Normally you would search against SWISSPROT using proteins you predicted from Trinity analysis to see what their putative function is.

You just want to take a subset of proteins from SWISSPROT (interesting sequences)? Creating your own database would involve running makeblastdb from the BLAST+ package, as already noted by @Bastien, on your multi-FASTA file. If you want to use the entire SWISSPROT database, you can get premade indexes from NCBI's FTP site.
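
For the subset route, something along these lines might work. It assumes the SWISSPROT FASTA from UniProt has already been downloaded as `uniprot_sprot.fasta` and that its header lines carry an `OS=<organism name>` field. The file names and the helper `subset_by_organism` are hypothetical, and "Plasmodium" is only an example organism.

```shell
# Hypothetical file names; adjust to wherever the SWISSPROT FASTA actually lives.
SPROT_FASTA="${SPROT_FASTA:-uniprot_sprot.fasta}"
SUBSET_FASTA="plasmodium_sprot.fasta"

# Hypothetical helper: print every FASTA record whose header contains "OS=<organism>".
#   $1 = organism name prefix (e.g. Plasmodium), $2 = input multi-FASTA file
subset_by_organism() {
    awk -v org="OS=$1" '
        /^>/ { keep = (index($0, org) > 0) }  # decide on each header line
        keep                                  # print header and sequence lines of kept records
    ' "$2"
}

if [ -s "$SPROT_FASTA" ]; then
    subset_by_organism "Plasmodium" "$SPROT_FASTA" > "$SUBSET_FASTA"
    # Index the subset only if BLAST+ is installed and at least one record matched.
    if command -v makeblastdb >/dev/null 2>&1 && [ -s "$SUBSET_FASTA" ]; then
        makeblastdb -in "$SUBSET_FASTA" -dbtype prot -out plasmodium_sprot
    fi
fi

# For the entire database, BLAST+ ships update_blastdb.pl, which can fetch NCBI's
# premade indexes, for example:
#   update_blastdb.pl --decompress swissprot
```

The resulting database would then be used with `-db plasmodium_sprot` (subset) or `-db swissprot` (premade index).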

Written 8 days ago by genomax

Thanks.

What I want is to search against SWISSPROT using the Trinity.fasta file.

The problem is that I am working with a Plasmodium species whose genome is not available anywhere. So I am not sure whether I need to download a subset of proteins from SWISSPROT (by selecting all the Plasmodium entries in the dataset) or create my own database with makeblastdb, as you both said.

Written 8 days ago by luzglongoria