Database download error
6.0 years ago
worarado.kan ▴ 20

Hello everyone,

I am trying to use blastx to compare my Trinity.fasta file against the Swiss-Prot (sp) and UniRef90 databases on Linux. So I have to download Swiss-Prot and UniRef90 as FASTA files from http://www.uniprot.org/downloads. After downloading, I got uniref90.fasta.gz.part and I cannot extract this file. How can I extract it? If you have any guidance on functional annotation using a Trinity FASTA file, please let me know.

Best regards, Kan

next-gen RNA-Seq Assembly • 1.3k views
6.0 years ago
Tm ★ 1.1k

You cannot extract it because it has not been downloaded completely. The file name itself (uniref90.fasta.gz.part) indicates a partial download, not the complete file.
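As a minimal sketch of how you could check this before extracting: `gzip -t` tests the integrity of a gzip stream without unpacking it. The function name `check_gz` and the `UNIREF90_URL` variable are assumptions for illustration only; use the actual download link from the UniProt downloads page.

```shell
#!/usr/bin/env bash
# check_gz is a hypothetical helper name, not part of any tool.
# It reads the file via stdin so gzip does not reject unusual suffixes
# such as ".part", and returns non-zero if the stream is truncated or corrupt.
check_gz() {
    f="$1"
    if gzip -t < "$f" 2>/dev/null; then
        echo "OK: $f"
    else
        echo "INCOMPLETE OR CORRUPT: $f"
        return 1
    fi
}

# Example usage (file name taken from the question):
#   check_gz uniref90.fasta.gz && gunzip -c uniref90.fasta.gz > uniref90.fasta
#
# To resume an interrupted download instead of starting over
# (UNIREF90_URL is a placeholder you have to set yourself):
#   wget -c "$UNIREF90_URL"
```

If the check fails, download the file again (or resume it) until the check passes, and only then extract it.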

Once you have downloaded the complete file and extracted it, you first have to format the database using makeblastdb:

makeblastdb -in uniref90.fasta -dbtype 'prot'
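If you want to guard against formatting a missing or empty file, the step above could be wrapped roughly as follows. This is only a sketch: `format_protein_db` and the `MAKEBLASTDB` override variable are hypothetical names introduced here, and `-out` is set to the input name, which is also what makeblastdb uses by default.

```shell
#!/usr/bin/env bash
# format_protein_db is a hypothetical wrapper around makeblastdb.
# MAKEBLASTDB is an assumed convenience variable for a non-standard
# install path; if unset, "makeblastdb" is looked up in PATH.
format_protein_db() {
    fasta="$1"
    # Refuse to run on a missing or zero-length FASTA file.
    if [ ! -s "$fasta" ]; then
        echo "FASTA file missing or empty: $fasta" >&2
        return 1
    fi
    # -dbtype prot: protein database, as needed for blastx.
    # -out: database name, kept identical to the FASTA file name.
    "${MAKEBLASTDB:-makeblastdb}" -in "$fasta" -dbtype prot -out "$fasta"
}

# Example usage:
#   format_protein_db uniref90.fasta
```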

Then you can run blastx with the basic arguments, where -db is the name of the formatted database (here uniref90.fasta):

blastx -query Trinity.fasta -db uniref90.fasta -out out_file.xml -outfmt 5

Here -outfmt 5 produces XML output; if you want plain text or another format, check the command 'blastx -help'.

Other optional arguments you can use, depending on your aim, include:

-max_target_seqs, -num_threads, -evalue etc.
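To show how these options fit together, here is a rough sketch of a blastx call with tabular output instead of XML. The function name `run_blastx`, the `BLASTX` override variable and the specific values (E-value 1e-5, 5 target sequences, 4 threads) are assumptions chosen for illustration; adjust them to your data and machine.

```shell
#!/usr/bin/env bash
# run_blastx is a hypothetical wrapper; BLASTX is an assumed override
# variable for a non-standard install path (defaults to "blastx" in PATH).
run_blastx() {
    query="$1"   # nucleotide FASTA, e.g. Trinity.fasta
    db="$2"      # name of the formatted protein database
    out="$3"     # output file
    # -evalue:          report only hits at or below this E-value
    # -max_target_seqs: maximum number of database sequences kept per query
    # -num_threads:     CPU threads to use
    # -outfmt 6 ...:    tab-separated output with the listed columns
    "${BLASTX:-blastx}" \
        -query "$query" \
        -db "$db" \
        -evalue 1e-5 \
        -max_target_seqs 5 \
        -num_threads 4 \
        -outfmt "6 qseqid sseqid pident length evalue bitscore stitle" \
        -out "$out"
}

# Example usage:
#   run_blastx Trinity.fasta uniref90.fasta blastx_hits.tsv
```

Tabular output is usually easier to filter with standard command-line tools than XML, which is why it is used in this sketch.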

Thank you so much for your information


Dear toralmanvar, may I ask again, this time about uniprot_sprot.fasta (sp)? I downloaded this FASTA file from the same website as above, but when I run the following commands:

QUERY=Trinity01052018_Echota-UP_fasta_iso_ID.fasta
DB=/run/media/hscience/DATA_CentOS/DATABASE/db/uniPROT/uniprot_sprot.fasta
FORMAT="6 qseqid sseqid evalue stitle"
EVALUE=1.0e-5
QUERY_CODE=1
MAX_TARGET_SEQ=1
NCPU=4
Home_blastx=/usr/local/bin/blastx
OUTF=$(basename $QUERY)_$(basename $DB)_blastx_fmt6.txt

blastx -query $QUERY \
    -db $DB \
    -evalue $EVALUE \
    -query_gencode $QUERY_CODE \
    -max_target_seqs $MAX_TARGET_SEQ \
    -num_threads $NCPU \
    -outfmt "$FORMAT" \
    -out $OUTF

Why does it show the following error?

BLAST Database error: No alias or index file found for protein database [/run/media/hscience/DATA_CentOS/DATABASE/db/uniPROT/uniprot_sprot.fasta] in search path [/run/media/hscience/DATA_CentOS/Bluberry/RNAseqblueberry/4_RSEM_edgeR/edgeR.genes.dir/P1e-10_C8/PickIsoform_Echota_UP/Swissprot::]

Do you think the problem is with uniprot_sprot.fasta, or is something else wrong?

Thank you, Kan


Have you formatted your database as I described in my previous answer? You are getting this error because BLAST cannot find a formatted database. Please format the database using the makeblastdb program:

makeblastdb -in uniref90.fasta -dbtype 'prot'

It will generate database files with the extensions .phr, .pin and .psq (uniref90.fasta.phr, uniref90.fasta.pin and uniref90.fasta.psq).

Once these are generated, you can use the formatted database for BLAST. Remember that you have to pass the database name you get after formatting. In the example above it will be uniref90.fasta (i.e. the name before the .phr, .pin and .psq extensions); for your Swiss-Prot file, run the same command on uniprot_sprot.fasta and pass that name to -db.
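A quick way to see whether a database has been formatted before launching a long blastx run is to look for its index files next to the FASTA file. The sketch below is an assumption-laden helper, not part of BLAST+: `db_is_formatted` is a made-up name, and it checks for a protein index (`.pin`), an alias file (`.pal`) or a first volume (`.00.pin`), which is how large protein databases are typically split.

```shell
#!/usr/bin/env bash
# db_is_formatted is a hypothetical helper. It returns 0 if a protein
# BLAST index, alias file, or first volume exists for the given name.
db_is_formatted() {
    db="$1"
    for ext in pin pal 00.pin; do
        if [ -e "$db.$ext" ]; then
            return 0
        fi
    done
    return 1
}

# Example usage (path shortened; use the full path to your own file):
#   DB=uniprot_sprot.fasta
#   if db_is_formatted "$DB"; then
#       echo "database is formatted, pass -db $DB to blastx"
#   else
#       echo "run: makeblastdb -in $DB -dbtype prot"
#   fi
```

If the helper reports that nothing is there, that matches the "No alias or index file found" error from the question, and running makeblastdb on the file should resolve it.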
