Question: Modifying Standalone BLAST output
2.0 years ago, glady wrote:

I am running standalone BLAST on Ubuntu and have downloaded the environmental metagenome (env_nt) database from NCBI. The command I am using to run BLAST is:

blastn -db env_nt -query file.fasta -out BLAST_output.fasta -max_target_seqs 1 -outfmt '6 qseqid qseq sallseqid stitle score bitscore qcovs evalue pident sacc staxids sscinames scomnames sblastnames'

In this output I also need the organism name or the source name of the subject hit that I obtain from BLAST. Can anyone help me with this? What syntax should I use to obtain the organism/source name? Thank you.


Can you provide several lines of the result?

written 2.0 years ago by shenwei356
2.0 years ago, shenwei356 (China) wrote:

I tried this before but failed. There's an indirect way.

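ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz provides the mapping between protein accessions (sseqid) and taxids, which you can use to get the organism name. For nucleotide hits, such as those from env_nt, the nucl_*.accession2taxid.gz files in the same directory are the nucleotide counterparts.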

dummy data

$ cat t.tsv 
P05876  other-info
P27125  other-columns

get accession

$ cut -f 1 t.tsv > t.acc

get taxid

$ csvtk grep -t -f 1 -P t.acc prot.accession2taxid.gz | cut -f 1,3 | sed 1d > t.acc2taxid

get lineage

$ cat t.acc2taxid |  taxonkit lineage -i 2 > t.acc2taxid.lineage

merge taxid and lineage back to the blast result

$ csvtk join -H -t t.tsv t.acc2taxid.lineage
P05876  other-info      11731   Viruses;Retro-transcribing viruses;Retroviridae;Orthoretrovirinae;Lentivirus;Primate lentivirus group;Simian immunodeficiency virus;Simian immunodeficiency virus - agm;Simian immunodeficiency virus - agm.ver;Simian immunodeficiency virus (TYO-1 ISOLATE)
P27125  other-columns   83333   cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli K-12

You will need csvtk and taxonkit for these steps.
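
If csvtk is not available, the accession-to-taxid lookup and the merge back into the BLAST table can be approximated with awk alone. This is only a minimal sketch, assuming the four-column layout of the accession2taxid files (accession, accession.version, taxid, gi). The mapping file below is mock data: the two taxids are the ones shown in the output above, while the accession.version values, the gi values, and the extra X00000 row are placeholders. It does not resolve lineages; taxonkit is still needed for that.

```shell
#!/bin/sh
# Work in a scratch directory so nothing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Mock BLAST-like table (tab-separated), same shape as t.tsv above.
printf 'P05876\tother-info\nP27125\tother-columns\n' > t.tsv

# Mock stand-in for prot.accession2taxid (header line + 4 columns).
# accession.version, gi and the X00000 row are placeholders, not real records.
printf 'accession\taccession.version\ttaxid\tgi\n' > mock.accession2taxid
printf 'P05876\tP05876.1\t11731\t0\n' >> mock.accession2taxid
printf 'P27125\tP27125.1\t83333\t0\n' >> mock.accession2taxid
printf 'X00000\tX00000.1\t1\t0\n' >> mock.accession2taxid

# Step 1: remember the accessions in t.tsv, then stream the mapping file
# and keep only "accession <TAB> taxid" for those accessions.
awk -F'\t' -v OFS='\t' '
    NR == FNR { want[$1]; next }
    FNR > 1 && ($1 in want) { print $1, $3 }
' t.tsv mock.accession2taxid > t.acc2taxid

# Step 2: append the taxid to every row of t.tsv (NA if no mapping was found).
awk -F'\t' -v OFS='\t' '
    NR == FNR { taxid[$1] = $2; next }
    { print $0, (($1 in taxid) ? taxid[$1] : "NA") }
' t.acc2taxid t.tsv > t.with_taxid.tsv

cat t.with_taxid.tsv
```

Only the accessions from t.tsv are held in memory, so the mapping file is streamed rather than loaded. With the real file, the mapping could be piped in instead, e.g. `gzip -dc prot.accession2taxid.gz | awk ... t.tsv -`.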

2.0 years ago, 5heikki (Finland) wrote:

You have to set up taxdb, but with env_nt I think all the sequences are basically annotated as "Environmental sample", so not much will be gained.
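
A rough sketch of what setting up taxdb involves, in case it helps. It assumes that BLASTDB names a single directory (falling back to the current directory when unset); `has_taxdb` is a hypothetical helper name. taxdb.btd and taxdb.bti are the two files shipped in NCBI's taxdb.tar.gz. The download commands are left as comments because they need network access.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if both taxdb files exist in the given directory.
has_taxdb() {
    dir=$1
    [ -f "$dir/taxdb.btd" ] && [ -f "$dir/taxdb.bti" ]
}

# Typical setup, run once inside the BLAST database directory:
#   cd "$BLASTDB"
#   update_blastdb.pl --decompress taxdb
# or, fetching the archive by hand:
#   wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz
#   tar -xzf taxdb.tar.gz

# Assumption: BLASTDB is a single directory; default to the current one.
if has_taxdb "${BLASTDB:-.}"; then
    echo "taxdb found"
else
    echo "taxdb missing: sscinames, scomnames and sblastnames are usually reported as N/A"
fi
```

Once the files are in place, the sscinames, scomnames and sblastnames columns already requested in the question's -outfmt string should be filled in, although for env_nt the names may be as uninformative as described above.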
