retrieve organism from protein list
7.3 years ago
Assa Yeroslaviz ★ 1.8k

Hi,

I have a list of proteins accumulated from a BLAT search against nr. I would like to know how I can retrieve the organism each of the proteins belongs to.

Thanks in advance,

Assa

proteom RNA-Seq blast • 1.6k views
7.3 years ago

Target proteins from nr should have an accession ID or a GI number, so:

  1. Map the accession ID or GI to a taxid using prot.accession2taxid.gz. csvtk is used here to keep only the rows whose value in the given column appears in the ID list.

    $ zcat prot.accession2taxid.gz | head -n 4
    accession       accession.version       taxid   gi
    P29373  P29373.2        9606    132401
    P22935  P22935.2        10090   132402
    P18902  P18902.1        9913    132403
    
    # using accession
    $ zcat prot.accession2taxid.gz | csvtk -t grep -f accession -P acc.txt | cut -f 3 | sed 1d > taxid.txt
    
    # using gi
    $ zcat prot.accession2taxid.gz | csvtk -t grep -f gi -P gi.txt | cut -f 3 | sed 1d > taxid.txt
    
  2. Retrieve the organism information from the taxid using tools like ete3 (see "How to get phylum, class etc taxonomic ids from taxid?") or taxonkit.
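If csvtk is not installed, step 1 can also be sketched with plain gzip and awk. This is only a rough alternative, not the approach from the answer above: the function name `acc2taxid` and the output file `acc_taxid.tsv` are made up for illustration, and it assumes the ID list holds one unversioned accession per line (matching the first column of the table). Unlike the `cut -f 3` pipeline, it keeps each accession next to its taxid, so the pairing is not lost.

```shell
# Hypothetical helper: print "accession<TAB>taxid" for every ID in a list.
#   $1: text file with one accession per line (no version suffix)
#   $2: path to prot.accession2taxid.gz
acc2taxid() {
    gzip -dc "$2" | awk -F'\t' -v OFS='\t' '
        # First input (the ID list): remember each wanted accession.
        NR == FNR { want[$1] = 1; next }
        # Second input (the mapping table on stdin): skip the header line,
        # then print accession and taxid for every wanted accession.
        FNR > 1 && ($1 in want) { print $1, $3 }
    ' "$1" -
}

# Example call (file names are assumptions):
#   acc2taxid acc.txt prot.accession2taxid.gz > acc_taxid.tsv
```

The taxids in the second column can then be passed to taxonkit or ete3 for step 2. To look up GI numbers instead, the same idea should work by testing `$4` rather than `$1`.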
