Question: Add a column of protein descriptions matched to the accession numbers in a tab-delimited file.
14 months ago by
adeel.maliks2010 wrote:

I have a tab-delimited file generated by DIAMOND blastp.

I want to add a column with the description of each protein, matched to its accession number.

The file looks like this:

BalsamFir1001   gi|672108306|ref|XP_008784482.1|        53.3    1188    487     16      150     1299    32      1189    0.0e+00 1236.9
BalsamFir10022  gi|586769306|ref|XP_006856185.1|        43.9    471     227     12      73      522     23      477     2.1e-76 290.8
BalsamFir10042  gi|586694060|ref|XP_006843464.1|        84.6    468     58      1       16      483     9       462     4.4e-230        801.2

I extracted the accession numbers from the file above and submitted them to Batch Entrez to get the descriptions of these proteins. The output (.txt) lists the descriptions in an irregular, multi-line layout, and the records are not in the same order as the input file. There are more than 16k protein descriptions to add to this column.

The file generated from GenBank looks like this:

1. translationally controlled tumor protein [Arabidopsis thaliana]
168 aa protein
NP_188286.1 GI:15228276

2. Pyridoxal phosphate (PLP)-dependent transferases superfamily protein [Arabidopsis thaliana]
194 aa protein
NP_188399.1 GI:15229510

What is the best way to solve this problem?
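
One possible approach, given only as a rough sketch: read the Batch Entrez output into a dictionary keyed by accession, then look up each row of the DIAMOND file, so the order of the Entrez records no longer matters. The function names below are made up for illustration. The code assumes every Entrez record is a blank-line-separated block whose first line is the numbered description and whose last line starts with the accession, as in the two records shown above, and that column 2 of the DIAMOND file has the form gi|number|ref|accession|. The DIAMOND row in the demo is invented and shortened just to show the join.

```python
import re


def parse_entrez_summary(text):
    """Return {accession: description} from Batch Entrez summary text.

    Assumes blank-line-separated records shaped like:
        1. description [Organism]
        168 aa protein
        NP_188286.1 GI:15228276
    """
    descriptions = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 3:
            continue  # skip anything that does not look like a full record
        # Drop the leading "1. " style counter from the description line.
        description = re.sub(r"^\d+\.\s*", "", lines[0])
        # The accession is the first token on the last line of the record.
        accession = lines[-1].split()[0]
        descriptions[accession] = description
    return descriptions


def accession_from_sseqid(sseqid):
    """Pull 'XP_008784482.1' out of 'gi|672108306|ref|XP_008784482.1|'."""
    parts = sseqid.split("|")
    if len(parts) >= 4 and parts[0] == "gi":
        return parts[3]
    return sseqid  # assumption: anything else is already a bare accession


def annotate(blast_lines, descriptions, missing="NA"):
    """Yield each tab-delimited row with the description appended as a new column."""
    for line in blast_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        columns = line.split("\t")
        accession = accession_from_sseqid(columns[1])
        columns.append(descriptions.get(accession, missing))
        yield "\t".join(columns)


# Demo: the two Entrez records from the post, plus one invented, shortened row.
summary = """1. translationally controlled tumor protein [Arabidopsis thaliana]
168 aa protein
NP_188286.1 GI:15228276

2. Pyridoxal phosphate (PLP)-dependent transferases superfamily protein [Arabidopsis thaliana]
194 aa protein
NP_188399.1 GI:15229510
"""
example_rows = ["ExampleQuery\tgi|15228276|ref|NP_188286.1|\t90.0"]

lookup = parse_entrez_summary(summary)
for row in annotate(example_rows, lookup):
    print(row)
```

For the real files, the same functions could be fed with the contents of the Entrez .txt file and an open handle on the DIAMOND output, writing each yielded row to a new file. Rows whose accession is not found get "NA", which makes it easy to spot records that Batch Entrez did not return.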

unix biopython python

Shouldn't you be looking at the protein database instead of GenBank?

written 14 months ago by Santosh Anand

The examples posted are XP_* records, which are part of the nr protein database.

written 14 months ago by genomax
