Question: Add a protein description column matched to the accession numbers in a tab-delimited file
3.3 years ago, adeel.maliks20 wrote:

I have a tab-delimited file generated by DIAMOND blastp.

I want to add the description of each protein next to its accession number.

The file looks like this:

BalsamFir1001   gi|672108306|ref|XP_008784482.1|        53.3    1188    487     16      150     1299    32      1189    0.0e+00 1236.9
BalsamFir10022  gi|586769306|ref|XP_006856185.1|        43.9    471     227     12      73      522     23      477     2.1e-76 290.8
BalsamFir10042  gi|586694060|ref|XP_006843464.1|        84.6    468     58      1       16      483     9       462     4.4e-230        801.2

I extracted the accession numbers from the file above and submitted them to Batch Entrez to get the descriptions of these proteins. The output (.txt) lists the descriptions in an irregular format, and they are not in the same order as the input file. There are more than 16k protein descriptions to add to this column.
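The accession extraction step described above could be sketched in Python roughly as follows. This assumes the subject ID is always the second tab-separated column and uses the `gi|number|db|accession|` layout shown in the sample rows; the function names `extract_accession` and `accessions_from_blast` are made up for illustration.

```python
def extract_accession(subject_id):
    """Return the accession from an ID such as 'gi|672108306|ref|XP_008784482.1|'."""
    # Drop the empty string produced by the trailing '|'.
    parts = [p for p in subject_id.split("|") if p]
    # Assumption: in the gi|number|db|accession| layout the accession is the 4th field.
    if len(parts) >= 4 and parts[0] == "gi":
        return parts[3]
    # Otherwise return the ID unchanged rather than guessing.
    return subject_id


def accessions_from_blast(lines):
    """Collect unique accessions from column 2, keeping first-seen order."""
    seen = set()
    ordered = []
    for raw in lines:
        fields = raw.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        acc = extract_accession(fields[1])
        if acc not in seen:
            seen.add(acc)
            ordered.append(acc)
    return ordered
```

Writing the returned list one accession per line gives a file that can be uploaded to Batch Entrez.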

The file generated from GenBank looks like this:

1. translationally controlled tumor protein [Arabidopsis thaliana]
168 aa protein
NP_188286.1 GI:15228276

2. Pyridoxal phosphate (PLP)-dependent transferases superfamily protein [Arabidopsis thaliana]
194 aa protein
NP_188399.1 GI:15229510

What is the best way to solve this problem?

unix biopython python • 1.1k views

Shouldn't you be looking at protein database instead of genbank?

written 3.3 years ago by Santosh Anand

Examples posted are XP_* records which are part of nr protein database.

written 3.3 years ago by genomax
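One possible way to approach the question is to read the Batch Entrez summary into a dictionary keyed by accession and then look up each BLAST row, so the order of the summary file no longer matters. The sketch below is a minimal illustration, not a tested pipeline. It assumes every summary record has a numbered title line followed later by an `ACCESSION GI:number` line, as in the two records posted, and that the BLAST subject IDs use the `gi|number|db|accession|` layout. The names `parse_summary` and `annotate` and the `NA` placeholder are hypothetical.

```python
import re

# "1. some description [Organism]" -> captures the description text
TITLE_RE = re.compile(r"^\d+\.\s+(.*)$")
# "NP_188286.1 GI:15228276" -> captures the accession
ACC_RE = re.compile(r"^(\S+)\s+GI:\d+")


def parse_summary(lines):
    """Map accession -> description from a Batch Entrez text summary."""
    descriptions = {}
    current = None
    for raw in lines:
        line = raw.strip()
        title = TITLE_RE.match(line)
        if title:
            current = title.group(1)
            continue
        acc = ACC_RE.match(line)
        if acc and current is not None:
            descriptions[acc.group(1)] = current
            current = None
    return descriptions


def annotate(blast_lines, descriptions, missing="NA"):
    """Yield each BLAST row with the description appended as a last column."""
    for raw in blast_lines:
        line = raw.rstrip("\n")
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        parts = [p for p in fields[1].split("|") if p]
        # Assumption: accession is the 4th field of gi|number|db|accession|
        acc = parts[3] if len(parts) >= 4 and parts[0] == "gi" else fields[1]
        yield line + "\t" + descriptions.get(acc, missing)


# Example use (file names are placeholders):
# with open("summary.txt") as s, open("blast.tsv") as b, open("out.tsv", "w") as out:
#     desc = parse_summary(s)
#     for row in annotate(b, desc):
#         out.write(row + "\n")
```

Rows whose accession is absent from the summary get `NA`, which makes it easy to see which records Batch Entrez did not return.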