Add a protein description column, matched by accession number, to a tab-delimited file
7.1 years ago

I have a tab-delimited file generated by DIAMOND blastp.

I want to add the description of each protein next to its accession number.

The file looks like this:

BalsamFir1001   gi|672108306|ref|XP_008784482.1|        53.3    1188    487     16      150     1299    32      1189    0.0e+00 1236.9
BalsamFir10022  gi|586769306|ref|XP_006856185.1|        43.9    471     227     12      73      522     23      477     2.1e-76 290.8
BalsamFir10042  gi|586694060|ref|XP_006843464.1|        84.6    468     58      1       16      483     9       462     4.4e-230        801.2

I extracted the accession numbers from the file above and submitted them to Batch Entrez to get the descriptions of these proteins. The output (.txt) lists the descriptions in an irregular, multi-line format, and they are not in the same order as the input file. There are more than 16k protein descriptions to add to this column.
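For reference, the accession extraction step described above could be sketched in Python as follows. This is only an illustration, assuming the subject ID in column 2 always has the form `gi|<number>|<db>|<accession>|`; the helper name `extract_accession` is hypothetical, and the sample rows are truncated copies of the rows shown above.

```python
def extract_accession(sseqid):
    """Return the accession from an ID such as 'gi|672108306|ref|XP_008784482.1|'."""
    # Drop the empty field produced by the trailing '|'.
    fields = [f for f in sseqid.split("|") if f]
    # Assumed layout: gi|<number>|<db>|<accession>, so the accession is field 4.
    if len(fields) >= 4 and fields[0] == "gi":
        return fields[3]
    # Fall back to the last field (or the raw string) for other ID styles.
    return fields[-1] if fields else sseqid


# First three columns of the rows shown above, tab-separated.
sample_rows = [
    "BalsamFir1001\tgi|672108306|ref|XP_008784482.1|\t53.3",
    "BalsamFir10022\tgi|586769306|ref|XP_006856185.1|\t43.9",
    "BalsamFir10042\tgi|586694060|ref|XP_006843464.1|\t84.6",
]

accessions = [extract_accession(row.split("\t")[1]) for row in sample_rows]
print("\n".join(accessions))
```

The printed list, one accession per line, is the kind of input Batch Entrez accepts.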

The file downloaded from GenBank looks like this:

1. translationally controlled tumor protein [Arabidopsis thaliana]
168 aa protein
NP_188286.1 GI:15228276

2. Pyridoxal phosphate (PLP)-dependent transferases superfamily protein [Arabidopsis thaliana]
194 aa protein
NP_188399.1 GI:15229510

What is the best way to solve this problem?
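One possible approach, sketched here under assumptions rather than offered as a definitive solution: parse the Batch Entrez summary text into an accession-to-description dictionary, then look up each DIAMOND row by its accession, so the order of the summary file no longer matters. The sketch assumes every summary record is separated by a blank line, has its description on the first (numbered) line, and has a line containing `GI:` whose first token is the accession. The function names (`parse_summary`, `accession_of`, `annotate`) and the `NA` placeholder are hypothetical, and the second DIAMOND row in the demo is made up purely so that one lookup succeeds.

```python
import re


def parse_summary(text):
    """Map accession -> description from Batch Entrez summary text."""
    descriptions = {}
    # Records are assumed to be separated by one or more blank lines.
    for record in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in record.splitlines() if ln.strip()]
        if len(lines) < 2:
            continue
        # Strip the leading record number, e.g. "1. ".
        title = re.sub(r"^\d+\.\s*", "", lines[0])
        for ln in lines[1:]:
            if "GI:" in ln:
                descriptions[ln.split()[0]] = title
                break
    return descriptions


def accession_of(sseqid):
    """Pull the accession out of 'gi|<number>|<db>|<accession>|'."""
    fields = [f for f in sseqid.split("|") if f]
    if len(fields) >= 4 and fields[0] == "gi":
        return fields[3]
    return fields[-1] if fields else sseqid


def annotate(diamond_lines, descriptions, missing="NA"):
    """Append a description column to each tab-delimited DIAMOND row."""
    annotated = []
    for line in diamond_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        cols = line.split("\t")
        cols.append(descriptions.get(accession_of(cols[1]), missing))
        annotated.append("\t".join(cols))
    return annotated


summary_text = """1. translationally controlled tumor protein [Arabidopsis thaliana]
168 aa protein
NP_188286.1 GI:15228276

2. Pyridoxal phosphate (PLP)-dependent transferases superfamily protein [Arabidopsis thaliana]
194 aa protein
NP_188399.1 GI:15229510
"""

diamond_rows = [
    # Real row from the question (truncated); its accession is not in summary_text.
    "BalsamFir1001\tgi|672108306|ref|XP_008784482.1|\t53.3",
    # Made-up row, for illustration only, so that one lookup finds a match.
    "ExampleQuery\tgi|15228276|ref|NP_188286.1|\t99.0",
]

descriptions = parse_summary(summary_text)
annotated = annotate(diamond_rows, descriptions)
print("\n".join(annotated))
```

For the real data, `summary_text` would be read from the Batch Entrez .txt file and `diamond_rows` from the DIAMOND output file; rows whose accession is missing from the summary get the placeholder instead of being dropped.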

python biopython unix

Shouldn't you be looking at the protein database instead of GenBank? https://www.ncbi.nlm.nih.gov/protein


The examples posted are XP_* records, which are part of the nr protein database.
