Downloading fasta sequence for a PDB entry
4.6 years ago

I would like to know whether it is possible to download the FASTA sequence of a PDB entry using Biopython.

genome biopython

Please read [ How To Ask A Good Question ] before posting: what have you tried so far?

You can use the NCBI E-utilities from the Unix command line (Entrez Direct):

esearch -db protein -query '1REV[All Fields] AND pdb[filter]' | efetch -format fasta
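Since the question asks about Biopython, here is a rough sketch of the same search-then-fetch pipeline using `Bio.Entrez`, which wraps the same NCBI E-utilities. It assumes Biopython is installed and network access is available; the function names and the `email` argument are illustrative, not part of any existing API.

```python
def build_pdb_query(pdb_id: str) -> str:
    """Build the same Entrez query string as the esearch command above."""
    return f"{pdb_id}[All Fields] AND pdb[filter]"


def fetch_pdb_fasta_from_ncbi(pdb_id: str, email: str) -> str:
    """Search NCBI Protein for a PDB ID and return the hits as FASTA text.

    Needs Biopython and network access. Bio.Entrez is imported inside the
    function so build_pdb_query() works even without Biopython installed.
    """
    from Bio import Entrez

    Entrez.email = email  # NCBI asks E-utilities users to identify themselves

    # Equivalent of: esearch -db protein -query '<query>'
    # (esearch returns at most 20 IDs by default; see its retmax parameter)
    handle = Entrez.esearch(db="protein", term=build_pdb_query(pdb_id))
    try:
        ids = Entrez.read(handle)["IdList"]
    finally:
        handle.close()
    if not ids:
        return ""  # no NCBI protein records matched this PDB ID

    # Equivalent of: ... | efetch -format fasta
    handle = Entrez.efetch(db="protein", id=",".join(ids), rettype="fasta", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()
```

For example, `print(fetch_pdb_fasta_from_ncbi("1REV", "you@example.org"))` should print the same records as the shell pipeline. Replace the placeholder address with your own.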
4.6 years ago
Sej Modha 5.3k

4.6 years ago
Joe 21k

This is a somewhat hacky solution (technically it downloads the whole PDB file first), but here's a one-liner you can use:

$ wget -O - https://files.rcsb.org/download/1A80.pdb 2>/dev/null \
   | python -c "import sys; from Bio import SeqIO; SeqIO.convert(sys.stdin, 'pdb-atom', sys.stdout, 'fasta')"
>1A80:A
TVPSIVLNDGNSIPQLGYGVFKVPPADTQRAVEEALEVGYRHIDTAAIYGNEEGVGAAIA
ASGIARDDLFITTKLWNDRHDGDEPAAAIAESLAKLALDQVDLYLVHWPTPAADNYVHAW
EKMIELRAAGLTRSIGVSNHLVPHLERIVAATGVVPAVNQIELHPAYQQREITDWAAAHD
VKIESWGPLGQGKYDLFGAEPVTAAAAAHGKTPAQAVLRWHLQKGFVVFPKSVRRERLEE
NLDVFDFDLTDTEIAAIDAMDPGDGSGRVSAHPDEVD

Just replace 1A80 in the wget URL with the PDB ID you're interested in. SeqIO can't download the data itself, so you need to pass it the file somehow. I've chosen to do this in the shell; you could also do it natively in Python, but it's more complicated (IMO).

If you want to save the output to a file, append a redirect to the end of the command:

(previous command)... > pdbsequence.fa
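For comparison, here is a minimal native-Python sketch of the same download-and-convert idea, including the optional save to a file. It assumes Biopython is installed and the RCSB download URL pattern from the wget call above; `pdb_download_url` and `pdb_to_fasta` are hypothetical helper names.

```python
import io
import urllib.request

# Same URL pattern as the wget call above
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{}.pdb"


def pdb_download_url(pdb_id: str) -> str:
    """Return the RCSB download URL for a PDB ID (uppercased to match the example)."""
    return RCSB_DOWNLOAD.format(pdb_id.upper())


def pdb_to_fasta(pdb_id: str, out_path=None) -> str:
    """Download a PDB entry and convert its ATOM records to FASTA.

    Mirrors: wget -O - <url> | SeqIO.convert(stdin, 'pdb-atom', stdout, 'fasta').
    Returns the FASTA text; if out_path is given, also writes it to that file.
    """
    from Bio import SeqIO  # Biopython is only needed when actually converting

    # Fetch the PDB file into memory instead of piping it through the shell
    with urllib.request.urlopen(pdb_download_url(pdb_id)) as response:
        pdb_text = response.read().decode("utf-8")

    # Convert in memory: 'pdb-atom' builds one record per chain from ATOM lines
    fasta = io.StringIO()
    SeqIO.convert(io.StringIO(pdb_text), "pdb-atom", fasta, "fasta")

    if out_path is not None:  # equivalent of the "> pdbsequence.fa" redirect
        with open(out_path, "w") as fh:
            fh.write(fasta.getvalue())
    return fasta.getvalue()
```

Calling `pdb_to_fasta("1A80", "pdbsequence.fa")` should produce the same FASTA as the one-liner. Biopython's `Bio.PDB.PDBList` class, via its `retrieve_pdb_file` method, can also download structure files to disk, which would be another way to obtain the input file.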