Downloading fasta sequence for a PDB entry

4.6 years ago

henriquezvera.95 • 0

I would like to know whether it is possible to download the FASTA sequence of a PDB entry using Biopython.

genome biopython • 2.9k views

updated 4.6 years ago by Sej Modha 5.3k • written 4.6 years ago by henriquezvera.95 • 0

Please read How To Ask A Good Question before posting a question. What have you tried so far?

You can use NCBI's E-utilities from the Unix command line (Entrez Direct):

esearch -db protein -query '1REV[All Fields] AND pdb[filter]' | efetch -format fasta
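If you'd rather stay in Python, the same esearch | efetch query can be sketched with Biopython's Bio.Entrez module. This is a minimal sketch, not a tested recipe: it assumes Biopython is installed and you have network access. The helper names `pdb_protein_query` and `fetch_pdb_fasta_from_ncbi` are made up for this example.

```python
def pdb_protein_query(pdb_id: str) -> str:
    """Build the same NCBI protein query as the esearch command above."""
    return f"{pdb_id}[All Fields] AND pdb[filter]"


def fetch_pdb_fasta_from_ncbi(pdb_id: str, email: str) -> str:
    """esearch + efetch via Bio.Entrez; returns FASTA text ("" if nothing found)."""
    from Bio import Entrez  # imported here so the query helper works without Biopython

    Entrez.email = email  # NCBI asks E-utilities users to identify themselves

    # esearch: look up the protein UIDs that match the PDB ID
    handle = Entrez.esearch(db="protein", term=pdb_protein_query(pdb_id))
    try:
        ids = Entrez.read(handle)["IdList"]
    finally:
        handle.close()
    if not ids:
        return ""

    # efetch: download those records in FASTA format
    handle = Entrez.efetch(db="protein", id=",".join(ids),
                           rettype="fasta", retmode="text")
    try:
        data = handle.read()
    finally:
        handle.close()
    # Depending on the Biopython version, the handle may yield bytes or str
    return data.decode("utf-8") if isinstance(data, bytes) else data
```

For example, `print(fetch_pdb_fasta_from_ncbi("1REV", email="you@example.org"))` (with your own address in place of the placeholder) should print the protein records NCBI holds for 1REV, in the same form efetch prints them.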

reply 4.6 years ago by Sej Modha 5.3k

This was asked in an earlier post:

How download a sequence fasta from PDB using biopython / python?

reply 4.6 years ago by natasha.sernova ★ 4.0k

score 2 · Answer 1 · 2019-09-09

4.6 years ago

Sej Modha 5.3k

score 0 · Answer 2 · 2019-09-08

It's a somewhat hacky solution (technically it downloads the whole PDB file first), but here's something you can use as a one-liner:

$ wget -O - https://files.rcsb.org/download/1A80.pdb 2>/dev/null \
   | python -c "import sys; from Bio import SeqIO; SeqIO.convert(sys.stdin, 'pdb-atom', sys.stdout, 'fasta')"
>1A80:A
TVPSIVLNDGNSIPQLGYGVFKVPPADTQRAVEEALEVGYRHIDTAAIYGNEEGVGAAIA
ASGIARDDLFITTKLWNDRHDGDEPAAAIAESLAKLALDQVDLYLVHWPTPAADNYVHAW
EKMIELRAAGLTRSIGVSNHLVPHLERIVAATGVVPAVNQIELHPAYQQREITDWAAAHD
VKIESWGPLGQGKYDLFGAEPVTAAAAAHGKTPAQAVLRWHLQKGFVVFPKSVRRERLEE
NLDVFDFDLTDTEIAAIDAMDPGDGSGRVSAHPDEVD

Just replace 1A80 in the wget URL with whatever PDB ID you're interested in. SeqIO can't download the data itself (Biopython's Bio.PDB.PDBList can fetch structure files, but only saves them to disk), so you need to pass it the file somehow. I've elected to do this in the shell; you could also do it natively in Python, but it's more complicated (IMO).
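A rough sketch of the "native Python" route, assuming Biopython is installed, you have network access, and RCSB still serves the legacy `.pdb` file for the entry (very large structures are only distributed as mmCIF). It downloads the file with the standard library's `urllib.request` instead of wget, then hands the text to `SeqIO` the same way the one-liner does. `rcsb_pdb_url` and `pdb_id_to_fasta` are hypothetical helper names.

```python
import io
import urllib.request
from typing import Optional


def rcsb_pdb_url(pdb_id: str) -> str:
    """Legacy-format PDB file URL, as used in the wget command above."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def pdb_id_to_fasta(pdb_id: str, out_path: Optional[str] = None,
                    fmt: str = "pdb-atom") -> str:
    """Download a PDB entry and return its chain sequences as FASTA text.

    Assumes Biopython is installed and the entry exists in legacy PDB format.
    """
    from Bio import SeqIO  # imported here so the URL helper works without Biopython

    # Download the whole PDB file (the same thing wget does in the one-liner)
    with urllib.request.urlopen(rcsb_pdb_url(pdb_id)) as response:
        pdb_text = response.read().decode("utf-8")

    # Parse the in-memory text and write FASTA into another in-memory buffer
    fasta = io.StringIO()
    SeqIO.write(SeqIO.parse(io.StringIO(pdb_text), fmt), fasta, "fasta")

    # Optionally save to disk, like a "> pdbsequence.fa" redirect
    if out_path is not None:
        with open(out_path, "w") as handle:
            handle.write(fasta.getvalue())
    return fasta.getvalue()
```

Passing `fmt="pdb-seqres"` makes SeqIO read the SEQRES records (the full deposited sequence) instead of only the residues that have coordinates, which is what `"pdb-atom"` is based on.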

If you want to save the output to a file, add a redirect at the end of the command:

(previous command)... > pdbsequence.fa
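To get a feel for what the `'pdb-atom'` conversion does, here is a simplified, standard-library-only sketch of the idea: take the ATOM records of the first model, collapse them into residues per chain, map three-letter residue names to one-letter codes, pad gaps in the residue numbering with X and write FASTA. This is an approximation, not Biopython's code: it ignores HETATM records and any residue name outside the 20 standard amino acids (so modified residues are simply dropped), and `atom_records_to_fasta` is a hypothetical name.

```python
# Standard amino acids only; anything else is skipped in this sketch
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def atom_records_to_fasta(pdb_text: str, pdb_id: str, width: int = 60) -> str:
    """Per-chain sequences from ATOM records of the first model, as FASTA."""
    chains = {}    # chain ID -> list of one-letter codes (or "X" runs)
    last_res = {}  # chain ID -> (residue number, insertion code) seen last
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # only the first model is used
        if not line.startswith("ATOM  "):
            continue  # skips HETATM, HEADER, TER, ...
        one = THREE_TO_ONE.get(line[17:20].strip())  # columns 18-20: residue name
        if one is None:
            continue
        chain_id = line[21]                           # column 22: chain ID
        res_seq = int(line[22:26])                    # columns 23-26: residue number
        i_code = line[26] if len(line) > 26 else " "  # column 27: insertion code
        prev = last_res.get(chain_id)
        if prev == (res_seq, i_code):
            continue  # another atom of the residue already recorded
        seq = chains.setdefault(chain_id, [])
        if prev is not None and res_seq > prev[0] + 1:
            seq.append("X" * (res_seq - prev[0] - 1))  # residues missing from the model
        seq.append(one)
        last_res[chain_id] = (res_seq, i_code)

    out = []
    for chain_id in sorted(chains):
        seq = "".join(chains[chain_id])
        out.append(f">{pdb_id}:{chain_id}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""
```

Keeping the residue number together with the insertion code means that residues such as 52 and 52A count as two residues rather than one, and neither is mistaken for a gap.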