How Do I Get The FASTA Sequences Of Proteins From A List Of Protein PDB IDs?
11.2 years ago
heath ▴ 20

If I have a list of protein PDB IDs, along with the start and end positions of the sequence regions I am interested in, is there an API, from PyMOL or elsewhere, that I could use to get a file listing the FASTA sequences of these proteins (if possible, restricted to the regions I am interested in)?

Thanks!

pdb api fasta • 6.2k views

Thanks a lot! ^-^

11.2 years ago

The following command seems to work:

$ echo -e "3I5F\n2p4k\n2p4m" | \
  while read I; do curl -s "http://www.rcsb.org/pdb/rest/customReport?pdbids=${I}&customReportColumns=structureId,chainId,entityId,sequence,db_id,db_name&service=wsdisplay&format=text" | \
  xsltproc stylesheet.xsl - ; done | \
  fold -w 80

with stylesheet.xsl:

output:

>3I5F|A|1|O44934|O44934
MTMDFSDPDMEFLCLTRQKLMEATSIPFDGKKNCWVPDPDFGFVGAEIQSTKGDEVTVKTDKTQETRVVKKDDIGQRNPP
KFEMNMDMANLTFLNEASILHNLRSRYESGFIYTYSGLFCIAINPYRRLPIYTQGLVDKYRGKRRAEMPPHLFSIADNAY
QYMLQDRENQSMLITGESGAGKTENTKKVIQYFALVAASLAGKKDKKEEEKKKDEKKGTLEDQIVQCNPVLEAYGNAKTT
RNNNSSRFGKFIRIHFGTQGKIAGADIETYLLEKSRVTYQQSAERNYHIFYQLLSPAFPENIEKILAVPDPGLYGFINQG
TLTVDGIDDEEEMGLTDTAFDVLGFTDEEKLSMYKCTGCILHLGEMKWKQRGEQAEADGTAEAEKVAFLLGVNAGDLLKC
LLKPKIKVGTEYVTQGRNKDQVTNSIAALAKSLYDRMFNWLVRRVNQTLDTKAKRQFFIGVLDIAGFEIFDFNSFEQLCI
NYTNERLQQFFNHHMFVLEQEEYKKEGIVWEFIDFGLDLQACIELIEKPMGILSILEEECMFPKASDTSFKNKLYDNHLG
KNPMFGKPKPPKAGCAEAHFCLHHYAGSVSYSIAGWLDKNKDPINENVVELLQNSKEPIVKMLFTPPRILTPGGKKKKGK
SAAFQTISSVHKESLNKLMKNLYSTHPHFVRCIIPNELKTPGLIDAALVLHQLRCNGVLEGIRICRKGFPNRIIYSEFKQ
RYSILAPNAVPSGFADGKVVTDKALSALQLDPNEYRLGNTKVFFKAGVLGMLEDMRDERLSKIISMFQAHIRGYLMRKAY
KKLQDQRIGLTLIQRNVRKWLVLRNWEWWRLFNKVKPLL
>3I5F|B|2|P08052|P08052
AEEAPRRVKLSQRQMQELKEAFTMIDQDRDGFIGMEDLKDMFSSLGRVPPDDELNAMLKECPGQLNFTAFLTLFGEKVSG
TDPEDALRNAFSMFDEDGQGFIPEDYLKDLLENMGDNFSKEEIKNVWKDAPLKNKQFNYNKMVDIKGKAEDED
>3I5F|C|3|P05945|P05945
SQLTKDEIEEVREVFDLFDFWDGRDGDVDAAKVGDLLRCLGMNPTEAQVHQHGGTKKMGEKAYKLEEILPIYEEMSSKDT
GTAADEFMEAFKTFDREGQGLISSAEIRNVLKMLGERITEDQCNDIFTFCDIREDIDGNIKYEDLMKKVMAGPFPDKSD
>2P4K|A|1|P04179|P04179
KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN
LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV
WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK
>2P4K|B|1|P04179|P04179
KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN
LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV
WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK
>2P4K|C|1|P04179|P04179
KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN
LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV
WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK
>2P4K|D|1|P04179|P04179
KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN
LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV
WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK
>2P4M|A|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|B|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|C|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|D|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|E|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|F|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|G|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|H|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
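
The output above contains full-length sequences, while the question also asks for a specific region of each protein. A minimal Python sketch of that last step follows, assuming the FASTA text has the `>ID|chain|...` headers shown above and that the begin/end positions are 1-based, inclusive, and count along the sequence as printed (not PDB residue numbers or UniProt positions). The function names `parse_fasta` and `slice_regions` and the `regions` mapping are made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def slice_regions(records, regions, width=80):
    """Return FASTA text restricted to the requested regions.

    `regions` maps an upper-case PDB ID to a (begin, end) tuple,
    1-based and inclusive (an assumption of this sketch).
    Records whose PDB ID is not in `regions` are skipped.
    """
    out = []
    for header, seq in records:
        pdb_id = header.split("|")[0].upper()
        if pdb_id not in regions:
            continue
        begin, end = regions[pdb_id]
        sub = seq[begin - 1:end]
        out.append(f">{header}|{begin}-{end}")
        # Wrap the sequence like `fold -w 80` does in the command above.
        out.extend(sub[i:i + width] for i in range(0, len(sub), width))
    return "\n".join(out)


if __name__ == "__main__":
    # Shortened toy input, not the real full-length record.
    fasta_text = ">2P4K|A|1|P04179|P04179\nKHSLPDLPYD\nYGALEPHINA\n"
    print(slice_regions(parse_fasta(fasta_text), {"2P4K": (3, 12)}))
```

Every chain of a listed entry is sliced with the same coordinates here; if the regions differ per chain, the key of `regions` would need to include the chain ID as well.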
11.2 years ago

Try pdb-tools; it includes a module, pdb_seq.py.
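
For readers who already have the PDB files on disk, here is a rough idea of what extracting a sequence from a file involves. This is not the pdb-tools implementation, just a sketch under stated assumptions: it reads the `SEQRES` records of a PDB-format file (chain ID in column 12, residue names from column 20 onward), handles only the 20 standard amino acids, and writes `X` for anything else. The function name `seqres_to_fasta` is hypothetical.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_fasta(pdb_text, pdb_id):
    """Build one FASTA record per chain from the SEQRES records of a PDB file."""
    chains = {}
    for line in pdb_text.splitlines():
        if not line.startswith("SEQRES"):
            continue
        chain = line[11]              # column 12: chain identifier
        residues = line[19:].split()  # columns 20+: three-letter residue names
        letters = chains.setdefault(chain, [])
        # Unknown or modified residues become "X" (a choice of this sketch).
        letters.extend(THREE_TO_ONE.get(res, "X") for res in residues)
    records = []
    for chain, letters in chains.items():
        records.append(">" + pdb_id + "|" + chain)
        records.append("".join(letters))
    return "\n".join(records)


if __name__ == "__main__":
    example = (
        "SEQRES   1 A   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU\n"
        "SEQRES   2 A   21  TYR GLN LEU GLU ASN TYR CYS ASN\n"
    )
    # "XXXX" is a placeholder ID for this hand-written example.
    print(seqres_to_fasta(example, "XXXX"))
```

Note that `SEQRES` describes the full construct that was crystallised, so it can include residues that are missing from the coordinate section of the file.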
