How Do I Get the FASTA Sequences of Proteins from a List of Protein PDB IDs?
8.5 years ago
heath ▴ 20

If I have a list of protein PDB IDs, along with the start and end positions of the regions I am interested in, is there an API, from PyMOL or elsewhere, that would give me a file listing the FASTA sequences of these proteins (if possible, only for the regions I am interested in)?

Thanks!

pdb fasta api • 5.5k views
8.5 years ago

The following command seems to work:

$ echo -e "3I5F\n2p4k\n2p4m" | while read I; do curl -s "http://www.rcsb.org/pdb/rest/customReport?pdbids=${I}&customReportColumns=structureId,chainId,entityId,sequence,db_id,db_name&service=wsdisplay&format=text" | xsltproc stylesheet.xsl - ; done | fold -w 80

with stylesheet.xsl:

<script src="http://gist.github.com/4636539.js"></script>

output:

>3I5F|A|1|O44934|O44934
MTMDFSDPDMEFLCLTRQKLMEATSIPFDGKKNCWVPDPDFGFVGAEIQSTKGDEVTVKTDKTQETRVVKKDDIGQRNPP
KFEMNMDMANLTFLNEASILHNLRSRYESGFIYTYSGLFCIAINPYRRLPIYTQGLVDKYRGKRRAEMPPHLFSIADNAY
QYMLQDRENQSMLITGESGAGKTENTKKVIQYFALVAASLAGKKDKKEEEKKKDEKKGTLEDQIVQCNPVLEAYGNAKTT
RNNNSSRFGKFIRIHFGTQGKIAGADIETYLLEKSRVTYQQSAERNYHIFYQLLSPAFPENIEKILAVPDPGLYGFINQG
TLTVDGIDDEEEMGLTDTAFDVLGFTDEEKLSMYKCTGCILHLGEMKWKQRGEQAEADGTAEAEKVAFLLGVNAGDLLKC
LLKPKIKVGTEYVTQGRNKDQVTNSIAALAKSLYDRMFNWLVRRVNQTLDTKAKRQFFIGVLDIAGFEIFDFNSFEQLCI
NYTNERLQQFFNHHMFVLEQEEYKKEGIVWEFIDFGLDLQACIELIEKPMGILSILEEECMFPKASDTSFKNKLYDNHLG
KNPMFGKPKPPKAGCAEAHFCLHHYAGSVSYSIAGWLDKNKDPINENVVELLQNSKEPIVKMLFTPPRILTPGGKKKKGK
SAAFQTISSVHKESLNKLMKNLYSTHPHFVRCIIPNELKTPGLIDAALVLHQLRCNGVLEGIRICRKGFPNRIIYSEFKQ
RYSILAPNAVPSGFADGKVVTDKALSALQLDPNEYRLGNTKVFFKAGVLGMLEDMRDERLSKIISMFQAHIRGYLMRKAY
KKLQDQRIGLTLIQRNVRKWLVLRNWEWWRLFNKVKPLL
>3I5F|B|2|P08052|P08052
AEEAPRRVKLSQRQMQELKEAFTMIDQDRDGFIGMEDLKDMFSSLGRVPPDDELNAMLKECPGQLNFTAFLTLFGEKVSG
TDPEDALRNAFSMFDEDGQGFIPEDYLKDLLENMGDNFSKEEIKNVWKDAPLKNKQFNYNKMVDIKGKAEDED
>3I5F|C|3|P05945|P05945
SQLTKDEIEEVREVFDLFDFWDGRDGDVDAAKVGDLLRCLGMNPTEAQVHQHGGTKKMGEKAYKLEEILPIYEEMSSKDT
GTAADEFMEAFKTFDREGQGLISSAEIRNVLKMLGERITEDQCNDIFTFCDIREDIDGNIKYEDLMKKVMAGPFPDKSD
>2P4K|A|1|P04179|P04179
KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN
LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV
WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK
>2P4K|B|1|P04179|P04179
KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN
LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV
WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK
>2P4K|C|1|P04179|P04179
KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN
LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV
WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK
>2P4K|D|1|P04179|P04179
KHSLPDLPYDYGALEPHINAQIMQLHHSKHHAANVNNLNVTEEKYQEALAKGDVTAQIALQPALKFNGGGHINHSIFWTN
LSPNGGGEPKGELLEAIKRDFGSFDKFKEKLTAASVGVQGSGWGWLGFNKERGHLQIAACPNQDPLQGTTGLIPLLGIDV
WEHAYYLQYKNVRPDYLKAIWNVINWENVTERYMACKK
>2P4M|A|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|B|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|C|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|D|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|E|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|F|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|G|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
>2P4M|H|1|P83690|P83690
MSVIATQMTYKVYMSGTVNGHYFEVEGDGKGKPYEGEQTVKLTVTKGGPLPFAWDILSPQCQYGSIPFTKYPEDIPDYVK
QSFPEGFTWERIMNFEDGAVCTVSNDSSIQGNCFTYHVKFSGLNFPPNGPVMQKKTQGWEPSSERLFARGGMLIGNNFMA
LKLEGGGHYLCEFKTTYKAKKPVKMPGYHYVDRKLDVTNHNKDYTSVEQCEISIARKPVVA
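The question also asks for only a region of each sequence. One way to get there is to post-process the FASTA output above. The following is a minimal sketch, not part of the original command: the names `read_fasta` and `slice_regions` are hypothetical, and it assumes that the start and end positions are 1-based, inclusive offsets into the sequence as printed here (not PDB residue numbers, which can differ) and that headers follow the `ID|chain|entity|...` layout shown in the output.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Wrapped sequence lines are joined back together.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def slice_regions(
    fasta_text: str,
    regions: Dict[Tuple[str, str], Tuple[int, int]],
    width: int = 80,
) -> str:
    """Return FASTA text restricted to the requested regions.

    `regions` maps (PDB ID in upper case, chain ID) to (begin, end).
    Assumption: begin and end are 1-based and inclusive.
    Records without an entry in `regions` are skipped.
    """
    out: List[str] = []
    for header, seq in read_fasta(fasta_text):
        fields = header.split("|")
        if len(fields) < 2:
            continue
        key = (fields[0].upper(), fields[1])
        if key not in regions:
            continue
        begin, end = regions[key]
        sub = seq[begin - 1:end]
        # Record the region in the header so the origin stays traceable.
        out.append(f">{header}|{begin}-{end}")
        out.extend(sub[i:i + width] for i in range(0, len(sub), width))
    return "\n".join(out)
```

For example, saving the output above to a file, reading it into a string, and calling `slice_regions(text, {("3I5F", "B"): (1, 10)})` would keep only the first ten residues of chain B of 3I5F. Keying the regions on both PDB ID and chain is a design choice here, since one entry can contain several different chains.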
8.5 years ago

Try pdb-tools (http://code.google.com/p/pdb-tools/); it includes a module, pdbseq.py.

8.5 years ago
heath ▴ 20

Thanks a lot!
