How to find a protein with a specific number of a given amino acid
9.2 years ago

Hi all,

Is there any way to find protein sequences that contain exactly a given number of a particular amino acid, for example exactly two cysteines?

I would appreciate any advice.

9.2 years ago

Two cysteines, here we go:

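# Stream the Swiss-Prot FASTA, put each record on one line as "header<TAB>sequence",
# then print only the records whose sequence contains exactly two cysteines (C or c).
curl -s "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz" |\
gunzip -c |\
awk '/^>/ {printf("\n%s\t",$0); next;} {printf("%s",$0);} END {printf("\n");}' |\
awk -F '\t' '{S=$2;gsub(/[^Cc]/,"",S);if(length(S)==2) printf("%s\n%s\n",$1,$2);}'

Output (truncated):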
>sp|Q91G67|029R_IIV6 Uncharacterized protein 029R OS=Invertebrate iridescent virus 6 GN=IIV6-029R PE=4 SV=1
MVERLGIAVEDRSPKLRKQAIRERFVLFKKNTERVEKYEYYAIRGQSIYINGRLSKLQSERYPKMIILLDIFCQPNPRNLFLRFKERIDGKSEWENNFTYAGNNIGCTKEMESDMIRIFNELDDEKRDV
>sp|Q6GZU4|032R_FRG3G Uncharacterized protein 032R OS=Frog virus 3 (isolate Goorha) GN=FV3-032R PE=4 SV=1
MVTVTELRATAKNLGIRGYSTMRKAELEEAIRDHGRVSEARVASPRRSPARSPRKSPAGRKSPSKSPAGRKSPSKSPAGRKSPSKSPAGRKSPSKSPAGRKSPSKSPAGRKSPSKSPVRKSPSKSPVRKSPRKSPAAKLQAGDRPASMNICKNLPKQRLVDIATEMGIDLNRESDGKPKTKDQLCADIMGGAGRKSPRKSPSRSPVRKSPSRSPVRKSPVRSPRKSPVRVPSPVRSPVKEKTPVRSPARSEDAGSDLAPRPRRGKAVRLDYDEDDDYSYGASTDNLFSGNKEIPFPTRKRRTRKPEKVFVDVRSPHTLTDSEDEDDMVEVPELEDKEITMPGVLSPYSDEIVERGYVSQGGADYINYIYRTEYALESDESFARGARPKTNKRDSDRAVREAAAAAAIARALDRRSQSGNDEPAVRRRSAPTDSSRESRRDREPQRDIAEPQRDIAEPQRDIAEPQRDIAEPRKVRFREAGSADVRVFERDEPKEYGRVPVRPPLFMPAGEPLQPLKFRPKTPKIDDTIHRAQMVLPSKPSQKETDNYYKQFAGEAVRPSEPVQWDKDDQVLYHKVPAWDDSSYAAAVSAWPMSVDPKQAESVFAEFEQLSAQDSDLIKVRKSIMKALGY
>sp|Q197C3|037L_IIV3 Uncharacterized protein 037L OS=Invertebrate iridescent virus 3 GN=IIV3-037L PE=4 SV=1
MNAATSGIQLNAQTLSQQPAMNTPLIHRSFRDDYTGLVSAGDGLYKRKLKVPSTTRCNKFKWCSIGWSIGALIIFLVYKLEKPHVQPTSNGNLSLIEPEKLVSESQLIQKILNATTPQTTTPEIPSSTEPQELVTEILNTTTPQTTTPEIPSSTEPQELVTEIPSSTEPQEEIFSIFKSPKPEEPGGINSIPQYEQESNNVEDEPPPNKPEEEEDHDNQPLEERHTVPILGDVIIRNKTIIIDGGNETIIIKP
>sp|Q6GZT9|037R_FRG3G uncharacterized protein 037R OS=Frog virus 3 (isolate Goorha) GN=FV3-037R PE=3 SV=1
MQVFLDLDETLIHSIPVSRLGWTKSKPYPVKPFTVQDAGTPLSVMMGSSKAVNDGRKRLATRLSLFKRTVLTDHIMCWRPTLRTFLNGLFASGYKINVWTAASKPYALEVVKALNLKSYGMGLLVTAQDYPKGSVKRLKYLTGLDAVKIPLSNTAIVDDREEVKRAQPTRAVHIKPFTASSANTACSESDELKRVTASLAIIAGRSRRR
>sp|Q91G56|042R_IIV6 Uncharacterized protein 042R OS=Invertebrate iridescent virus 6 GN=IIV6-042R PE=4 SV=1
MATLQQAQQQNNQLTQQNNQLTQQNNQLTQRVNELTRFLEDANRKIQIKENVIKSSEAENRKNLAEINRLHSENHRLIQQSTRTICQKCSMRSN
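If you need the same filter for other residues or counts, or already have a FASTA file on disk, the idea can be wrapped in a small shell function. This is a sketch, not part of the original answer: the function name `count_filter` is hypothetical, it assumes a POSIX shell and awk, and it expects a single one-letter residue code, matched case-insensitively. Unlike the two-step pipeline, it joins multi-line sequences and applies the count in one awk pass, using the return value of `gsub` (the number of substitutions made) as the residue count.

```shell
# count_filter RESIDUE COUNT [FILE...]
# Hypothetical helper (not from the original answer): print FASTA records
# (header + sequence joined onto one line) whose sequence contains exactly
# COUNT occurrences of the one-letter code RESIDUE, ignoring case.
# Reads the given FILE(s), or standard input if none are given.
count_filter() {
    residue="$1"
    count="$2"
    shift 2
    awk -v r="$residue" -v n="$count" '
        # Print the record collected so far if it has exactly n residues.
        function flush() {
            if (hdr != "") {
                s = toupper(seq)
                # gsub returns how many matches it replaced, i.e. the count.
                c = gsub(toupper(r), "", s)
                if (c == n + 0) printf("%s\n%s\n", hdr, seq)
            }
        }
        /^>/ { flush(); hdr = $0; seq = ""; next }   # new record starts
        { seq = seq $0 }                              # append sequence line
        END { flush() }                               # last record
    ' "$@"
}
```

For example, `count_filter C 2 proteins.fasta` reproduces the two-cysteine filter on a local file, and the function can also sit at the end of the `curl ... | gunzip -c` stream in place of the two awk steps.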

Thank you so much for your guidance.
