Question: How to find proteins containing a specific number of a given amino acid
nazaninhoseinkhan (Iran, Islamic Republic Of) wrote, 4.3 years ago:

Hi all,

Is there any way to find protein sequences that contain exactly a certain number of a given amino acid, for example exactly two cysteines?

I would appreciate any advice.

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 4.3 years ago:

Two cysteines, here we go (this streams the whole Swiss-Prot FASTA file from UniProt):

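# Linearize each FASTA record to "header<TAB>sequence", then keep the records with exactly two C/c residues.
curl -s "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz" |\
gunzip -c |\
awk '/^>/ {if (NR > 1) printf("\n"); printf("%s\t", $0); next;} {printf("%s", $0);} END {printf("\n");}' |\
awk -F '\t' '{S=$2; gsub(/[^Cc]/, "", S); if (length(S) == 2) printf("%s\n%s\n", $1, $2);}'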

Example output:

>sp|Q91G67|029R_IIV6 Uncharacterized protein 029R OS=Invertebrate iridescent virus 6 GN=IIV6-029R PE=4 SV=1
MVERLGIAVEDRSPKLRKQAIRERFVLFKKNTERVEKYEYYAIRGQSIYINGRLSKLQSERYPKMIILLDIFCQPNPRNLFLRFKERIDGKSEWENNFTYAGNNIGCTKEMESDMIRIFNELDDEKRDV
>sp|Q6GZU4|032R_FRG3G Uncharacterized protein 032R OS=Frog virus 3 (isolate Goorha) GN=FV3-032R PE=4 SV=1
MVTVTELRATAKNLGIRGYSTMRKAELEEAIRDHGRVSEARVASPRRSPARSPRKSPAGRKSPSKSPAGRKSPSKSPAGRKSPSKSPAGRKSPSKSPAGRKSPSKSPAGRKSPSKSPVRKSPSKSPVRKSPRKSPAAKLQAGDRPASMNICKNLPKQRLVDIATEMGIDLNRESDGKPKTKDQLCADIMGGAGRKSPRKSPSRSPVRKSPSRSPVRKSPVRSPRKSPVRVPSPVRSPVKEKTPVRSPARSEDAGSDLAPRPRRGKAVRLDYDEDDDYSYGASTDNLFSGNKEIPFPTRKRRTRKPEKVFVDVRSPHTLTDSEDEDDMVEVPELEDKEITMPGVLSPYSDEIVERGYVSQGGADYINYIYRTEYALESDESFARGARPKTNKRDSDRAVREAAAAAAIARALDRRSQSGNDEPAVRRRSAPTDSSRESRRDREPQRDIAEPQRDIAEPQRDIAEPQRDIAEPRKVRFREAGSADVRVFERDEPKEYGRVPVRPPLFMPAGEPLQPLKFRPKTPKIDDTIHRAQMVLPSKPSQKETDNYYKQFAGEAVRPSEPVQWDKDDQVLYHKVPAWDDSSYAAAVSAWPMSVDPKQAESVFAEFEQLSAQDSDLIKVRKSIMKALGY
>sp|Q197C3|037L_IIV3 Uncharacterized protein 037L OS=Invertebrate iridescent virus 3 GN=IIV3-037L PE=4 SV=1
MNAATSGIQLNAQTLSQQPAMNTPLIHRSFRDDYTGLVSAGDGLYKRKLKVPSTTRCNKFKWCSIGWSIGALIIFLVYKLEKPHVQPTSNGNLSLIEPEKLVSESQLIQKILNATTPQTTTPEIPSSTEPQELVTEILNTTTPQTTTPEIPSSTEPQELVTEIPSSTEPQEEIFSIFKSPKPEEPGGINSIPQYEQESNNVEDEPPPNKPEEEEDHDNQPLEERHTVPILGDVIIRNKTIIIDGGNETIIIKP
>sp|Q6GZT9|037R_FRG3G uncharacterized protein 037R OS=Frog virus 3 (isolate Goorha) GN=FV3-037R PE=3 SV=1
MQVFLDLDETLIHSIPVSRLGWTKSKPYPVKPFTVQDAGTPLSVMMGSSKAVNDGRKRLATRLSLFKRTVLTDHIMCWRPTLRTFLNGLFASGYKINVWTAASKPYALEVVKALNLKSYGMGLLVTAQDYPKGSVKRLKYLTGLDAVKIPLSNTAIVDDREEVKRAQPTRAVHIKPFTASSANTACSESDELKRVTASLAIIAGRSRRR
>sp|Q91G56|042R_IIV6 Uncharacterized protein 042R OS=Invertebrate iridescent virus 6 GN=IIV6-042R PE=4 SV=1
MATLQQAQQQNNQLTQQNNQLTQQNNQLTQRVNELTRFLEDANRKIQIKENVIKSSEAENRKNLAEINRLHSENHRLIQQSTRTICQKCSMRSN
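
The pipeline above is hard-wired to cysteine and to a count of two. As a sketch only (not part of the original answer), the same idea can be wrapped in a small shell function, here given the hypothetical name filter_by_residue_count, that takes the one-letter residue code and the exact count as arguments and does the joining and counting in a single awk pass. It assumes RESIDUE is a single letter (it is used as an awk regular expression) and matches it case-insensitively.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original answer.
# Usage: filter_by_residue_count RESIDUE COUNT < input.fasta
# Prints every FASTA record whose sequence contains exactly COUNT copies of the
# one-letter code RESIDUE (case-insensitive). Multi-line sequences are joined
# first, and each matching record is printed with its sequence on one line.
filter_by_residue_count() {
  awk -v res="$1" -v want="$2" '
    # Print the buffered record if its sequence holds exactly `want` residues.
    function flush(   tmp, n) {
      if (header == "") return
      tmp = toupper(seq)
      # gsub returns the number of replacements, i.e. the residue count.
      n = gsub(toupper(res), "", tmp)
      if (n == want + 0) printf("%s\n%s\n", header, seq)
    }
    /^>/ { flush(); header = $0; seq = ""; next }
    { seq = seq $0 }
    END { flush() }
  '
}

# Small offline demo: records a and c have two cysteines, b has one.
printf '>a\nMCC\nAK\n>b\nMCAK\n>c\nMccW\n' | filter_by_residue_count C 2
# prints:
# >a
# MCCAK
# >c
# MccW
```

To repeat the Swiss-Prot search from the answer with this helper, pipe the download into it: curl -s "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz" | gunzip -c | filter_by_residue_count C 2. Working on one record at a time avoids the tab-joined intermediate line, so a header that happens to contain a tab is not split.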


Thank you so much for your guidance. (reply by nazaninhoseinkhan, 4.3 years ago)
Powered by Biostar version 2.3.0