Calculate cysteine content
1
0
Entering edit mode
7.9 years ago

I would like to know a way to calculate cysteine content of more than 3% in a group of sequences. Can anybody help me?

sequence • 1.6k views
ADD COMMENT
0
Entering edit mode

I´d like how many sequences have cysteine content of more than 3% in a group of sequences.

ADD REPLY
2
Entering edit mode
7.9 years ago

using fasta input and awk. (get the number of bases N for each sequence and the number C of cysteins, print the name if C/N > 0.03)

curl -s "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz"  |\
gunzip -c | \
awk '/^>/ {if(N>0.0 && C/N>0.03) printf("%f\t%s\n",C/N,name); name=$0;N=0.0;C=0.0;next;}  {N+=length($0); gsub(/[^Cc]/,"",$0); C+=length($0);} END {if(N>0.0 && C/N>0.03) printf("%f\t%s\n",C/N,name);}' |\
head -n1000 | sort -k1,1g

(...)
0.190476    >sp|P37046|3CP1_STRS9 Tricyclic peptide RP 71955 OS=Streptomyces sp. (strain SP9440) PE=1 SV=2
0.190476    >sp|P85078|3CP2_STRSQ Tricyclic peptide MS-271 OS=Streptomyces sp. PE=1 SV=1
0.212766    >sp|P84028|45C1_ANCSP Delta-ctenitoxin-Asp2e OS=Ancylometes sp. PE=1 SV=1
0.228070    >sp|Q6GZT3|043R_FRG3G Uncharacterized protein 043R OS=Frog virus 3 (isolate Goorha) GN=FV3-043R PE=4 SV=1
ADD COMMENT
0
Entering edit mode

I do not know very well the awk language. I tried using this command line below, but it returned all sequences for me, including those without cysteine.

 {if(N>0.0 && C/N>0.03) printf("%f\t%s\n",C/N,name); name=$0;N=0.0;C=0.0;next;}  {N+=length($0); gsub(/[^Cc]/,"",$0); C+=length($0);} END {if(N>0.0 && C/N>0.03) printf("%f\t%s\n",C/N,name);}' |\
head -n1000 | sort -k1,1g <my_sample.fasta > output.fasta
ADD REPLY
1
Entering edit mode

You only need this part

 awk '/^>/ {if(N>0.0 && C/N>0.03) printf("%f\t%s\n",C/N,name); name=$0;N=0.0;C=0.0;next;}  {N+=length($0); gsub(/[^Cc]/,"",$0); C+=length($0);} END {if(N>0.0 && C/N>0.03) printf("%f\t%s\n",C/N,name);}' <my_sample.fasta > output_file
ADD REPLY
0
Entering edit mode

Perfect!! I got it! Thank you very much.

ADD REPLY
1
Entering edit mode

Thanks go to @Pierre! Accept his answer (use the check mark found on his original answer under the upvote sign).

ADD REPLY
0
Entering edit mode

`/^>/' is missing (lines starting with '>')

ADD REPLY
0
Entering edit mode

and of, course, head -n1000 | sort -k1,1g is just here to speed-up my demo

ADD REPLY
0
Entering edit mode

Thanks Pierre! You helped me a lot.

ADD REPLY

Login before adding your answer.

Traffic: 2263 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6