Question: How can I calculate the C:N ratio (or just number of carbons and nitrogens) of each amino acid sequence in a multifasta file?
kieft1bp (United States) wrote:

I have a multifasta file of amino acid sequences, around 1000 seqs total, like so:

  > seq_id_1
  MAWT........
  > seq_id_2
  MTRA.......
  ....
  > seq_id_1000
  MIVE.......

I want to calculate the molar C:N ratio (number of total carbon atoms in each sequence divided by the number of total nitrogen atoms in each sequence) for all seq IDs and print a tsv file, like so:

  seq_id_1 \t 1.5
  seq_id_2 \t 0.9
  ...
  seq_id_1000 \t 1.1

This C:N ratio is derived from the number of carbon and nitrogen atoms in each amino acid residue (e.g., there are 5 Cs and 1 N in methionine) and the number of times each amino acid occurs in the protein sequence. Is there a tool available that can do this, or do I have to write my own? I am fine with using a web server, a pre-written suite that runs on Unix (Mac, Linux), or custom scripts from someone (Python, Perl, Ruby). Thanks!

modified 9 months ago • written 9 months ago by kieft1bp

Using awk (this counts the letters C and N in each record):

awk '/^>/ {if(S>0) {print N==0?"NA":C/N;} C=0;N=0;S++;printf("%s\t",$0); ;next;} {t=$0; gsub(/[^Cc]/,"",t);C+=length(t);t=$0;gsub(/[^Nn]/,"",t);N+=length(t);} END{print N==0?"NA":C/N;}' in.fasta

modified 9 months ago • written 9 months ago by Pierre Lindenbaum

Thanks for the answer, Pierre, but the problem is a little more complicated than counting the instances of a string in each line. I've updated my question. My fasta sequences are just amino acids (with no information about carbon or nitrogen content), so what I actually need to do is reference a separate table that contains the number of carbon and nitrogen atoms per amino acid in order to calculate the C:N ratio for each sequence.

written 9 months ago by kieft1bp

There are 20 amino acids, so it's fairly easy to build that table from the chemical formulas on Wikipedia, read it into a dictionary/hash, loop over your sequences, add up the Cs and Ns, and compute the ratio. That doesn't seem very complicated, or am I missing something?

written 9 months ago by Carambakaracho
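
The dictionary approach described above could look roughly like the Python sketch below. It is only an illustration under a few assumptions: the per-residue counts come from the standard molecular formulas of the 20 amino acids (peptide-bond formation removes only H2O, so the C and N counts are unchanged in a chain), any other symbol (X, *, gaps) is skipped, and the sequence ID is taken as the first word of the header line. The names `ATOMS`, `read_fasta` and `cn_ratio` are made up for this example.

```python
import sys

# (carbon, nitrogen) atoms per amino acid, from the standard molecular formulas.
ATOMS = {
    "A": (3, 1), "R": (6, 4), "N": (4, 2), "D": (4, 1), "C": (3, 1),
    "E": (5, 1), "Q": (5, 2), "G": (2, 1), "H": (6, 3), "I": (6, 1),
    "L": (6, 1), "K": (6, 2), "M": (5, 1), "F": (9, 1), "P": (5, 1),
    "S": (3, 1), "T": (4, 1), "W": (11, 2), "Y": (9, 1), "V": (5, 1),
}


def read_fasta(lines):
    """Yield (seq_id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id = header[0] if header else ""  # first word of the header
            chunks = []
        elif line and seq_id is not None:
            chunks.append(line)  # sequences may span several lines
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def cn_ratio(seq):
    """Return (carbons, nitrogens, ratio); ratio is None when there is no N."""
    carbons = nitrogens = 0
    for aa in seq.upper():
        c, n = ATOMS.get(aa, (0, 0))  # assumption: unknown symbols are skipped
        carbons += c
        nitrogens += n
    ratio = carbons / nitrogens if nitrogens else None
    return carbons, nitrogens, ratio


def main(path):
    """Print one 'seq_id<TAB>ratio' line per record in the FASTA file."""
    with open(path) as handle:
        for seq_id, seq in read_fasta(handle):
            _, _, ratio = cn_ratio(seq)
            print(seq_id, "NA" if ratio is None else f"{ratio:.3f}", sep="\t")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved under a name such as `cn_ratio.py` (hypothetical), it would be run as `python cn_ratio.py in.fasta > cn_ratio.tsv`. For the fragment MAWT the tally is 23 C and 5 N, so the ratio is 4.6. `cn_ratio` also returns the raw counts, in case only the number of carbons and nitrogens is wanted.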

Yes, you're right. I was just wondering whether a tool had already been written for this task. Just trying not to reinvent the wheel.

written 9 months ago by kieft1bp

I'm not saying it doesn't exist, but if it takes you longer to search for a tool than to write it then the choice is easy :-)

written 9 months ago by WouterDeCoster