Question: Calculating Shannon entropy for RNA sequences in multi-fasta using rnaentropy
gravatar for adhirajnath14
7 months ago by
adhirajnath1430 wrote:

I have to calculate the Shannon entropy for a given list of sequences in a fasta file. Recently I came across a program called rnaentropy which solves my problem but the issue is I cannot run sequences in batch. In the documentation the input is given as a sequence of strings. The parameters in the documentation is as follows:

./RNAentropy "sequence" -s sequence -t temperature -e energyModel(default 2004) -d delta_Temp (compute structural entropy using  <E> = RT² * d/dT ln(Z(T)) ) -c (centered) -z energy_is_zero [0|1] (default 0)  -v (verbose, extended output)
       ./RNAentropy -h (detailed help)  

Input paramters are:
   <sequence>          :  If FIRST argument is a valid nucleotide sequence it will be used input
   -s  <sequence>      : Alternative flag for input sequence
   -t  <temperature>   : Temperature in ºC (default is 17ºC)
   -e  <energy_model>  : Thermodynamic energy model used. Valid values for energy model are 1999, 2004 and 2007 (resp. Turner'99, Turner'04 and Andronescu '07) (default is 2004)
   -d  <delta_T>       : Temperature variation used for estimating d/dT ln(Z(T))
                        NOTE: If this parameter is provided, H is computed estimating  <E> = RT² * d/dT ln(Z(T))
   -c                 : (Use only in combination of -d) Use the centered version for estimating  <E> = RT² * d/dT ln(Z(T)) 
   -v                 : Output includes method for computing H and the name of the ouput parameters
   -z  <energy_is_zero>: [0|1] If value is 1, energies are set to 0, ouput is structural entropy for the uniform case 

Can anyone please suggest me a way to do it for a multi-fasta? Is there a way to iterate in a shell script?

rna sequence rnaentropy • 333 views
ADD COMMENTlink modified 6 months ago by ole.tange3.7k • written 7 months ago by adhirajnath1430

If it accepts only single entry fasta sequences, it would be easiest to split the fasta first, then run a bash loop or GNU parallel over all the split files.

Its not clear from that help whether the tool accepts STDIN as input, I suspect not. If it does, you can split the fasta on the fly potentially without having to create intermediate files.

ADD REPLYlink written 7 months ago by Joe16k

The program is not taking any file as input. I ran a pseudo code and got this result:

2.562930        0.699157        18      -0.880430

Sorry I couldn't follow how to split fasta as you suggested, which would make the input a series of string for each sequence. Sir, can you please provide any link for better understanding the problem. Thank you.

ADD REPLYlink written 7 months ago by adhirajnath1430

Does your fasta file have multiline sequences?

It is going to be quite complicated to split, parse and extract sequences from your fasta and then run them through the tool all in one go.

I would suggest splitting as I mentioned (please search the forum, there are no end of ways to do this).

You can then loop over all the files something like so:

for file in *.fasta; do
  ./rnaentropy $(grep -v '>' $file | tr -d '\n')
ADD REPLYlink modified 7 months ago • written 7 months ago by Joe16k
gravatar for ole.tange
6 months ago by
ole.tange3.7k wrote:

With GNU Parallel (Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them) you would do:

cat in.fasta |
  parallel --pipe -N1 --recstart '>' --rrs 'read a; echo Doing "$a"; ./RNAentropy -s $(tr -d "\n")'

From GNU Parallel Tutorial: Informational:

--recend str Record end separator for --pipe.
--recstart str Record start separator for --pipe.

ADD COMMENTlink modified 6 months ago by zx87549.2k • written 6 months ago by ole.tange3.7k

I always forget about recstart!

ADD REPLYlink written 6 months ago by Joe16k
gravatar for JC
7 months ago by
JC10k wrote:

I wouuld use a Perl (or Python script) to parse the multifasta and run your tool, like:


use strict;
use warnings;

$/="\n>"; # read fasta sequences in blocks
while (<>) {
     my ($id, @seq) = split (/\n/, $_);
     my $seq = join "", @seq; # rebuild the sequence, one single line
     print "$id\n"; # to indicate which sequence is being analyzed
     system("./RNAentropy -s $seq"); # run the command, add params if needed

then you can run as perl < multifasta_in > output_file

ADD COMMENTlink modified 7 months ago • written 7 months ago by JC10k

Thank you so much, Sir. You are a real life saver.

ADD REPLYlink written 7 months ago by adhirajnath1430

glad to help, good luck with your research

ADD REPLYlink written 7 months ago by JC10k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1959 users visited in the last hour