Question: Calculating Shannon entropy for RNA sequences in multi-fasta using rnaentropy
2
7 months ago by

I have to calculate the Shannon entropy for a given list of sequences in a fasta file. Recently I came across a program called rnaentropy which solves my problem but the issue is I cannot run sequences in batch. In the documentation the input is given as a sequence of strings. The parameters in the documentation is as follows:

```./RNAentropy "sequence" -s sequence -t temperature -e energyModel(default 2004) -d delta_Temp (compute structural entropy using  <E> = RT² * d/dT ln(Z(T)) ) -c (centered) -z energy_is_zero [0|1] (default 0)  -v (verbose, extended output)
./RNAentropy -h (detailed help)

Input paramters are:
<sequence>          :  If FIRST argument is a valid nucleotide sequence it will be used input
-s  <sequence>      : Alternative flag for input sequence
-t  <temperature>   : Temperature in ºC (default is 17ºC)
-e  <energy_model>  : Thermodynamic energy model used. Valid values for energy model are 1999, 2004 and 2007 (resp. Turner'99, Turner'04 and Andronescu '07) (default is 2004)
-d  <delta_T>       : Temperature variation used for estimating d/dT ln(Z(T))
NOTE: If this parameter is provided, H is computed estimating  <E> = RT² * d/dT ln(Z(T))
-c                 : (Use only in combination of -d) Use the centered version for estimating  <E> = RT² * d/dT ln(Z(T))
-v                 : Output includes method for computing H and the name of the ouput parameters
-z  <energy_is_zero>: [0|1] If value is 1, energies are set to 0, ouput is structural entropy for the uniform case
```

Can anyone please suggest me a way to do it for a multi-fasta? Is there a way to iterate in a shell script?

rna sequence rnaentropy • 333 views
modified 6 months ago by ole.tange3.7k • written 7 months ago by adhirajnath1430
3

If it accepts only single entry fasta sequences, it would be easiest to split the fasta first, then run a bash loop or GNU parallel over all the split files.

Its not clear from that help whether the tool accepts STDIN as input, I suspect not. If it does, you can split the fasta on the fly potentially without having to create intermediate files.

The program is not taking any file as input. I ran a pseudo code and got this result:

```./RNAentropy -s AAGCGCGCGTGACTGCAT
2.562930        0.699157        18      -0.880430
```

Sorry I couldn't follow how to split fasta as you suggested, which would make the input a series of string for each sequence. Sir, can you please provide any link for better understanding the problem. Thank you.

Does your fasta file have multiline sequences?

It is going to be quite complicated to split, parse and extract sequences from your fasta and then run them through the tool all in one go.

I would suggest splitting as I mentioned (please search the forum, there are no end of ways to do this).

You can then loop over all the files something like so:

``````for file in *.fasta; do
./rnaentropy \$(grep -v '>' \$file | tr -d '\n')
done
``````
ADD REPLYlink modified 7 months ago • written 7 months ago by Joe16k
6
6 months ago by
ole.tange3.7k
Denmark
ole.tange3.7k wrote:

With GNU Parallel (Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them) you would do:

``````cat in.fasta |
parallel --pipe -N1 --recstart '>' --rrs 'read a; echo Doing "\$a"; ./RNAentropy -s \$(tr -d "\n")'
``````

`--recend str` Record end separator for `--pipe`.
`--recstart str` Record start separator for `--pipe`.

ADD COMMENTlink modified 6 months ago by zx87549.2k • written 6 months ago by ole.tange3.7k

I always forget about `recstart`!

2
7 months ago by
JC10k
Mexico
JC10k wrote:

I wouuld use a Perl (or Python script) to parse the multifasta and run your tool, like:

``````#!/usr/bin/perl

use strict;
use warnings;

\$/="\n>"; # read fasta sequences in blocks
while (<>) {
s/>//g;
my (\$id, @seq) = split (/\n/, \$_);
my \$seq = join "", @seq; # rebuild the sequence, one single line
print "\$id\n"; # to indicate which sequence is being analyzed
system("./RNAentropy -s \$seq"); # run the command, add params if needed
}
``````

then you can run as `perl analyzeSeq.pl < multifasta_in > output_file`

Thank you so much, Sir. You are a real life saver.