Hi all,

I'm looking to determinate shannon entropy score for a short sequence corresponding for an hyper-variable region, the idea is to compare this region for different samples. Any experience with that?

shannon entropy score

3

Entering edit mode

5.8 years ago

curiousbiologist
▴
40

Hi all,

I'm looking to determinate shannon entropy score for a short sequence corresponding for an hyper-variable region, the idea is to compare this region for different samples. Any experience with that?

2

Entering edit mode

5.8 years ago

Joseph Hughes
★
3.0k

There is an R package called entropy.

2

Entering edit mode

4.6 years ago

sacha
★
2.4k

'seqtk comp' command return #A,#C,#G,#T composition.

With the following fasta file :

```
>seq1
AAAA
>seq2
ATCGACTTTTTTGTAGTACGTA
```

You can run this oneliner to get Shannon entropy score for each sequence in your fasta.

```
seqtk comp test.fa|awk '{for(i=3;i<=6;i++){if($i){H+=$i/$2*log($i/$2)/log(2)}}print $1,-H}'
```

which return :

```
seq1 0
seq2 1.84199
```

1

Entering edit mode

5.8 years ago

Gabriel R.
★
2.8k

Here is a C++ implementation:

It hasn't been used/tested extensively but feel free to use the code.

1

Entering edit mode

5.8 years ago

Brian Bushnell
18k

BBDuk calculates Shannon entropy, and can pass or fail sequences based on the score. For example:

```
bbduk.sh in=sequences.fa out=pass.fa outm=fail.fa entropy=0.9 entropywindow=50 entropyk=5
```

The code is in BBDukF.java in the function averageEntropy().

1

Entering edit mode

4.6 years ago

vmicrobio
▴
290

Give a try to biojava:

```
import java.util.*;
import org.biojava.bio.dist.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.symbol.*;
public class Entropy {
public static void main(String[] args) {
Distribution dist = null;
try {
//create a biased distribution
dist =
DistributionFactory.DEFAULT.createDistribution(DNATools.getDNA());
//set the weight of a to 0.97
dist.setWeight(DNATools.a(), 0.97);
//set the others to 0.01
dist.setWeight(DNATools.c(), 0.01);
dist.setWeight(DNATools.g(), 0.01);
dist.setWeight(DNATools.t(), 0.01);
}
catch (Exception ex) {
ex.printStackTrace();
System.exit(-1);
}
//calculate the information content
double info = DistributionTools.bitsOfInformation(dist);
System.out.println("information = "+info+" bits");
System.out.print("\n");
//calculate the Entropy (using the conventional log base of 2)
HashMap entropy = DistributionTools.shannonEntropy(dist, 2.0);
//print the Entropy of each residue
System.out.println("Symbol\tEntropy");
for (Iterator i = entropy.keySet().iterator(); i.hasNext(); ) {
Symbol sym = (Symbol)i.next();
System.out.println(sym.getName()+ "\t" +entropy.get(sym));
}
}
}
```

0

Entering edit mode

5.8 years ago

curiousbiologist
▴
40

If found this on the net. Next step would be to implement it for a NGS use

http://code.activestate.com/recipes/577476-shannon-entropy-calculation/

```
# Shannon Entropy of a string
# = minimum average number of bits per symbol
# required for encoding the string
#
# So the theoretical limit for data compression:
# Shannon Entropy of the string * string length
# FB - 201011291
import math
from sets import Set
st = 'acgtaggatcccctat' # input string
# st = '00010101011110' # Shannon entropy for 'aabcddddefffg' would be 1 bit/symbol
print 'Input string:'
print st
print
stList = list(st)
alphabet = list(Set(stList)) # list of symbols in the string
print 'Alphabet of symbols in the string:'
print alphabet
print
# calculate the frequency of each symbol in the string
freqList = []
for symbol in alphabet:
ctr = 0
for sym in stList:
if sym == symbol:
ctr += 1
freqList.append(float(ctr) / len(stList))
print 'Frequencies of alphabet symbols:'
print freqList
print
# Shannon entropy
ent = 0.0
for freq in freqList:
ent = ent + freq * math.log(freq, 2)
ent = -ent
print 'Shannon entropy:'
print ent
print 'Minimum number of bits required to encode each symbol:'
print int(math.ceil(ent))
```

0

Entering edit mode

5.8 years ago

ahmedakhokhar
▴
150

Please see the publication http://bioinformatics.oxfordjournals.org/content/23/15/1875.full.pdf

Similar Posts

Loading Similar Posts

Traffic: 1916 users visited in the last hour

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Entropy Of Dna Sequences

Calculating Shannon Entropy for DNA sequence?: http://math.stackexchange.com/questions/1405130/calculating-shannon-entropy-for-dna-sequence