How to calculate the entropy of a DNA sequence - and/or its complexity
4.2 years ago

Hi,

I am trying to find a way to show that a given sequence is less or more complex than another one. I would like to use this to explain that some mapping issues can arise from reduced complexity. I'd like to demonstrate it with a 4-letter genome and a 3-letter (bisulfite-converted) genome.

I heard that Shannon's entropy can help me with that, but I am not sure. 1) It seems to work well for finding motifs, i.e. what is common when comparing sequences (https://bioinformatics.stackexchange.com/questions/9091/why-do-ten-rows-figure-1-correspond-to-2-bits-figure-2-in-a-sequence-logo/9094#9094), and I think I mostly understand how it is calculated.

2) I have found some formulas and a calculator for computing a general entropy ( http://www.shannonentropy.netmark.pl/ ), which is interesting and may help me.
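For illustration, the general (composition-based) entropy mentioned in point 2 can be sketched in a few lines of Python. This is only a minimal sketch: the function names `shannon_entropy` and `bisulfite_convert` are made up for this example, and the conversion naively turns every C into T, which assumes a fully unmethylated forward strand.

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """Shannon entropy (bits per base) of the base composition of seq."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def bisulfite_convert(seq):
    """Naive in-silico conversion: every C becomes T (assumes no methylation)."""
    return seq.upper().replace("C", "T")

# Hypothetical toy sequence with equal base frequencies.
seq = "ACGTACGTACGTACGT"
h4 = shannon_entropy(seq)                     # 4-letter alphabet
h3 = shannon_entropy(bisulfite_convert(seq))  # 3-letter alphabet after conversion
print(h4, h3)
```

With equal base frequencies the 4-letter sequence reaches the maximum of 2 bits per base, while the converted sequence cannot exceed log2(3), about 1.58 bits, and scores lower here because T becomes over-represented.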

3) I was, however, thinking that maybe I could calculate an entropy factor for a given sequence based on the repeated motifs it may contain (as in this paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.1231&rep=rep1&type=pdf ).
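One simple way to make an entropy score sensitive to repeated motifs is to compute it over overlapping k-mers instead of single bases. The sketch below is not the method from the linked paper, only a generic k-mer entropy; the function name `kmer_entropy`, the choice of k and the toy sequences are all assumptions made for this example.

```python
import math
from collections import Counter

def kmer_entropy(seq, k=3):
    """Shannon entropy (bits) of the overlapping k-mer distribution of seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0  # sequence shorter than k
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical toy sequences of equal length.
repetitive = "ACACACACACACACACACAC"  # a single dinucleotide repeat
mixed = "ACGTTGCAAGCTTCGATGCA"       # no obvious repeat structure

print(kmer_entropy(repetitive, k=2))
print(kmer_entropy(mixed, k=2))
```

A sequence dominated by one repeated motif uses few distinct k-mers and scores low, whereas a sequence with no repeat structure scores higher, even when the two have similar single-base composition.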

4) Finally, although I can't find the reference again, I was looking for a 'by position' entropy that would show a decrease in complexity in some parts of my sequences. It seems the package HDMD can help me, but again, I would need to "compare" different sequences to get an entropy score.
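A 'by position' profile that needs only one sequence can be approximated with a sliding window: compute the composition entropy inside each window and report it against the window start. This is a minimal sketch and not what HDMD does; `windowed_entropy`, the window size and the step are hypothetical choices for this example.

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """Shannon entropy (bits per base) of the base composition of seq."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def windowed_entropy(seq, window=10, step=1):
    """Return (start, entropy) pairs for each full window along seq."""
    return [
        (start, shannon_entropy(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]

# Hypothetical sequence with a low-complexity poly-A stretch in the middle.
seq = "ACGTACGTAC" + "AAAAAAAAAA" + "ACGTACGTAC"
profile = windowed_entropy(seq, window=10, step=10)
for start, h in profile:
    print(start, round(h, 2))
```

The window covering the poly-A stretch drops to zero entropy while the flanking windows stay close to 2 bits, which is the kind of local decrease in complexity described in point 4.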

Alternatively, if this is a bad way to assess sequence complexity (as it relates to mapping), would you recommend something else?

Best,

WGBS DNA R genome • 4.6k views
4.2 years ago
onestop_data ▴ 330

Great question. Shannon entropy should do the job of computing the complexity of a DNA string. Here I share a script to do it. I hope it helps.
