Question: Assessing diversity of random oligonucleotides
dariober (WCIP | Glasgow | UK) wrote, 4.1 years ago:

Hi All- I'm looking for some algorithm or statistics to estimate the randomness of a pool of oligonucleotides.

We have synthesized adapters containing a stretch of 15 N (i.e. random A, C, T, or G). This 15N will be part of the sequenced reads (Illumina sequencing, so on the order of millions of reads). Ideally, each nucleotide will have the same chance of being present at any position in the 15N, regardless of the nucleotide (A, C, T, or G) or the position in the string (1 to 15). In practice, some biases are inevitable, for example some nucleotides being preferentially incorporated.

So, is there a simple way of summarizing the randomness of this pool of oligonucleotides? I think some ideas are here (Estimating the entropy of DNA sequences) and in sequence logo creation. Any suggestions?

Dario


How did you pick these sequences? 

written 4.0 years ago by Biomonika (Noolean)

Hi- The 15N are random sequences (or supposed to be).

written 4.0 years ago by dariober

Vincent Laufer (United States) wrote, 4.0 years ago:

Hi Dario,

There is a substantial amount of scholarship available on this already, as you seem to have noticed. A Google search for "calculating information entropy of DNA sequences" returns an abundance of papers, some of which seem to provide answers.

Of these, the most helpful source I located was "Shannon entropy of a DNA motif?". Check the first answer and the links provided in it, and see if they help. If not, let me know and I'll keep looking.

Lastly, I can tell you that much DNA sequence is highly non-random, so depending on where you are looking there might already be a strong expectation of more or less entropy (e.g. exons tend to have higher information entropy than introns: http://bioinformatics.oxfordjournals.org/content/early/2011/02/10/bioinformatics.btr077.full.pdf). Anyway, that's an aside, but it makes me curious about where and why you are looking.

Hope it helps.


Thanks, yes, the Shannon entropy seems to be suitable. As for where and why: I'm looking at supposedly random sequences synthesized by a company; they do not come from genomic regions.
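For anyone landing here, a minimal sketch of how the per-position Shannon entropy could be computed, assuming the 15N stretches have already been extracted from the reads into a list of equal-length strings. The function name `positional_entropy` and the choice to ignore any base outside A, C, G, T (such as an N call) are my own assumptions, not something taken from the linked answers. A perfectly random position gives 2 bits; a fixed base gives 0.

```python
import math
from collections import Counter


def positional_entropy(seqs, alphabet="ACGT"):
    """Return the Shannon entropy (in bits) at each position.

    Assumes all sequences have the same length. Characters outside
    `alphabet` (e.g. N base calls) are ignored; this is an assumption.
    """
    length = len(seqs[0])
    entropies = []
    for i in range(length):
        # Count each base observed at position i
        counts = Counter(s[i] for s in seqs if s[i] in alphabet)
        total = sum(counts.values())
        h = 0.0
        for c in counts.values():
            p = c / total
            h -= p * math.log2(p)
        entropies.append(h)
    return entropies


if __name__ == "__main__":
    # Toy input: position 1 is uniform, position 2 is always C
    print(positional_entropy(["AC", "CC", "GC", "TC"]))
```

Averaging the 15 values, or reporting the minimum, would give a single summary number, but per-position entropy does not capture dependencies between positions.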

written 4.0 years ago by dariober

Ah I see. Is it accurate to say that you are determining whether or not something that was advertised as random actually is random?

written 4.0 years ago by Vincent Laufer

Yes, more precisely I'd like to have a measure of how random the oligo mix is. As far as I know, companies use equimolar amounts of the four nucleotides when asked for "N" in the oligo. But since the four nucleotides have different probabilities of being incorporated, plus various additional biases, the question is not so much whether the mix is random, but rather how badly it deviates from the random expectation.
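One way to put a number on "how badly it deviates" is sketched below. It is only an illustration under my own assumptions: the 15N stretches are available as equal-length strings, every position has at least one A, C, G or T call, and the total variation distance from the uniform 25% expectation is used as the deviation measure (a choice made here, not one proposed in this thread). The function name `deviation_from_uniform` is hypothetical. The value is 0 for a perfectly balanced position and 0.75 for a position that is always the same base.

```python
from collections import Counter


def deviation_from_uniform(seqs, alphabet="ACGT"):
    """Per-position total variation distance from the uniform distribution.

    Assumes equal-length sequences and at least one base from
    `alphabet` at every position (otherwise division by zero).
    """
    expected = 1.0 / len(alphabet)
    distances = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        # Only bases in the alphabet contribute to the total
        total = sum(counts[b] for b in alphabet)
        tvd = 0.5 * sum(abs(counts[b] / total - expected) for b in alphabet)
        distances.append(tvd)
    return distances


if __name__ == "__main__":
    # Toy input: position 1 is balanced, position 2 is always C
    print(deviation_from_uniform(["AC", "CC", "GC", "TC"]))
```

Unlike a significance test, this is an effect size, so it does not become "significant" simply because there are millions of reads.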

written 4.0 years ago by dariober

Interesting. I vaguely remember reading that in the Sanger paper a really long time ago. I think you could probably ballpark this quickly without using an algorithm from the information entropy literature, but since there is a packaged solution for everything these days, I'd just go ahead and use one. Was the link provided helpful enough, or are you going to keep looking?

written 4.0 years ago by Vincent Laufer