Hi all, I'm looking for an algorithm or statistic to estimate the randomness of a pool of oligonucleotides.
We have synthesized adapters containing a stretch of 15 N (i.e. random A, C, T, or G). This 15N will be part of the sequenced reads (Illumina sequencing, so on the order of millions of reads). Ideally, each of the four nucleotides (A, C, T, or G) would be equally likely at every position of the 15N (1 to 15). In practice, some bias is inevitable, e.g. some nucleotides are preferentially incorporated.
So, is there a simple way of summarizing the randomness of this pool of oligonucleotides? I think some ideas are here (Estimating the entropy of DNA sequences) and in sequence logo creation. Any suggestions?
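
To make the entropy / sequence logo idea concrete, here is a minimal sketch of a per-position Shannon entropy calculation. It assumes the 15N stretches have already been extracted from the reads as equal-length strings; the function name positional_entropy and the choice to skip non-ACGT calls (such as N) are assumptions for illustration, not an established method. A position where all four bases are equally frequent scores 2 bits, and lower values indicate bias at that position.

```python
import math
from collections import Counter


def positional_entropy(seqs, alphabet="ACGT"):
    """Return the Shannon entropy (in bits) at each position of the sequences.

    seqs: list of equal-length strings (e.g. the extracted 15N stretches).
    Bases outside `alphabet` (e.g. 'N' calls) are ignored.
    """
    if not seqs:
        raise ValueError("no sequences given")
    length = len(seqs[0])

    # One base counter per position in the random stretch.
    counts = [Counter() for _ in range(length)]
    for s in seqs:
        if len(s) != length:
            raise ValueError("all sequences must have the same length")
        for i, base in enumerate(s):
            if base in alphabet:
                counts[i][base] += 1

    entropies = []
    for c in counts:
        total = sum(c.values())
        h = 0.0
        for n in c.values():
            p = n / total
            h -= p * math.log2(p)
        entropies.append(h)
    return entropies


if __name__ == "__main__":
    # Toy input: position 1 is uniform over the four bases, position 2 is always C.
    print(positional_entropy(["AC", "CC", "GC", "TC"]))
```

Note that this only looks at each position independently. It would not detect dependencies between positions (for example, over-represented dinucleotides or duplicated whole 15-mers), which would need a separate check such as counting distinct 15-mers.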