6.2 years ago
david.bersten
Hi, I have generated experimental data where the output is a DNA k-mer of length 12 with an associated 'affinity' and tag count. Most k-mers are not represented in the affinity lists. Does anyone have suggestions for methods to model the affinities of the missing k-mers?
Example data:
Kmer ObservedCount Probability ExpectedCount Affinity SE
AGTGTAACGTGTC NA NA NA NA NA NA
CGTGTAACGTGTC 130 3.6320387884448403e-6 991.7716666891613 0.032857923466624035 0.0040755238163352366
GGTGTAACGTGTC NA NA NA NA NA NA
Why not programmatically generate every possible k-mer of length 12 and test each one? I have Java code that does this for any length. Granted, that would be a huge number of k-mers: 4^12 = 16,777,216 for length 12.
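For anyone who wants to try this, here is a minimal sketch of one way to enumerate every k-mer over A, C, G, T. This is not the code mentioned above; the class and method names (`KmerEnumerator`, `count`, `kmerAt`, `forEachKmer`) are made up for illustration. It treats each k-mer as a base-4 number (two bits per base), so nothing has to be held in memory at once.

```java
import java.util.function.Consumer;

// Hypothetical helper, for illustration only.
class KmerEnumerator {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    // Number of distinct k-mers over {A, C, G, T}, i.e. 4^k.
    // Restricted to k <= 31 so the result fits in a signed long.
    static long count(int k) {
        if (k < 0 || k > 31) {
            throw new IllegalArgumentException("k must be between 0 and 31");
        }
        return 1L << (2 * k);
    }

    // Decode an index in [0, 4^k) into its k-mer.
    // The last base is taken from the lowest two bits, so k-mers
    // come out in lexicographic order (A < C < G < T).
    static String kmerAt(long index, int k) {
        char[] out = new char[k];
        for (int i = k - 1; i >= 0; i--) {
            out[i] = BASES[(int) (index & 3L)];
            index >>>= 2;
        }
        return new String(out);
    }

    // Visit every k-mer once, in lexicographic order, without storing them all.
    static void forEachKmer(int k, Consumer<String> action) {
        long n = count(k);
        for (long i = 0; i < n; i++) {
            action.accept(kmerAt(i, k));
        }
    }

    public static void main(String[] args) {
        // Small demo: the 16 possible 2-mers, AA through TT.
        forEachKmer(2, System.out::println);
        // Size of the search space for k = 12.
        System.out.println(count(12));
    }
}
```

In practice you would replace `System.out::println` with whatever test you want to run on each k-mer, for example a lookup against the set of k-mers that do have a measured affinity.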