I have a DNA sequence (of a chromosome) where each nucleotide has a score. For example (na indicates missing scores):
T  C  G   A   T  T   A   G  G  A  T  C
0  1  10  20  5  na  na  0  5  2  0  0
My goal is to identify all subsequences with high energy scores. Nothing else is known a priori, except that these subsequences are expected to be short, roughly 5-15 nucleotides. Could someone suggest how to approach this?
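A naive baseline, assumed here only for illustration, is to check every window of 5-15 nucleotides and keep the windows whose mean score reaches some threshold. The threshold value, the choice to skip any window that overlaps an "na" position, and the function name `high_score_windows` are all hypothetical. Prefix sums make each window check O(1), so ~230K positions times 11 window lengths is roughly 2.5 million checks. Overlapping hits are merged into contiguous regions.

```python
def high_score_windows(scores, threshold, min_len=5, max_len=15):
    """Return merged (start, end) regions covered by windows of min_len..max_len
    positions whose mean score is >= threshold (hypothetical cutoff).
    Windows containing a missing score (None) are skipped.
    Indices are 0-based and end-exclusive."""
    n = len(scores)
    # Prefix sums of scores and of missing-value counts give O(1) window queries.
    prefix, missing = [0.0], [0]
    for s in scores:
        prefix.append(prefix[-1] + (0.0 if s is None else s))
        missing.append(missing[-1] + (1 if s is None else 0))

    hits = []
    for start in range(n):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > n:
                break  # longer windows would run past the sequence end
            if missing[end] - missing[start]:
                continue  # window overlaps an 'na' position
            if (prefix[end] - prefix[start]) / length >= threshold:
                hits.append((start, end))

    # Hits are generated in order of start, so one pass merges
    # overlapping or adjacent windows into contiguous regions.
    merged = []
    for s, e in hits:
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return [tuple(m) for m in merged]


seq = "T C G A T T A G G A T C".split()
scores = [None if v == "na" else float(v) for v in "0 1 10 20 5 na na 0 5 2 0 0".split()]
for start, end in high_score_windows(scores, threshold=10, min_len=3):
    print("".join(seq[start:end]), start, end)
```

The drawback is that the result depends heavily on the chosen threshold and window lengths, and a single very high score can drag low-scoring neighbours into a hit.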
Edit 1: The 12-position sequence shown above is a snippet of a much longer nucleotide sequence of ~230K positions.
Edit 2: I found the answer in "A linear time algorithm for finding all maximal scoring subsequences" (the Ruzzo-Tompa algorithm).
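A minimal sketch of the Ruzzo-Tompa procedure follows. It rests on two assumptions that the post does not settle. First, the algorithm needs negative scores to separate segments, and the raw scores here are all non-negative, so a hypothetical `threshold` is subtracted from every score. Second, "na" is mapped to a hypothetical `na_value` (0 by default) before that shift. The helper names `center_scores` and `ruzzo_tompa` are illustrative. The sketch finds the candidate segment with a plain right-to-left scan, so it is not guaranteed to run in linear time in the worst case.

```python
def center_scores(raw, threshold, na_value=0.0):
    """Convert raw score strings to floats minus a (hypothetical) threshold.
    'na' is replaced by na_value (an assumption) before shifting."""
    return [(na_value if v == "na" else float(v)) - threshold for v in raw]


def ruzzo_tompa(scores):
    """Return (start, end, total) for every maximal scoring subsequence.
    start/end are 0-based, end-exclusive indices into scores."""
    segments = []  # each entry: (start, end, L, R); L/R = cumulative sum before/after
    cumulative = 0.0
    for i, s in enumerate(scores):
        if s <= 0:
            # Non-positive scores never start a segment; they only move the running total.
            cumulative += s
            continue
        start, left = i, cumulative
        cumulative += s
        right = cumulative
        while True:
            # Find the rightmost segment j whose L is strictly below the new segment's L.
            j = len(segments) - 1
            while j >= 0 and segments[j][2] >= left:
                j -= 1
            if j < 0 or segments[j][3] >= right:
                # No such segment, or its R is at least as high: keep the new one separate.
                segments.append((start, i + 1, left, right))
                break
            # Otherwise extend the new segment back to segment j's start,
            # drop segments j..end, and re-check the extended segment.
            start, left = segments[j][0], segments[j][2]
            del segments[j:]
    return [(st, en, r - l) for st, en, l, r in segments]


seq = "T C G A T T A G G A T C".split()
raw = "0 1 10 20 5 na na 0 5 2 0 0".split()
for start, end, total in ruzzo_tompa(center_scores(raw, threshold=4)):
    print("".join(seq[start:end]), start, end, total)
```

With `threshold=4` the example yields GAT (positions 2-4, total 23.0) and a single G (position 8, total 1.0). On the full sequence, the returned segments can then be filtered by the expected 5-15 nt length and by a minimum total score.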