I am working with sequences in FASTA format and have used BioPerl to parse them. Now I have to select any two sequences at random and use them to find an unknown motif. My first goal is simply to choose sequences at random. I read that the rand() function is used for randomization, but how do I apply it here? As I understand it, rand() works on an array, and I would rather not put the sequences in an array because I think that would be inefficient. Also, does rand() choose each sequence only once, or can it pick the same sequence more than once? The code is supposed to run on any number of sequences, with a FASTA file as input.
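A note on rand() itself: in Perl it does not operate on an array. rand($n) returns a random fractional number from 0 up to (but not including) $n, so int(rand(@seqs)) gives a random index, and separate calls can return the same index more than once. One way to get two distinct sequences is to shuffle the list of indices and take the first two. The following is a minimal sketch, assuming the sequence strings are already held in an array (with BioPerl they would come from Bio::SeqIO and $seq->seq); pick_random_indices and @seqs are hypothetical names used only for illustration.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: return $k distinct random indices in 0 .. $count-1.
# Shuffling the index list guarantees that no index is chosen twice.
sub pick_random_indices {
    my ($count, $k) = @_;
    die "need at least $k sequences\n" if $count < $k;
    my @shuffled = shuffle(0 .. $count - 1);
    return @shuffled[0 .. $k - 1];
}

# Stand-in data (assumption). With BioPerl these strings would be read via
# Bio::SeqIO->new(-file => $file, -format => 'fasta') and $seq->seq.
my @seqs = qw(ACGTACGT TTGACCGA GGCATCAT CCGTAAGC);

my ($i, $j) = pick_random_indices(scalar @seqs, 2);
print "Picked sequences $i and $j: $seqs[$i], $seqs[$j]\n";
```

Only the small list of indices is shuffled, so the sequences themselves are never copied or reordered.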
My aim is to follow the greedy algorithm approach from the book An Introduction to Bioinformatics Algorithms by Jones and Pevzner. There, only the first two sequences are used to find (s1, s2), and the remaining sequences are then compared against these two to compute the score. In my homework problem I have to choose the pair of sequences randomly. The pseudocode is as follows:
    GREEDYMOTIFSEARCH(DNA, t, n, l)
    bestMotif ← (1, 1, ..., 1)
    s ← (1, 1, ..., 1)
    for s_1 ← 1 to n − l + 1
        for s_2 ← 1 to n − l + 1
            if Score(s, 2, DNA) > Score(bestMotif, 2, DNA)
                bestMotif_1 ← s_1
                bestMotif_2 ← s_2
    s_1 ← bestMotif_1
    s_2 ← bestMotif_2
    for i ← 3 to t
        for s_i ← 1 to n − l + 1
            if Score(s, i, DNA) > Score(bestMotif, i, DNA)
                bestMotif_i ← s_i
        s_i ← bestMotif_i
    return bestMotif
where l is the length of the motif to find (supplied by the user), n is the number of nucleotides in a single sequence, and t is the number of sequences. Thank you for looking at this question; any help is greatly appreciated. Let me know if my question is unclear and I will try to rephrase it.
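To make the intended variant concrete, here is a rough Perl sketch of the greedy search with the seed pair chosen at random instead of taking the first two sequences. It is an illustration under stated assumptions, not the book's reference implementation: the score is taken to be the consensus score (for each column, the count of the most common nucleotide, summed over columns), positions are 0-based rather than 1-based, sequences may differ in length, and the names score and greedy_motif_search_random_pair are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(shuffle max);

# Consensus score (assumed scoring): for each column, take the count of
# the most common nucleotide, then sum those counts over all columns.
sub score {
    my ($l, @lmers) = @_;
    my $total = 0;
    for my $col (0 .. $l - 1) {
        my %count;
        $count{ substr($_, $col, 1) }++ for @lmers;
        $total += max(values %count);
    }
    return $total;
}

# Greedy search seeded with a random pair of sequences.
# $dna is an array ref of sequence strings, $l the motif length.
# Returns a hash ref mapping sequence index => 0-based start position.
sub greedy_motif_search_random_pair {
    my ($dna, $l) = @_;
    die "need at least two sequences\n" if @$dna < 2;
    die "a sequence is shorter than the motif\n"
        if grep { length($_) < $l } @$dna;

    my @order = shuffle(0 .. $#$dna);        # random processing order
    my ($first, $second) = @order[0, 1];     # the random seed pair
    my (%start, $best);
    $best = -1;

    # Try every pair of start positions in the two seed sequences.
    for my $p (0 .. length($dna->[$first]) - $l) {
        my $x = substr($dna->[$first], $p, $l);
        for my $q (0 .. length($dna->[$second]) - $l) {
            my $sc = score($l, $x, substr($dna->[$second], $q, $l));
            if ($sc > $best) {
                $best = $sc;
                @start{$first, $second} = ($p, $q);
            }
        }
    }
    my @chosen = (substr($dna->[$first],  $start{$first},  $l),
                  substr($dna->[$second], $start{$second}, $l));

    # Add the remaining sequences one by one, keeping earlier picks fixed.
    for my $idx (@order[2 .. $#order]) {
        my ($best_pos, $best_sc) = (0, -1);
        for my $p (0 .. length($dna->[$idx]) - $l) {
            my $sc = score($l, @chosen, substr($dna->[$idx], $p, $l));
            ($best_pos, $best_sc) = ($p, $sc) if $sc > $best_sc;
        }
        $start{$idx} = $best_pos;
        push @chosen, substr($dna->[$idx], $best_pos, $l);
    }
    return \%start;
}

# Example with made-up sequences that all contain ACGT.
my @dna = qw(TTACGTAA GGACGTCC CAACGTTG ACGTGGGG);
my $starts = greedy_motif_search_random_pair(\@dna, 4);
for my $idx (sort { $a <=> $b } keys %$starts) {
    printf "sequence %d: start %d, motif %s\n",
        $idx, $starts->{$idx}, substr($dna[$idx], $starts->{$idx}, 4);
}
```

Because the seed pair is random, different runs can return different motifs on real data, so it may be worth repeating the search several times and keeping the highest-scoring result.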