I have a quick question about consensus sequences. I have a bunch of short overlapping sequences that come from different groups, and I want to gather the sequences from the same group together and compute a consensus sequence for each group. The catch is that I don't know which group each sequence comes from.
A small example makes this clearer (here there are three groups - lines 1-4: g1; lines 5-8: g2; lines 9-12: g3):
AAATTTGGGCCC
AAATTTGGG
AAATTTG
TTTGGGCCCAAA
ATGCATGCAT
ATGCATGC
GCATGCATGC
TGCATGCAT
ACGTACGTACGT
ACGTACGTA
GTACGTACGTAC
And the expected output would be:
Group1 : AAATTTGGGCCCAAA
Group2 : ATGCATGCATGC
Group3 : ACGTACGTACGTAC
The problem is clustering the sequences into their groups. After that, computing the consensus sequence is pretty simple.
Does anyone have an idea?
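To make the clustering step concrete, here is a rough sketch of one possible approach, not a definitive solution: treat two sequences as related when one contains the other or when a suffix of one matches a prefix of the other over at least `min_len` bases, then take the connected components of that relation as the groups. The names `overlap`, `related`, `cluster` and `merge`, and the threshold `min_len=4`, are assumptions made for this illustration. The merge step is a simple greedy overlap merge and assumes exact, error-free overlaps.

```python
from itertools import combinations

def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def related(a, b, min_len):
    """True if one sequence contains the other or they overlap end to end."""
    return a in b or b in a or overlap(a, b, min_len) > 0 or overlap(b, a, min_len) > 0

def cluster(seqs, min_len=4):
    """Group sequences by connected components of the 'related' relation (union-find)."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if related(seqs[i], seqs[j], min_len):
            parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())

def merge(group, min_len=4):
    """Greedy merge of one group: drop contained sequences, then join the best-overlapping pair."""
    contigs = [s for s in dict.fromkeys(group)
               if not any(s != t and s in t for t in group)]
    while len(contigs) > 1:
        pairs = ((overlap(a, b, min_len), a, b)
                 for a in contigs for b in contigs if a != b)
        k, a, b = max(pairs, key=lambda t: t[0])  # first best pair wins on ties
        if k == 0:
            break  # nothing left to join
        contigs = [c for c in contigs if c not in (a, b)] + [a + b[k:]]
    return max(contigs, key=len)

seqs = [
    "AAATTTGGGCCC", "AAATTTGGG", "AAATTTG", "TTTGGGCCCAAA",
    "ATGCATGCAT", "ATGCATGC", "GCATGCATGC", "TGCATGCAT",
    "ACGTACGTACGT", "ACGTACGTA", "GTACGTACGTAC",
]
for n, group in enumerate(cluster(seqs), start=1):
    print(f"Group{n} : {merge(group)}")
```

On the example sequences this gives the three groups and merged sequences listed above. Two caveats: the threshold matters, because a chance overlap between groups that is at least `min_len` long will join them, and repetitive sequences can overlap equally well in both directions, so the merged result for such a group depends on how ties are broken.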