unique k-mers
17 months ago
Юлия ▴ 10

I have clusters of transposons. I need to find unique k-mers for each cluster. How can I do it?

For example:

 >RLX_02.01.01.99_LTR-scaffold_11709_001-v2_chrUn_830094_830205|RLX_02.01.01.99_LTR-scaffold_11709_001-v2_chrUn_830094_830205|3741
TGCAAATGGGGCTAAGAGCCCTGAAGAATAACCAATGGCGTCCATAACCCTGCCCAGGCCAAGCCGGAAGGGTAACCCTGCTAACGACGTCGATCTCAAAACCTGCTTAAAC


I use a k-mer length of 23.

k-mers • 939 views

What did you try so far?

17 months ago
bas1993 ▴ 60

You could write a script in Python that slides a window of the k-mer size along the sequence and adds each k-mer to a list if it is not already there.

    sequence = "TGCAAATGGGGCTAAGAGCCCTGAAGAATAACCAATGGCGTCCATAACCCTGCCCAGGCCAAGCCGGAAGGGTAACCCTGCTAACGACGTCGATCTCAAAACCTGCTTAAAC"
    k = 23  # k-mer length
    unique = []
    # A sequence of length n contains n - k + 1 k-mers,
    # so the last window starts at index n - k.
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Keep only the first occurrence of each k-mer.
        if kmer not in unique:
            unique.append(kmer)
    print(unique)


If I am not mistaken, your code just builds a list of the distinct k-mers in one sequence. But we need to find unique k-mers for each cluster. By unique k-mers I mean k-mers that occur in every (or almost every) sequence of a given cluster but are absent (or almost absent) from the sequences of any other cluster.


Then you could rewrite it so that the detected k-mers are stored as keys in a dictionary, with the value incremented each time a k-mer occurs. You could then take the k-mers that occurred only once and compare them between your clusters.

Maybe there is a tool for this, but I'm not aware of any.
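A minimal sketch of that dictionary idea, in case it helps. The cluster names, the toy sequences and the helper `count_kmers` are all made up for illustration, and `k` is set to 5 only so the toy data stays readable (you would use 23):

```python
def count_kmers(sequences, k=23):
    """Count how often each k-mer occurs across a list of sequences."""
    counts = {}
    for seq in sequences:
        # A sequence of length n has n - k + 1 k-mers.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


# Hypothetical toy clusters; replace with sequences parsed from your FASTA files.
clusters = {
    "cluster_A": ["ACGTACGTAA", "ACGTACGTTT"],
    "cluster_B": ["TTTTGGGGCC", "ACGTAGGGCC"],
}
k = 5

# One k-mer -> count dictionary per cluster.
counts = {name: count_kmers(seqs, k) for name, seqs in clusters.items()}

# k-mers seen in cluster_A that never appear in cluster_B.
only_in_a = set(counts["cluster_A"]) - set(counts["cluster_B"])
print(sorted(only_in_a))
```

Comparing the dictionary keys as sets is an assumption about what "compare them between your clusters" should mean here; it only checks presence or absence, and it does not consider reverse complements.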


You'd just need the Counter class from Python's collections module for this.
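Something along these lines might work as a starting point. It is only a sketch: the function names, the thresholds `min_in` and `max_out`, and the toy clusters are assumptions for illustration, chosen to match the "in (almost) every sequence of the cluster, (almost) absent elsewhere" definition given above. It assumes every cluster has at least one sequence and ignores reverse complements.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_specific_kmers(clusters, k=23, min_in=0.9, max_out=0.1):
    """For each cluster, list k-mers found in at least `min_in` of its
    sequences and in at most `max_out` of the sequences of every other cluster."""
    # presence[name][kmer] = number of sequences in that cluster containing kmer.
    presence = {
        name: Counter(km for seq in seqs for km in kmers(seq, k))
        for name, seqs in clusters.items()
    }
    result = {}
    for name, seqs in clusters.items():
        others = [n for n in clusters if n != name]
        specific = []
        for km, n_in in presence[name].items():
            if n_in / len(seqs) < min_in:
                continue  # not common enough inside this cluster
            # Counter returns 0 for k-mers a cluster has never seen.
            if all(presence[o][km] / len(clusters[o]) <= max_out for o in others):
                specific.append(km)
        result[name] = sorted(specific)
    return result


# Hypothetical toy data; k=5 only to keep the example small.
toy = {
    "cluster_A": ["ACGTACGTAA", "ACGTACGTTT"],
    "cluster_B": ["TTTTGGGGCC", "ACGTAGGGCC"],
}
strict = cluster_specific_kmers(toy, k=5, min_in=1.0, max_out=0.0)
print(strict)
```

Counting each k-mer once per sequence (via the set in `kmers`) is a deliberate choice: it measures in how many sequences a k-mer appears, not how many times it occurs in total, which seems closer to what was asked.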


When I tried this:

    k=0
    unique=[]
    genome_length=len(sequence) - 23
    while k < genome_length:
        kmer = sequence[k:k+23]
        if kmer in unique:
            k+=1
        else:
            unique.append(kmer)
            k+=1
    print(unique)

I got the list of k-mers. Then I applied this:

    def get_unique(in_list):
        unq_list = []
        # Iterate over the list
        for x in in_list:
            # if x is not yet in unq_list, add it
            if x not in unq_list:
                unq_list.append(x)
        # print the list
        for x in unq_list:
            print(x)

    my_list = unique
    print("The unique values in the list {0} are".format(my_list))
    get_unique(my_list)

and I just got a similar list:

    The unique values in the list ['TAGCAACCCTAGCCTCCGGCTAA', 'AGCAACCCTAGCCTCCGGCTAAG', 'GCAACCCTAGCCTCCGGCTAAGC', 'CAACCCTAGCCTCCGGCTAAGCT', 'AACCCTAGCCTCCGGCTAAGCTT', 'ACCCTAGCCTCCGGCTAAGCTTC', 'CCCTAGCCTCCGGCTAAGCTTCC', 'CCTAGCCTCCGGCTAAGCTTCCT', 'CTAGCCTCCGGCTAAGCTTCCTC', 'TAGCCTCCGGCTAAGCTTCCTCC', 'AGCCTCCGGCTAAGCTTCCTCCT', 'GCCTCCGGCTAAGCTTCCTCCTC', 'CCTCCGGCTAAGCTTCCTCCTCG', 'CTCCGGCTAAGCTTCCTCCTCGG', 'TCCGGCTAAGCTTCCTCCTCGGC', 'CCGGCTAAGCTTCCTCCTCGGCG', 'CGGCTAAGCTTCCTCCTCGGCGT', 'GGCTAAGCTTCCTCCTCGGCGTG', 'GCTAAGCTTCCTCCTCGGCGTGT', 'CTAAGCTTCCTCCTCGGCGTGTC', 'TAAGCTTCCTCCTCGGCGTGTCT', 'AAGCTTCCTCCTCGGCGTGTCTA', 'AGCTTCCTCCTCGGCGTGTCTAA', 'GCTTCCTCCTCGGCGTGTCTAAA', 'CTTCCTCCTCGGCGTGTCTAAAC', 'TTCCTCCTCGGCGTGTCTAAACC', 'TCCTCCTCGGCGTGTCTAAACCC', 'CCTCCTCGGCGTGTCTAAACCCT', 'CTCCTCGGCGTGTCTAAACCCTA', 'TCCTCGGCGTGTCTAAACCCTAG', 'CCTCGGCGTGTCTAAACCCTAGA', 'CTCGGCGTGTCTAAACCCTAGAT', 'TCGGCGTGTCTAAACCCTAGATC', 'CGGCGTGTCTAAACCCTAGATCG', 'GGCGTGTCTAAACCCTAGATCGT', 'GCGTGTCTAAACCCTAGATCGTC', 'CGTGTCTAAACCCTAGATCGTCG', 'GTGTCTAAACCCTAGATCGTCGA', 'TGTCTAAACCCTAGATCGTCGAG', 'GTCTAAACCCTAGATCGTCGAGG', 'TCTAAACCCTAGATCGTCGAGGA', 'CTAAACCCTAGATCGTCGAGGAA', 'TAAACCCTAGATCGTCGAGGAAC', 'AAACCCTAGATCGTCGAGGAACT', 'AACCCTAGATCGTCGAGGAACTC', 'ACCCTAGATCGTCGAGGAACTCT', 'CCCTAGATCGTCGAGGAACTCTC', 'CCTAGATCGTCGAGGAACTCTCT', 'CTAGATCGTCGAGGAACTCTCTC', .....