8.0 years ago
blasco
I would like to find short stretches of sequence (e.g. 18-20 nt) that are present in both of two FASTA files. The idea is to identify such matches between sequences of otherwise distant organisms, or between distant metagenomes. I have seen programs that look for similar reads, but those would not identify short shared stretches within the reads. Is there any program that can do that?
It sounds like you are looking for k-mers (k = 18-20) that occur in both sets of sequences. kmercountexact.sh from BBMap may be worth looking at.
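If you just want to see the idea on a small dataset, here is a minimal Python sketch of the shared k-mer approach. It is not how kmercountexact.sh works internally; it simply collects every k-mer from each file into a set and intersects the two sets. The function names (`read_fasta`, `kmer_set`, `shared_kmers`) are made up for illustration, and I am assuming you want a match on either strand to count, so each k-mer is stored in its canonical form (the lexicographically smaller of the k-mer and its reverse complement). K-mers containing N or other ambiguity codes are skipped.

```python
from typing import Iterator, Set, TextIO, Tuple

_DNA = frozenset("ACGT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an open FASTA file."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def kmer_set(handle: TextIO, k: int) -> Set[str]:
    """Collect all canonical k-mers found in a FASTA file."""
    kmers = set()
    for _, seq in read_fasta(handle):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows with N or other non-ACGT characters.
            if _DNA.issuperset(kmer):
                kmers.add(canonical(kmer))
    return kmers


def shared_kmers(handle_a: TextIO, handle_b: TextIO, k: int = 18) -> Set[str]:
    """Return canonical k-mers present in both FASTA files."""
    return kmer_set(handle_a, k) & kmer_set(handle_b, k)
```

To use it, open both files and call something like `shared_kmers(open("a.fasta"), open("b.fasta"), k=18)` (the file names are placeholders). Both k-mer sets are held in memory, so this is only reasonable for small inputs; for whole metagenomes a dedicated k-mer counter is the better choice.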