Question: Can we use JELLYFISH for amino acid sequences?
2.1 years ago
marksingh19820 wrote:

I have a very large file containing thousands of amino acid sequences, and I would like to identify ALL overlapping k-mers.

I was hoping to use JELLYFISH, but it only seems to work with nucleic acid sequences. Is there any way to use the tool to read and parse a FASTA file of amino acids?

2.1 years ago
Sej Modha wrote:

Jellyfish cannot handle protein sequences. You can use scikit-bio's iter_kmers() for this instead. A solution that uses Biopython to parse the FASTA file and scikit-bio to extract the k-mers:

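from Bio import SeqIO
from skbio import Sequence

# "proteins.fasta" is a placeholder; replace it with the path to your file
for record in SeqIO.parse("proteins.fasta", "fasta"):
    sequence = Sequence(str(record.seq))
    # overlap=True yields every overlapping 4-mer
    for kmer in sequence.iter_kmers(4, overlap=True):
        print(record.id, str(kmer))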
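
If you also want counts per k-mer (which is what Jellyfish reports for nucleotides), a dependency-free approach may be enough for a file of a few thousand proteins. The following is only a minimal sketch, not a replacement for Jellyfish: read_fasta and count_kmers are hypothetical helper names, the FASTA parser is deliberately simple, and the two example sequences are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_kmers(records: Iterable[Tuple[str, str]], k: int) -> Counter:
    """Count every overlapping k-mer across all sequences."""
    counts = Counter()
    for _, seq in records:
        seq = seq.upper()
        # Slide a window of width k one residue at a time;
        # sequences shorter than k contribute nothing
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Made-up example records, given as lines of a FASTA file
example = [">p1", "MKVLAAGMKVL", ">p2", "MKVL"]
counts = count_kmers(read_fasta(example), k=4)
print(counts["MKVL"])
```

To run this on a real file, pass an open file handle instead of the example list, for instance count_kmers(read_fasta(fh), k=4) inside a "with open(...) as fh:" block. All counts are held in memory, so this is assumed to be suitable only while the number of distinct k-mers stays manageable.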
Thank you for your quick reply!

