Question

Can we use JELLYFISH for amino acid sequences?

1

6.2 years ago

marksingh1982 ▴ 10

I have a very large file containing thousands of amino acid sequences. I would like to identify ALL overlapping k-mers.

I was hoping to use JELLYFISH, but it seems to work only with nucleic acid sequences. Is there any way to use the tool to read and parse a FASTA file of amino acid sequences?

jellyfish genome sequencing • 1.8k views

updated 5.7 years ago by Biostar 20 • written 6.2 years ago by marksingh1982 ▴ 10

0

This is a question; you're not posting about a tool. I've made the required change now. Please be more careful in the future.

6.2 years ago by Ram 43k

score 3 · Answer 1 · 2018-02-13

3

6.2 years ago

Sej Modha 5.3k

Jellyfish cannot handle protein sequences. You can use scikit-bio's Sequence.iter_kmers() for this. A solution that parses the FASTA file with Biopython and generates the k-mers with scikit-bio:

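from Bio import SeqIO
from skbio import Sequence

k = 4  # k-mer length

# Parse the protein FASTA file record by record
for record in SeqIO.parse('test.fa', 'fasta'):
    sequence = Sequence(str(record.seq))
    # overlap=True yields every overlapping k-mer (sliding window, step 1)
    for kmer in sequence.iter_kmers(k, overlap=True):
        print(str(kmer))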
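
If you would rather avoid the extra dependencies, or you want counts per k-mer (which is what Jellyfish reports for DNA), a plain-Python sketch along the following lines may be enough. The helper names read_fasta and count_kmers are my own and do not come from any library, and the two example sequences are made up for illustration. It assumes a simple, well-formed FASTA input and keeps all counts in memory, so it may not scale to very large files.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # A new header closes the previous record, if there was one
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, ''.join(chunks)


def count_kmers(sequences, k):
    """Count every overlapping k-mer (window step of 1) across all sequences."""
    counts = Counter()
    for seq in sequences:
        # Sequences shorter than k give an empty range and contribute nothing
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Made-up example records; an open file handle can be passed to read_fasta instead
example = """>seq1
MKVLAAGIV
>seq2
MKVLSAGIV
""".splitlines()

counts = count_kmers((seq for _, seq in read_fasta(example)), k=4)
for kmer, n in counts.most_common(3):
    print(kmer, n)
```

To run this on a real file, something like `with open('test.fa') as handle:` followed by `count_kmers((seq for _, seq in read_fasta(handle)), k=4)` should work, since read_fasta only needs an iterable of lines.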