Question

Generate kmer profiles from a bunch of peptide sequences

1

9.8 years ago

Owen S. ▴ 370

Can anyone recommend a software solution to do this:

Input: about 100,000 short peptide sequences -- unaligned -- of varying lengths, but mostly under 20 residues.
Output: amino-acid profiles (e.g. sequence logos) describing groups of similar, over-represented kmers (say, 3-, 4-, or 5-mers).
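One way to make "over-represented" concrete is to compare each kmer's observed count with the count expected if residues occurred independently at their background frequencies. The sketch below is only an illustration under that assumption: the function name kmer_enrichment, the independence null model, and the toy peptides are all hypothetical and not taken from any particular tool.

```python
from collections import Counter
from math import prod


def kmer_enrichment(peptides, k):
    """Return {kmer: observed / expected} under an independent-residue null.

    Assumption (not from the original post): expected count equals the
    number of k-mer windows times the product of the background
    frequencies of the k-mer's residues.
    """
    # Background residue frequencies over the whole input.
    residues = Counter("".join(peptides))
    total_residues = sum(residues.values())
    freq = {aa: n / total_residues for aa, n in residues.items()}

    # Observed k-mer counts from a sliding window over every peptide.
    observed = Counter(
        pep[i:i + k] for pep in peptides for i in range(len(pep) - k + 1)
    )
    windows = sum(observed.values())

    return {
        kmer: n / (windows * prod(freq[aa] for aa in kmer))
        for kmer, n in observed.items()
    }


toy = ["ACDACD", "ACDKLM", "KLMACD"]  # toy data, not real peptides
scores = kmer_enrichment(toy, k=3)
```

Note that a ratio like this rewards kmers built from rare residues, so on real data it would probably need a minimum-count filter or a proper statistical test on top.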

I can think of ways to tackle this myself*, but why re-invent the wheel? Hoping that my question and any discussion that follows may also help others.

Thanks!

PS. My approach would be something like this:

1. Count all unique kmers.
2. Calculate pairwise distances between them.
3. Select clusters (clades) of similar kmers.
4. Use these kmers (and their counts) to build sequence logos.
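The four steps above can be sketched in plain Python. This is a minimal illustration, not a recommended tool: it assumes Hamming distance as the kmer distance and swaps the pairwise-distance/clade selection for a simple greedy clustering seeded by the most frequent kmers, which avoids an all-vs-all distance matrix. All function names and the toy peptides are hypothetical.

```python
from collections import Counter


def count_kmers(peptides, k):
    """Step 1: count every k-mer across all peptides."""
    counts = Counter()
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            counts[pep[i:i + k]] += 1
    return counts


def hamming(a, b):
    """Step 2: number of mismatched positions between equal-length k-mers."""
    return sum(x != y for x, y in zip(a, b))


def greedy_clusters(counts, max_dist=1):
    """Step 3 (simplified): visit k-mers from most to least frequent and
    attach each to the first existing seed within max_dist, else make it
    a new seed. This is a stand-in for true hierarchical clustering."""
    clusters = {}  # seed k-mer -> list of member k-mers (seed included)
    for kmer, _ in counts.most_common():
        for seed in clusters:
            if hamming(kmer, seed) <= max_dist:
                clusters[seed].append(kmer)
                break
        else:
            clusters[kmer] = [kmer]
    return clusters


def position_counts(members, counts):
    """Step 4: per-position residue counts, weighted by k-mer abundance."""
    k = len(members[0])
    profile = [Counter() for _ in range(k)]
    for kmer in members:
        for pos, residue in enumerate(kmer):
            profile[pos][residue] += counts[kmer]
    return profile


peptides = ["ACDEFG", "ACDEFK", "ACDQFG", "WWYYHH"]  # toy data
counts = count_kmers(peptides, k=3)
clusters = greedy_clusters(counts, max_dist=1)
profile = position_counts(clusters["EFG"], counts)
```

Each resulting profile is a list of per-position residue counts, which is the kind of matrix a logo tool such as WebLogo or the Python package logomaker takes as input. With 100,000 peptides the number of distinct 5-mers can be large, so the greedy pass may need a minimum-count cutoff before clustering.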

sequence hmm epitope • 2.9k views

Updated 2.3 years ago by Ram 43k • written 9.8 years ago by Owen S. ▴ 370

Ram · Answer 1 · 2014-07-24

0

9.8 years ago

Sean Davis 26k

The Bioconductor package Biostrings has fast kmer counting (the oligonucleotideFrequency function). You can then use the resulting counts for statistics, clustering, and visualization.

Updated 2.4 years ago by Ram 43k • written 9.8 years ago by Sean Davis 26k

0

Thanks, but my question is about peptide sequences, not nucleotide sequences. (The Biostrings function you suggested only works with nucleotide sequences.)

Updated 2.4 years ago by Ram 43k • written 9.8 years ago by Owen S. ▴ 370