Question: Generate kmer profiles from a bunch of peptide sequences
Owen S. (Oakland CA) wrote, 5.4 years ago:

Can anyone recommend a software solution to do this:

  • Input: about 100,000 short peptide sequences -- unaligned -- of varying lengths, but mostly under 20 residues.
  • Output: amino-acid profiles (e.g. sequence logos) describing groups of similar, over-represented kmers (say, 3-, 4-, or 5-mers).

I can think of ways to tackle this myself*, but why re-invent the wheel?  Hoping that my question and any discussion that follows may also help others.

Thanks!

* PS. My approach would be something like this:

  1. count all unique kmers
  2. calculate pairwise distances
  3. select clusters (clades) of similar kmers
  4. use these kmers (and their counts) to build sequence logo maps
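
The four steps above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: Hamming distance stands in for the "pairwise distance", and a simple greedy seed-based grouping replaces a full pairwise distance matrix and clade selection (which may get expensive with many unique kmers). The function names and the `max_dist` parameter are hypothetical, not taken from any existing package.

```python
from collections import Counter

def count_kmers(peptides, k):
    """Step 1: count every k-mer across all peptides."""
    counts = Counter()
    for seq in peptides:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def hamming(a, b):
    """Step 2: number of mismatching positions between two equal-length k-mers."""
    return sum(x != y for x, y in zip(a, b))

def cluster_kmers(counts, max_dist=1):
    """Step 3 (assumed greedy variant): visit k-mers from most to least
    frequent; each one joins the first seed within max_dist mismatches,
    otherwise it becomes a new seed."""
    clusters = []  # list of (seed, [members])
    for kmer, _ in counts.most_common():
        for seed, members in clusters:
            if hamming(seed, kmer) <= max_dist:
                members.append(kmer)
                break
        else:
            clusters.append((kmer, [kmer]))
    return clusters

def profile(members, counts):
    """Step 4: count-weighted residue frequencies at each position."""
    k = len(members[0])
    total = sum(counts[m] for m in members)
    columns = [Counter() for _ in range(k)]
    for m in members:
        for pos, residue in enumerate(m):
            columns[pos][residue] += counts[m]
    return [{res: n / total for res, n in col.items()} for col in columns]

# Toy usage with made-up peptides
peptides = ["ACDK", "ACDR", "ACEK", "WWWW"]
counts = count_kmers(peptides, k=3)
for seed, members in cluster_kmers(counts, max_dist=1):
    print(seed, members, profile(members, counts))
```

Each per-position frequency table returned by `profile` is the kind of matrix a sequence-logo tool takes as input. The greedy grouping depends on visiting order, so a proper hierarchical clustering may give different clusters.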


Tags: epitope, hmm, sequence
Sean Davis (National Institutes of Health, Bethesda, MD) wrote, 5.4 years ago:

The Biostrings Bioconductor package has fast kmer counting functionality (oligonucleotideFrequency). You can then take the results and do all kinds of stats, clustering, and visualization.


Thanks, but my question relates to peptide, not nucleotide, sequences.  (The Biostrings function you suggested only works with nucleotide seqs.) 

Reply written 5.4 years ago by Owen S.
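
For peptide input, kmer counting does not strictly need a nucleotide-specific function; a plain sliding window works on any alphabet. The sketch below is one possible way to do it in Python, assuming the peptides are plain strings over the 20 standard amino-acid letters. The function name `peptide_kmer_counts` and the choice to skip windows containing any other character are assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues

def peptide_kmer_counts(peptides, ks=(3, 4, 5)):
    """Return {k: Counter of k-mers}, skipping windows with non-standard residues."""
    counts = {k: Counter() for k in ks}
    for seq in peptides:
        seq = seq.strip().upper()
        for k in ks:
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                if set(kmer) <= AMINO_ACIDS:
                    counts[k][kmer] += 1
    return counts

# Toy usage with made-up peptides
counts = peptide_kmer_counts(["ACDEX", "acde"])
print(counts[3].most_common(5))
```

With about 100,000 peptides of under 20 residues each, this is at most a few million windows per value of k, so a pure-Python loop should be workable.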