Question: Count amino acid in protein sequence in R
boaty wrote (12 weeks ago):

Hello,

Is there a tool to count the occurrences of an amino acid motif in a protein sequence, including overlapping ones, like this:

string_count("AAA", "AA")  # hypothetical function: count "AA" in sequence "AAA"
[1] 2                      # desired result: 2 "AA"

With the stringr package I get this instead:

library(stringr)
str_count("AAA", "AA")
[1] 1  # only 1 "AA": str_count counts non-overlapping matches

I want to give a correspondingly higher score to longer continuous peptides. Is there a tool for this?

Thanks a lot


It is not very clear what your problem is and what you want to achieve.

Is the result of str_count correct for you, or do you want str_count("AAA", "AA") to return 2?

What kind of score do you want to apply? Could you share an example?

Comment by Bastien Hervé, 12 weeks ago

Thanks Bastien, sorry my question was unclear. I want str_count("AAA", "AA") to return 2. For instance, "AAAAAA" should give 5 counts and "AARAAGAAN" 3 counts. Counting this way gives continuous stretches (like "AAAAAA") more counts.

Reply by boaty, 12 weeks ago
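For reference, here is a minimal base-R sketch of the overlapping count described above. The function name count_overlapping is hypothetical (it is not part of stringr or Biostrings); it slides a window as wide as the pattern along the sequence and counts exact matches, so no extra packages are needed.

```r
# Hypothetical helper: count overlapping occurrences of `pattern` in
# `sequence` by comparing every window of width nchar(pattern) (base R only).
count_overlapping <- function(sequence, pattern) {
  n <- nchar(sequence)
  k <- nchar(pattern)
  if (k == 0 || k > n) return(0L)
  starts <- seq_len(n - k + 1)
  windows <- substring(sequence, starts, starts + k - 1)
  sum(windows == pattern)
}

count_overlapping("AAA", "AA")        # 2
count_overlapping("AAAAAA", "AA")     # 5
count_overlapping("AARAAGAAN", "AA")  # 3
```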

See ATpoint's answer for the counting part. For the score, a quick-and-dirty option is to divide the count by the peptide length. Alternatively, build your own scoring strategy from the start and end positions returned by matchPattern, e.g. keep increasing the score until df[end] < df[start+1]+1, or something similar.

Reply by Bastien Hervé, 12 weeks ago
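One way to turn the start/end idea into a score, as a hedged sketch only: the helpers match_starts, run_lengths and run_score are hypothetical, the contiguity rule (the next match overlaps or directly follows the previous one) is an assumption rather than Bastien's exact condition, and squaring run lengths is just one arbitrary choice that rewards continuous stretches. Match positions come from a zero-width lookahead in base R, which reports overlapping hits the way matchPattern does.

```r
# Hypothetical helper: start positions of all overlapping matches of a
# plain-letter `pattern` (assumed to contain no regex metacharacters).
match_starts <- function(sequence, pattern) {
  hits <- gregexpr(sprintf("(?=%s)", pattern), sequence, perl = TRUE)[[1]]
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

# Number of matches per run; a run continues while the next match
# overlaps or directly follows the previous one (assumed rule).
run_lengths <- function(starts, width) {
  if (length(starts) == 0) return(integer(0))
  ends <- starts + width - 1
  new_run <- c(TRUE, starts[-1] > ends[-length(ends)] + 1)
  tabulate(cumsum(new_run))
}

# Hypothetical score: squared run lengths favour continuous stretches
# over the same number of scattered matches.
run_score <- function(sequence, pattern) {
  runs <- run_lengths(match_starts(sequence, pattern), nchar(pattern))
  sum(runs^2)
}

run_score("AAAAAA", "AA")     # 25 (one run of 5 matches)
run_score("AARAAGAAN", "AA")  # 3 (three runs of 1 match)
```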

This is advice from a real expert! Yes, a scoring strategy is what I want to build after the exploratory analysis.

Reply by boaty, 12 weeks ago

Hi boaty,

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

Thanks!


Reply by WouterDeCoster, 12 weeks ago

Sorry about that, it's done now.

Reply by boaty, 12 weeks ago

ATpoint (Germany) answered 12 weeks ago:

Use the Biostrings package from Bioconductor:

library(Biostrings)
matchPattern("AA", AAString("AAA"))

will produce:

Views on a 3-letter AAString subject
subject: AAA
views:
    start end width
[1]     1   2     2 [AA]
[2]     2   3     2 [AA]
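If only the number of hits is needed, Biostrings also provides countPattern(), which counts the same (overlapping) hits that matchPattern() reports without building the Views object. A short sketch; the peptide vector and protein sequence in the second part are made-up example values.

```r
library(Biostrings)

# Number of (overlapping) hits of "AA" in "AAA"
countPattern("AA", AAString("AAA"))  # 2

# Several peptides against one protein (example values are made up)
peptides <- c("AA", "RY", "YR")
prot_seq <- AAString("AARYAAYRAA")
vapply(peptides, countPattern, numeric(1), subject = prot_seq)
```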

Thanks ATpoint, your suggestion gave me very good results. Here is the code I use for counting:

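library(Biostrings)

# peptides: character vector (or list) of peptides to look for
# prot_seq: protein sequence (character string) to search in
subject <- AAString(prot_seq)
peptide_count <- vapply(peptides, function(peptide) {
  fwd <- ranges(matchPattern(peptide, subject))
  bwd <- ranges(matchPattern(reverse(AAString(peptide)), subject))
  # a peptide and its reverse count as the same feature; unique() drops
  # ranges found in both orientations (e.g. palindromes such as "AA")
  length(unique(c(fwd, bwd)))
}, integer(1))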

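The second matchPattern() call, on the reversed peptide, makes a peptide and its reverse (like "RY" and "YR") count as the same thing, because in this study a reversed peptide is no different from the original. In stringr, str_count("RYYRRYRY", "RY|YR") handles both orientations in one call (although it only counts non-overlapping matches), but matchPattern() takes a fixed pattern rather than a regular expression, so, sorry, my code is a bit more complex.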

Reply by boaty, 12 weeks ago
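A base-R alternative, offered as a sketch: a Perl-compatible lookahead lets a single regex match both orientations while still counting overlapping hits. The function name count_both_orientations is hypothetical, and the pattern is assumed to contain only amino-acid letters (no regex metacharacters). Because a peptide and its reverse have the same width, counting distinct start positions should agree with counting unique ranges in the matchPattern() code above.

```r
# Hypothetical helper: count start positions where `pattern` or its reverse
# begins, overlaps allowed. `pattern` must be plain amino-acid letters.
count_both_orientations <- function(sequence, pattern) {
  reversed <- paste(rev(strsplit(pattern, "")[[1]]), collapse = "")
  # zero-width lookahead: each start position is reported at most once,
  # so palindromic peptides such as "AA" are not counted twice
  regex <- sprintf("(?=%s|%s)", pattern, reversed)
  hits <- gregexpr(regex, sequence, perl = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

count_both_orientations("RYYRRYRY", "RY")  # 5: RY at 1, 5, 7; YR at 3, 6
```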
Powered by Biostar version 2.3.0