Question: Count amino acid in protein sequence in R
1
gravatar for boaty
15 months ago by
boaty110
boaty110 wrote:

Hello,

Is there a tool that counts overlapping occurrences of an amino acid pattern in a protein sequence, like this:

string_count("AAA","AA")#count "AA" in sequence "AAA"
[1] 2 #result, so 2 "AA"

I used the stringr package, but it counts only non-overlapping matches:

library(stringr)
str_count("AAA","AA")
[1] 1 #result, only 1 "AA"

I want overlapping matches to be counted so that longer continuous peptides get a reasonably higher score. Is there a tool for this?

Thanks a lot

R protein count • 949 views
modified 15 months ago by zx8754 • written 15 months ago by boaty

It is not very clear what your problem is and what you want to achieve.

Is the result of str_count correct for you, or do you want str_count("AAA","AA") to return 2 counts?

What kind of score do you want to apply? Could you share an example?

written 15 months ago by Bastien Hervé

Thanks Bastien, and sorry that my question was unclear. I want str_count("AAA","AA") to return 2 counts. For instance, "AAAAAA" should give 5 counts and "AARAAGAAN" should give 3 counts. This counting strategy gives a continuous peptide (like "AAAAAA") more counts.
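
For readers who want this counting without extra packages, here is a minimal base-R sketch. It assumes the pattern is a plain peptide string (letters only) and wraps it in a zero-width lookahead so that matches may overlap. The function name count_overlapping is hypothetical and not part of stringr or Biostrings.

```r
# Count overlapping occurrences of `pattern` in `sequence` (base R only).
# Assumption: `pattern` contains only letters, so it is safe to use as a regex.
count_overlapping <- function(sequence, pattern) {
  # A lookahead matches at a position without consuming characters,
  # so the next search starts one character later and overlaps are kept.
  lookahead <- paste0("(?=(?:", pattern, "))")
  hits <- gregexpr(lookahead, sequence, perl = TRUE)[[1]]
  # gregexpr() returns -1 when nothing matches
  if (hits[1] == -1L) 0L else length(hits)
}

count_overlapping("AAA", "AA")        # 2
count_overlapping("AAAAAA", "AA")     # 5
count_overlapping("AARAAGAAN", "AA")  # 3
```

The same lookahead pattern should also work with stringr, for example str_count("AAA", "(?=AA)").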

written 15 months ago by boaty

See ATpoint's answer for the counting part. As for the score, if you want to play it dirty, you can divide the number of counts by the peptide length, or create your own scoring strategy using the start and end positions in the result of matchPattern. For example, keep increasing the score as long as df[end] < df[start+1]+1, or something similar.
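
One possible reading of that idea, as a rough sketch only: group hits whose ranges overlap into runs, then reward longer runs. The function name run_score and the squared-length rule are assumptions made for illustration, not something from Biostrings; the start positions could come from start(matchPattern(...)).

```r
# Score a set of hit start positions so that runs of overlapping hits
# count for more than isolated hits.
# `starts`: sorted start positions of the hits; `width`: pattern length.
run_score <- function(starts, width) {
  if (length(starts) == 0) return(0)
  # A new run begins when the next hit no longer overlaps the previous one
  run_id <- cumsum(c(TRUE, diff(starts) >= width))
  run_lengths <- tabulate(run_id)
  # Hypothetical scoring rule: square the number of hits in each run
  sum(run_lengths^2)
}

run_score(1:5, width = 2)         # "AA" in "AAAAAA": one run of 5 hits, 25
run_score(c(1, 4, 7), width = 2)  # "AA" in "AARAAGAAN": three single hits, 3
```

Any increasing function of the run length would do in place of the square; the point is only that consecutive overlapping hits are scored together.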

written 15 months ago by Bastien Hervé

This is advice from a real expert! Yes, a scoring strategy is what I want to build once the exploratory analysis is done.

written 15 months ago by boaty

Hi boaty,

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

Thanks!

written 15 months ago by WouterDeCoster

Sorry about that. It's done now.

written 15 months ago by boaty
ATpoint wrote:

Use the Biostrings package from Bioconductor:

library(Biostrings)
matchPattern("AA", AAString("AAA"))

will produce:

Views on a 3-letter AAString subject
subject: AAA
views:
    start end width
[1]     1   2     2 [AA]
[2]     2   3     2 [AA]

written 15 months ago by ATpoint

Thanks ATpoint, your suggestion gave me very good results. Here is my code for the counting:

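library(Biostrings)

# peptides: a list of peptide strings; prot_seq: the protein sequence to search
subject <- AAString(prot_seq)
peptide_count <- lapply(peptides, function(peptide) {
  # Match the peptide and its reverse, then drop duplicated ranges
  # (a palindromic peptide such as "AA" would otherwise be counted twice)
  hits <- c(ranges(matchPattern(peptide, subject)),
            ranges(matchPattern(reverse(peptide), subject)))
  length(unique(hits))
})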

Some additional steps are added to treat a peptide and its reverse (like "RY" and "YR") as the same thing, because in this study a reversed peptide is no different from the original. In stringr, str_count("RYYRRYRY", "RY|YR") can match both at once (though without overlaps). matchPattern does not take a regular expression like this, so, sorry, my code is a bit complex.
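
As a cross-check that does not need Biostrings, the same "peptide or its reverse" count can be sketched in base R. The helper names reverse_string and count_either_orientation are hypothetical, and the sketch assumes peptides contain only letters so they can be used directly inside a regular expression.

```r
# Reverse a single string, e.g. "RY" -> "YR"
reverse_string <- function(x) {
  paste(rev(strsplit(x, "")[[1]]), collapse = "")
}

# Count positions where the peptide or its reverse starts, overlaps included.
count_either_orientation <- function(sequence, peptide) {
  # unique() keeps a single variant for palindromic peptides such as "AA"
  variants <- unique(c(peptide, reverse_string(peptide)))
  # Zero-width lookahead: each start position is reported at most once
  lookahead <- paste0("(?=(?:", paste(variants, collapse = "|"), "))")
  hits <- gregexpr(lookahead, sequence, perl = TRUE)[[1]]
  if (hits[1] == -1L) 0L else length(hits)
}

count_either_orientation("RYYRRYRY", "RY")  # 5 (starts at 1, 3, 5, 6, 7)
count_either_orientation("AAA", "AA")       # 2
```

Because both variants have the same width, counting distinct start positions should agree with counting unique ranges in the matchPattern version.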

written 15 months ago by boaty