Count amino acid occurrences in a protein sequence in R
5.1 years ago
boaty ▴ 220

Hello,

Is there a tool to count occurrences of an amino acid pattern in a protein sequence, including overlapping matches, like this:

string_count("AAA","AA")#count "AA" in sequence "AAA"
[1] 2 #result, so 2 "AA"

I tried the stringr package, but it gives this:

library(stringr)
str_count("AAA","AA")
[1] 1 #result, only 1 "AA"

I want longer continuous peptides to receive a correspondingly higher score. Is there a tool for this?

Thanks a lot

protein count R • 3.7k views

It is not very clear what your problem is or what you want to achieve.

Is the result of str_count correct for you? Or do you want str_count("AAA","AA") to return 2 counts?

What kind of score do you want to apply? Could you share an example?


Thanks Bastien, and sorry my question was unclear. I want str_count("AAA","AA") to return 2 counts. For instance, "AAAAAA" would give 5 counts and "AARAAGAAN" would give 3. Counting overlapping matches this way gives a continuous peptide (like "AAAAAA") more counts.
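For readers without Biostrings, the overlapping count described above can be sketched in base R with a zero-width lookahead. This is only a minimal sketch: the helper name count_overlapping is hypothetical, and it assumes the pattern consists of plain amino acid letters with no regex metacharacters.

```r
# Hypothetical helper: count overlapping occurrences of `pattern` in `sequence`.
# Assumes `pattern` contains only plain letters (it is pasted into a regex unescaped).
count_overlapping <- function(sequence, pattern) {
  # A lookahead matches at a position without consuming characters,
  # so a hit starting at position 1 does not hide a hit starting at position 2.
  hits <- gregexpr(paste0("(?=", pattern, ")"), sequence, perl = TRUE)[[1]]
  # gregexpr() returns -1 when there is no match at all.
  sum(hits > 0)
}

count_overlapping("AAA", "AA")
count_overlapping("AAAAAA", "AA")
count_overlapping("AARAAGAAN", "AA")
```

The three calls correspond to the examples in this comment (2, 5 and 3 counts).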


See ATpoint's answer for the counting part. For the score, a quick and dirty option is to divide the number of counts by the peptide length. Alternatively, build your own scoring strategy from the start and end positions in the matchPattern result, for example by increasing the score until df[end] < df[start+1]+1, or something similar.
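One possible reading of the start/end idea above, sketched in base R: group hits that overlap each other into runs, then reward longer runs. Everything here is an assumption for illustration, including the helper name score_runs, the rule that a run continues while the next hit starts at or before the previous hit's end, and the choice of summing squared run sizes as the score.

```r
# Hypothetical scoring sketch: overlapping hits form a run, longer runs score more.
# Assumes `pattern` contains only plain letters (no regex metacharacters).
score_runs <- function(sequence, pattern) {
  starts <- as.integer(gregexpr(paste0("(?=", pattern, ")"), sequence, perl = TRUE)[[1]])
  if (starts[1] < 0) return(0)  # no hit at all
  ends <- starts + nchar(pattern) - 1L
  # A new run begins whenever a hit starts after the previous hit has ended.
  new_run <- c(TRUE, starts[-1] > ends[-length(ends)])
  run_sizes <- tabulate(cumsum(new_run))
  # Assumed score: squaring makes one run of 5 hits worth more than 5 isolated hits.
  sum(run_sizes^2)
}

score_runs("AAAAAA", "AA")     # one run of 5 overlapping hits
score_runs("AARAAGAAN", "AA")  # three isolated hits
```

Any other function of the run sizes could be swapped in for the squared sum; the grouping step is the part that uses the start and end positions.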


This is advice from a real expert! Yes, a scoring strategy is what I want to build once all the exploratory analysis is done.


Hi boaty,

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

Thanks!


Sorry about that. It's done now.

5.1 years ago
ATpoint 81k

Use the Biostrings package from Bioconductor:

library(Biostrings)
matchPattern("AA", AAString("AAA"))

will produce:

Views on a 3-letter AAString subject
subject: AAA
views:
    start end width
[1]     1   2     2 [AA]
[2]     2   3     2 [AA]

Thanks ATpoint, your suggestion gave me very good results. Here is my counting code:

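library(Biostrings)

# peptides: a list of peptide strings; prot_seq: the protein sequence to search
subject <- AAString(prot_seq)
peptide_count <- lapply(peptides, function(peptide) {
  # Overlapping hits for the peptide and for its reverse (e.g. "RY" and "YR")
  forward_hits <- ranges(matchPattern(peptide, subject))
  reverse_hits <- ranges(matchPattern(reverse(peptide), subject))
  # unique() avoids double counting when a peptide equals its own reverse
  length(unique(c(forward_hits, reverse_hits)))
})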

I added some extra steps so that a peptide and its reverse (like "RY" and "YR") are treated as the same thing, because in this study a reversed peptide is no different from the original. In stringr, str_count("RYYRRYRY", "RY|YR") can match both with a regular expression, but matchPattern takes a literal pattern rather than a regular expression, so sorry, my code is a bit more complex.
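The same forward-plus-reverse idea can be sketched without Biostrings, in case that is useful for checking results. This is a base R approximation rather than the code above: the helper names (reverse_string, overlap_starts, count_either_orientation) are hypothetical, and peptides are assumed to be plain letter strings.

```r
# Hypothetical helpers, base R only. Assumes peptides are plain letters.
reverse_string <- function(s) {
  paste(rev(strsplit(s, "", fixed = TRUE)[[1]]), collapse = "")
}

# Start positions of all (possibly overlapping) hits of `pattern` in `sequence`.
overlap_starts <- function(sequence, pattern) {
  hits <- gregexpr(paste0("(?=", pattern, ")"), sequence, perl = TRUE)[[1]]
  hits[hits > 0]  # drop the -1 that signals "no match"
}

count_either_orientation <- function(sequence, peptide) {
  fwd <- overlap_starts(sequence, peptide)
  rev_hits <- overlap_starts(sequence, reverse_string(peptide))
  # Forward and reversed peptides have the same width, so unique start
  # positions identify unique hits; palindromes such as "AA" count once.
  length(unique(c(fwd, rev_hits)))
}

count_either_orientation("RYYRRYRY", "RY")
count_either_orientation("AAA", "AA")
```

For "RYYRRYRY", "RY" starts at positions 1, 5 and 7 and "YR" at 3 and 6, giving 5 hits in total.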
