Question: Calculate the coverage of a protein having a list of its peptides
20 months ago, arronar (Austria) wrote:

Hello out there.

I was wondering if there is a simple way using R to calculate the coverage of a protein when you have a list of peptides from it and its initial sequence.

For example let's say that we have this protein sequence taken from uniprot:

MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
AKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNLIHYILTDKRVDIQHL
EKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDD
SFRKIYTDLGWKFTPL

and we have a list of some of its peptides, which may or may not overlap one another.

pepts = c("DRRRRMEALLLSLY", "YPNDRKLL",
          "DYKEWSPPRVQVECPKAPVEWNNPPSEKGLIVGHFSGIKYKGEKAQA",
          "SEVDVNK", "MCCWVSKFKDAMRRYQGIQ", "TCKIPGK",
          "VLSDLDAKIKAYNLTVEGVEGFVRYSRVTK",
          "DRRRRMEALLLSLYYPNDRKLL", "SEVDVNKMCCWVSKFK")

Can we somehow calculate the coverage?

Thank you.

uniprot coverage protein R • 1.5k views
modified 20 months ago by Jean-Karim Heriche • written 20 months ago by arronar

While this is not an R solution, have you thought of doing a multiple sequence alignment?

modified 20 months ago • written 20 months ago by genomax

I tried Clustal Omega, but I don't know how to get its results into R, and it doesn't seem to report a coverage percentage.

written 20 months ago by arronar

This is not my field of work, but I found two possible solutions by searching on Google. I have not tested them myself, so try them and see whether they fit your needs.

For MS data: the isobar R package does the job; check the PDF.

I also found a tool called Protein Coverage Summarizer, but it is not an R package.

modified 20 months ago • written 20 months ago by ivivek_ngs

Thank you, but neither of them seems able to help me.

written 20 months ago by arronar
20 months ago, Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

Just use regular expressions to match the peptides to the protein sequence and record an X at each matched position. When all of the peptides have been processed, count the Xs.
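Since the question asks for R, the idea above could be sketched roughly as follows. This is only an illustration, not a tested package function: `protein_coverage` is a made-up helper name, the peptides are assumed to be plain amino-acid strings that should be matched literally (hence `fixed = TRUE`), and a logical vector plays the role of the Xs.

```r
# Sketch: flag every residue covered by at least one peptide, then count.
# `protein_coverage` is a hypothetical helper name, not part of any package.
protein_coverage <- function(protein, peptides) {
  # Strip whitespace/newlines that often sneak in when pasting sequences
  protein  <- gsub("\\s+", "", protein)
  peptides <- gsub("\\s+", "", peptides)
  peptides <- peptides[nzchar(peptides)]   # ignore empty strings

  covered <- logical(nchar(protein))       # FALSE for every residue at first
  for (pep in peptides) {
    # fixed = TRUE: treat the peptide as literal text, not as a regex.
    # gregexpr() reports successive, non-overlapping occurrences.
    hits <- gregexpr(pep, protein, fixed = TRUE)[[1]]
    if (hits[1] == -1) next                # peptide not found in the protein
    for (start in hits) {
      covered[start:(start + nchar(pep) - 1)] <- TRUE
    }
  }
  100 * sum(covered) / length(covered)     # percentage of residues covered
}

# Toy example: "AFS" covers 2-4, "FSAE" covers 3-6, "LK" covers 9-10,
# so 7 of the 10 residues are covered.
protein_coverage("MAFSAEDVLK", c("AFS", "FSAE", "LK"))  # 70
```

Because each residue is only ever flagged once, overlapping or duplicated peptides cannot be counted twice.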

written 20 months ago by Jean-Karim Heriche

Just what I would do. I don't think there is a simpler solution.

written 20 months ago by aquaq10

I guess that I have to record both the start and end position of each match and then sum up the lengths, taking care because some of the matches may overlap each other.
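To illustrate why plainly summing match lengths would overcount, here is a small R sketch that merges overlapping intervals before adding them up. `merged_length` is a hypothetical helper written for this example, and the positions are assumed to be 1-based and inclusive.

```r
# Sketch: total number of positions covered by a set of [start, end] intervals.
# `merged_length` is a hypothetical helper; positions are 1-based, inclusive.
merged_length <- function(starts, ends) {
  if (length(starts) == 0) return(0)
  ord    <- order(starts)                  # process intervals left to right
  starts <- starts[ord]
  ends   <- ends[ord]

  total     <- 0
  cur_start <- starts[1]
  cur_end   <- ends[1]
  for (i in seq_along(starts)[-1]) {
    if (starts[i] <= cur_end + 1) {
      # Overlaps or touches the current block: extend it
      cur_end <- max(cur_end, ends[i])
    } else {
      # Gap: close the current block and start a new one
      total     <- total + (cur_end - cur_start + 1)
      cur_start <- starts[i]
      cur_end   <- ends[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

starts <- c(2, 3, 9)
ends   <- c(4, 6, 10)
sum(ends - starts + 1)       # naive sum: 9, positions 3-4 are counted twice
merged_length(starts, ends)  # 7
```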

written 20 months ago by arronar

No need to sum anything. Here is a Perl way of doing it:

# Assumes $protein_seq and @peptides are already defined
my $cover_seq = $protein_seq; # copy in which matched residues are replaced by X
foreach my $peptide_seq (@peptides) {
    # \Q...\E matches the peptide literally; only the first occurrence is used
    if ($protein_seq =~ /\Q$peptide_seq\E/) { # peptide matches the protein
        my $start = $-[0];           # start position of the match
        my $end   = $+[0];           # position just after the end of the match
        my $len   = $end - $start;   # length of the match
        # Replace the matched residues by Xs in the copy
        substr($cover_seq, $start, $len) = 'X' x $len;
    }
}
# Count the Xs to get the coverage as a percentage
# (assumes the protein itself contains no X residues)
my $coverage = ($cover_seq =~ tr/X//) / length($cover_seq) * 100;
modified 20 months ago • written 20 months ago by Jean-Karim Heriche

Oh I see. Thank you.

written 20 months ago by arronar