Question

Calculate the coverage of a protein having a list of its peptides

6.7 years ago

arronar ▴ 280

Hello out there.

I was wondering whether there is a simple way in R to calculate the coverage of a protein, given its full sequence and a list of peptides derived from it.

For example, let's say that we have this protein sequence taken from UniProt:

MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
AKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNLIHYILTDKRVDIQHL
EKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDD
SFRKIYTDLGWKFTPL

and we have a list of some of its peptides, which may or may not overlap one another.

pepts = c("DRRRRMEALLLSLY", "YPNDRKLL", "DYKEWSPPRVQVECPKAPVEWNNPPSEKGLIVGHFSGIKYKGEKAQA",
    "SEVDVNK", "MCCWVSKFKDAMRRYQGIQ", "TCKIPGK", "VLSDLDAKIKAYNLTVEGVEGFVRYSRVTK",
    "DRRRRMEALLLSLYYPNDRKLL", "SEVDVNKMCCWVSKFK")

Can we somehow calculate the coverage?

Thank you.

R protein coverage uniprot • 4.4k views

ADD COMMENT • link updated 6.7 years ago by Jean-Karim Heriche 27k • written 6.7 years ago by arronar ▴ 280

While this is not an R solution, have you thought of doing a multiple sequence alignment?

ADD REPLY • link 6.7 years ago by GenoMax 142k

I tried Clustal Omega, but I don't know how to get its results into R, and it doesn't seem to return a coverage percentage.

ADD REPLY • link 6.7 years ago by arronar ▴ 280

This is not my field, but I found two options by searching Google. I have not tested either of them, so try them and see whether one fits your needs.

For MS data: the isobar R package does the job; check the PDF.

I also found a tool called Protein Coverage Summarizer, but it is not an R package.

ADD REPLY • link 6.7 years ago by ivivek_ngs ★ 5.2k

Thank you, but neither of them seems able to help me.

ADD REPLY • link 6.7 years ago by arronar ▴ 280

score 3 · Accepted Answer · 2017-09-06

6.7 years ago

Jean-Karim Heriche 27k

Just use regular expressions to match the peptides to the protein sequence and record an X at each matched position. When all of the peptides have been processed, count the Xs.
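The idea above can be sketched as follows. This is a minimal illustration in Python rather than R, and it makes a few assumptions that are not part of the answer: the function name `coverage_percent` is made up for the example, peptides are matched as literal substrings instead of regular expressions, every occurrence of a peptide is marked, and whitespace inside the pasted sequences is stripped before matching.

```python
def coverage_percent(protein, peptides):
    """Percentage of residues in `protein` covered by at least one peptide."""
    # Drop whitespace so sequences pasted with line breaks still match.
    protein = "".join(protein.split())
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        peptide = "".join(peptide.split())
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            # Mark every position of this occurrence (the "X" in the answer).
            covered[start:start + len(peptide)] = [True] * len(peptide)
            # Advance by one so repeated or self-overlapping hits are found.
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Sequence and peptides copied from the question.
protein = (
    "MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS"
    "EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD"
    "AKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNLIHYILTDKRVDIQHL"
    "EKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDD"
    "SFRKIYTDLGWKFTPL"
)
pepts = [
    "DRRRRMEALLLSLY", "YPNDRKLL",
    "DYKEWSPPRVQVECPKAPVEWNNPPSEKGLIVGHFSGIKYKGEKAQA",
    "SEVDVNK", "MCCWVSKFKDAMRRYQGIQ", "TCKIPGK",
    "VLSDLDAKIKAYNLTVEGVEGFVRYSRVTK",
    "DRRRRMEALLLSLYYPNDRKLL", "SEVDVNKMCCWVSKFK",
]

print(coverage_percent(protein, pepts))  # 51.5625 (132 of 256 residues)
```

Because each position is only flagged once, overlapping peptides such as "DRRRRMEALLLSLY" and "DRRRRMEALLLSLYYPNDRKLL" are not double counted.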

ADD COMMENT • link 6.7 years ago by Jean-Karim Heriche 27k

That is just what I would do. I don't think there is a simpler solution.

ADD REPLY • link 6.7 years ago by aquaq ▴ 40

I guess that I have to record both the start and end positions of each match and then sum them up, because some of the matches may overlap each other.

ADD REPLY • link 6.7 years ago by arronar ▴ 280

No need to sum anything. Here is a Perl way of doing it:

# Assumes $protein_seq (no whitespace, no X residues) and @peptides are already defined
my $cover_seq = $protein_seq; # copy in which we're going to replace matches by X
foreach my $peptide_seq (@peptides) {
    # \Q...\E matches the peptide literally; /g visits every occurrence
    while ($protein_seq =~ /\Q$peptide_seq\E/g) {
        my $start = $-[0]; # start position of match
        my $end = $+[0]; # position just past the end of the match
        my $len = $end - $start; # length of the match
        # Replace peptide by Xs in the copy of the protein sequence
        substr($cover_seq, $start, $len) = 'X' x $len;
    }
}
# Count number of Xs to get percent coverage
my $coverage = ($cover_seq =~ tr/X//) / length($cover_seq) * 100;
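The start/end bookkeeping raised earlier in the thread also works, as long as overlapping matches are merged before their lengths are added. The sketch below is a Python illustration of that alternative, not part of the Perl answer; the function name `covered_intervals` and the toy sequence are made up for the example. It may be useful when you want to report which regions are covered and not only the percentage.

```python
def covered_intervals(protein, peptides):
    """Return merged half-open (start, end) intervals covered by the peptides."""
    hits = []
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            hits.append((start, start + len(peptide)))
            start = protein.find(peptide, start + 1)

    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


# Toy example: "BCD" and "CDE" overlap, "HI" stands alone.
intervals = covered_intervals("ABCDEFGHIJ", ["BCD", "CDE", "HI"])
covered = sum(end - start for start, end in intervals)
print(intervals)             # [(1, 5), (7, 9)]
print(100.0 * covered / 10)  # 60.0
```

Both approaches give the same percentage; marking positions is simpler, while merged intervals keep the coordinates of each covered stretch.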