Question

How to find the overlapping peptide from a list.

4.5 years ago

arriyaz.nstu ▴ 30

I have the following sequence:

IRCIGVSNRDFVEGMSGGTWVDVVLEHGGCVTVMAQDKPTVDIELVTTTVSNMAEVRSYCYEASISDMASDSRCPTQGEAYLDKQSDTQYVCKRTLVDRGWGNGCGLFGKGSLVTCAKFACSKKMTGKSIQPENLEYRIMLSVHGSQHSGMI

I used three tools/methods from the Immune Epitope Database (IEDB) to predict antigenic peptides from this sequence. Each method generated a list of peptides, so I have three lists for the given sequence. The goal is to find the peptide that is common to, or overlaps in, all three lists.

In the linked table, I found that the peptide DSRCPTQ is common to all three lists, but I did this manually.

Is it possible to find such a peptide with a UNIX command in the Ubuntu terminal? If so, could you please suggest how?

Predicted Peptides


updated 4.5 years ago by Mensur Dlakic ★ 27k • written 4.5 years ago by arriyaz.nstu ▴ 30

score 3 · Accepted Answer · 2019-10-10

You may get some useful ideas from this post. To use it, concatenate all peptides from the same prediction into one string, with a space character separating each peptide:

pept_combo1 GVSNRDFVEGMSGGTWVDVVLEHGGCVTVMAQDKPTVDIELVTTTVSNMAEVRSYCYEASISDMASDSRCPTQGEAYLDKQSDTQYVCKRTLVDRGWGNGCGLFGKGSLVTCAKFACSKKMTGKSIQPENLEYRIMLSVHGSQHS
pept_combo2 MAQDKPTV MASDSRCPTQGEAYLDKQSDT KSIQPENLEYR
pept_combo3 WVDVVLEHGGCVTVM KPTVDIELVTTT VRSYCYEAS DSRCPTQ TQYVCKRTLVDR KGSLVTCAKFACSK RIMLSVHGSQ

To use the accepted Python script from the post referenced above, drop the pept_comboX labels and paste only the sequences:
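The concatenation step can also be scripted instead of done by hand. The following is a minimal sketch, assuming each method's predictions are already available as a Python list of peptide strings; the helper name join_peptides and the variable names are made up for illustration and are not part of the referenced post.

```python
def join_peptides(peptides):
    """Join one method's peptides into a single space-separated string.

    Blank entries and surrounding whitespace are dropped so that stray
    empty lines in the input do not produce double spaces.
    """
    return " ".join(p.strip() for p in peptides if p.strip())


# Peptides predicted by the second method, as listed in pept_combo2.
method2 = ["MAQDKPTV", "MASDSRCPTQGEAYLDKQSDT", "KSIQPENLEYR"]

combo2 = join_peptides(method2)
print(combo2)
```

The space between peptides matters: it keeps a match from spanning the end of one peptide and the start of the next, because the reference string contains no spaces.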

def long_substr(data):
    """Return the longest substring of data[0] found in every string in data."""
    substr = ''
    if len(data) > 1 and len(data[0]) > 0:
        for i in range(len(data[0])):
            for j in range(len(data[0])-i+1):
                # Only test candidates longer than the current best match.
                if j > len(substr) and is_substr(data[0][i:i+j], data):
                    substr = data[0][i:i+j]
    return substr

def is_substr(find, data):
    """Return True if find occurs in every string in data."""
    if len(data) < 1 or len(find) < 1:
        return False
    for i in range(len(data)):
        if find not in data[i]:
            return False
    return True

# Python 3 syntax: print is a function.
print(long_substr(['GVSNRDFVEGMSGGTWVDVVLEHGGCVTVMAQDKPTVDIELVTTTVSNMAEVRSYCYEASISDMASDSRCPTQGEAYLDKQSDTQYVCKRTLVDRGWGNGCGLFGKGSLVTCAKFACSKKMTGKSIQPENLEYRIMLSVHGSQHS',
                   'MAQDKPTV MASDSRCPTQGEAYLDKQSDT KSIQPENLEYR',
                   'WVDVVLEHGGCVTVM KPTVDIELVTTT VRSYCYEAS DSRCPTQ TQYVCKRTLVDR KGSLVTCAKFACSK RIMLSVHGSQ']))

This script will print out DSRCPTQ.
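The script reports only the single longest shared segment. If shorter segments shared by all three predictions are also of interest, a variation along the following lines might help. It is only a sketch, not the method from the referenced post: the function names and the min_len cutoff of 4 residues are assumptions chosen for illustration, and each prediction is kept as a list of peptides instead of a space-separated string.

```python
def substrings(peptides, min_len):
    """Collect every substring of length >= min_len from a list of peptides."""
    found = set()
    for pep in peptides:
        for i in range(len(pep)):
            for j in range(i + min_len, len(pep) + 1):
                found.add(pep[i:j])
    return found


def shared_segments(peptide_lists, min_len=4):
    """Return maximal segments present in at least one peptide of every list."""
    common = set.intersection(*(substrings(p, min_len) for p in peptide_lists))
    # Drop segments that are contained in a longer shared segment.
    maximal = [s for s in common if not any(s != t and s in t for t in common)]
    # Longest first, then alphabetical for a stable order.
    return sorted(maximal, key=lambda s: (-len(s), s))


predictions = [
    ["GVSNRDFVEGMSGGTWVDVVLEHGGCVTVMAQDKPTVDIELVTTTVSNMAEVRSYCYEASISDMASDSRCPTQGEAYLDKQSDTQYVCKRTLVDRGWGNGCGLFGKGSLVTCAKFACSKKMTGKSIQPENLEYRIMLSVHGSQHS"],
    ["MAQDKPTV", "MASDSRCPTQGEAYLDKQSDT", "KSIQPENLEYR"],
    ["WVDVVLEHGGCVTVM", "KPTVDIELVTTT", "VRSYCYEAS", "DSRCPTQ",
     "TQYVCKRTLVDR", "KGSLVTCAKFACSK", "RIMLSVHGSQ"],
]

result = shared_segments(predictions)
print(result)  # ['DSRCPTQ', 'KPTV']
```

With the peptides above, this lists DSRCPTQ first and also picks up the shorter shared segment KPTV. Enumerating all substrings is quadratic in peptide length, which is fine for short peptide lists like these but would be slow for very long sequences.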