Question

Short protein sequence alignment

4.2 years ago

Bioinformatician_in_trouble ▴ 30

Hello all, I was wondering if there are any tools available for multiple sequence alignment of N-terminal residues (say, just 20 residues). I tried BLAST, but it aligns the query with only one of the two subject sequences I provided. I separated the two subject sequences with a comma (is that the correct way?). I know I can do it manually, especially the identity part, but for similarity I might have to consult an amino acid table. So I wanted some insight into any tool or way to do this.

alignment sequence blast • 1.1k views

ADD COMMENT • link 4.2 years ago by Bioinformatician_in_trouble ▴ 30

Did you use the online BLAST? Could you provide more information about your dataset (a few lines from your file) and more details about how you ran BLAST?

For multiple sequence alignment you can use MAFFT:

https://mafft.cbrc.jp/alignment/server/

https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=BlastHelp

ADD REPLY • link 4.2 years ago by Fatima ▴ 1000

Thanks for replying, Fatima! For example, these are the query protein's residues: SDPLSMVGPSQGRSPSYAS. I want to know the identity and similarity between this query and subject protein 1: VNTHAGGTGPEGCRPFAKF and subject protein 2: HLESDMFSSPLETDSMDPF. Again, these are short and I can do it manually, but I wanted to know if there is an in silico way to do it.

ADD REPLY • link 4.2 years ago by Bioinformatician_in_trouble ▴ 30

>query
SDPLSMVGPSQGRSPSYAS

>subject1
VNTHAGGTGPEGCRPFAKF 
>subject2
HLESDMFSSPLETDSMDPF

If your sequences were longer you could use blastp (Align two or more sequences option).

MAFFT output

id1             -------SDPLSMVGPSQGRSPSYAS
id3             HLESDMFSSPLETDSMD----PF---
id2             ------VNTHAGGTGPEGCR-PFAKF

MAFFT FASTA output

>id1
-------SDPLSMVGPSQGRSPSYAS
>id3
HLESDMFSSPLETDSMD----PF---
>id2
------VNTHAGGTGPEGCR-PFAKF

Then you can get the pairs that you are interested in and clean up the columns with gaps in both sequences:

>id1
-SDPLSMVGPSQGRSPSYAS
>id2
VNTHAGGTGPEGCR-PFAKF


>id1
-------SDPLSMVGPSQGRSPSYAS
>id3
HLESDMFSSPLETDSMD----PF---
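
The pair extraction described above can also be scripted. The following is only a minimal sketch in plain Python, assuming the MAFFT FASTA output is available as text; `parse_fasta` and `extract_pair` are hypothetical helper names, not part of MAFFT. It keeps two rows of the alignment and drops only the columns where both rows have a gap, so the two rows stay the same length.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {id: sequence} (insertion-ordered)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def extract_pair(msa, id_a, id_b, gap="-"):
    """Return two MSA rows with the columns that are gaps in BOTH rows removed."""
    a, b = msa[id_a], msa[id_b]
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    kept = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    return "".join(x for x, _ in kept), "".join(y for _, y in kept)


# MAFFT FASTA output copied from this post
msa_text = """\
>id1
-------SDPLSMVGPSQGRSPSYAS
>id3
HLESDMFSSPLETDSMD----PF---
>id2
------VNTHAGGTGPEGCR-PFAKF
"""

msa = parse_fasta(msa_text)
for other in ("id2", "id3"):
    row_a, row_b = extract_pair(msa, "id1", other)
    print(f">id1\n{row_a}\n>{other}\n{row_b}\n")
```

A column with a gap in only one of the two rows is kept, because it still carries information about the pair (an insertion or deletion).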

https://mafft.cbrc.jp/alignment/server/spool/_out.200213155522842eoKWxmY7tMldACkJ1fPvVlsfnormal.pir

Other tools: https://www.ebi.ac.uk/Tools/psa/

ADD REPLY • link 4.2 years ago by Fatima ▴ 1000

Fatima, I tried MAFFT with my original dataset, but I am not sure how to interpret the results, since no E-value, identity, or similarity percentage is given. For example, how can I interpret the output below?

>id1
-SDPLSMVGPSQGRSPSYAS
>id2
VNTHAGGTGPEGCR-PFAKF

ADD REPLY • link 4.2 years ago by Bioinformatician_in_trouble ▴ 30

I'm not sure about pairwise alignments, but for multiple sequence alignment you can use MAFFT and then score the alignment with GUIDANCE.

Please see the output of guidance:

http://guidance.tau.ac.il/results/15816469509376/MSA.MAFFT.Guidance_res_pair_res.html

GUIDANCE alignment score: 0.306977

This article might help:

https://www.nature.com/articles/s41598-019-56499-4

Pairwise alignment tools:

https://www.ebi.ac.uk/Tools/psa/
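
To make the identity and similarity numbers concrete, here is a rough sketch of how they could be counted from an already aligned pair. Two things here are assumptions rather than any standard: the `SIMILAR_GROUPS` table is a simplified, hypothetical grouping of amino acids by side-chain property (alignment tools normally call two residues similar when a substitution matrix such as BLOSUM62 gives them a positive score), and the function name `pair_stats` is made up for illustration. Tools also differ on the denominator, so the sketch reports both the number of aligned (non-gap) columns and the full alignment length.

```python
# Hypothetical, simplified similarity groups (an assumption for illustration;
# real tools usually derive similarity from a substitution matrix instead).
SIMILAR_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FWY"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("STNQ"),  # polar, uncharged
    set("AG"),    # small
]


def is_similar(a, b):
    """True if both residues fall in the same illustrative group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def pair_stats(row_a, row_b, gap="-"):
    """Count identical and similar positions in two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    identical = similar = aligned = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # a gap column is neither identical nor similar
        aligned += 1
        if a == b:
            identical += 1
            similar += 1  # similarity counts include identities
        elif is_similar(a, b):
            similar += 1
    return {
        "identical": identical,
        "similar": similar,
        "aligned_columns": aligned,
        "alignment_length": len(row_a),
    }


# The id1/id2 pair, with the columns that were gaps in both rows removed
stats = pair_stats("-SDPLSMVGPSQGRSPSYAS", "VNTHAGGTGPEGCR-PFAKF")
print(stats)
print(f"identity over aligned columns: {100 * stats['identical'] / stats['aligned_columns']:.1f}%")
```

For this pair the count is 4 identical positions out of 18 aligned columns (20 columns including the two gaps). The similarity count depends entirely on the grouping chosen, so it should not be compared directly with the figure reported by a tool that uses a substitution matrix.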

ADD REPLY • link 4.2 years ago by Fatima ▴ 1000

Thank you, Fatima, for all your help and for introducing me to MAFFT and GUIDANCE :) I shall read more about these.

ADD REPLY • link 4.2 years ago by Bioinformatician_in_trouble ▴ 30