Question

Protein sequence analysis


8.9 years ago

bioinformaticssrm2011 ▴ 90

Hi,

I have a multi-FASTA file of protein sequences that looks like this:

>gnl|TC-DB|198412035 1.A.70.1.3 PREDICTED: hypothetical protein [Ciona intestinalis]
MDQKIVFILLLVVLLVSQATADDGWVRTGLAVARLVVGRRRRRWNEANGLEKLSSDAEETLSAAEMEEVMQKIMDHQ
>gnl|TC-DB|198415263 1.A.70.1.2 PREDICTED: hypothetical protein [Ciona intestinalis]
MDRKIAFAFLFVILQVTTVSAGWGSLLRVETKAILGTLALRRRTWNENKASQQITPEMEEKLDAEMEKLMQQLAEDQQ
>gnl|TC-DB|210060745 2.A.39.3.6 Chain A, Structure Of Mhp1, A Nucleobase-Cation-Symport-1 Family Transporter
MNSTPIEEARSLLNPSNAPTRYAERSVGPFSLAAIWFAMAIQVAIFIAAGQMTSSFQVWQVIVAIAAGCTIAVILLFFTQ
SAAIRWGINFTVAARMPFGIRGSLIPITLKALLSLFWFGFQTWLGALALDEITRLLTGFTNLPLWIVIFGAIQVVTTFYG
ITFIRWMNVFASPVLLAMGVYMVYLMLDGADVSLGEVMSMGGENPGMPFSTAIMIFVGGWIAVVVSIHDIVKECKVDP
SREGQTKADARYATAQWLGMVPASIIFGFIGAASMVLVGEWNPVIAITEVVGGVSIPMAILFQVFVLLATWSTNPAANL
SPAYTLCSTFPRVFTFKTGVIVSAVVGLLMMPWQFAGVLNTFLNLLASALGPLAGIMISDYFLVRRRRISLHDLYRTKGI
>gnl|TC-DB|58177374 1.A.2.2.2 Chain B, Intermediate Gating Structure 2 Of The Inwardly Rectifying K+ Channel Kirbac3.1
MTGGMKPPARKPRILNSDGSSNITRLGLEKRGWLDDHYHDLLTVSWPVFITLITGLYLVTNALFALAYLACGDVIENARP
GSFTDAFFFSVQTMATIGYGKLIPIGPLANTLVTLEALCGMLGLAVAASLIYARFTRPTAGVLFSSRMVISDFEGKPTLM
MRLANLRIEQIIEADVHLVLVRSEISQEGMVFRRFHDLTLTRSRSPIFSLSWTVMHPIDHHSPIYGETDETLRNSHSEFL
VLFTGHHEAFAQNVHARHAYSCDEIIWGGHFVDVFTTLPDGRRALDLGKFHEIAQHHHHHH
>gnl|TC-DB|A0CIB0 1.A.17.3.1 Chromosome undetermined scaffold_19, whole genome shotgun sequence OS=Paramecium tetraurelia GN=GSPATT00007662001 PE=4 SV=1
MDDQNQPILQEQPKPKQKKPLLNTKMVKKQKMQNKKEENLREILNFYTNQVDARKFLQKMKAVVDSNQQEKKYQDD
NEYNEMQDIYEDYNMGDLVIVFPNPDADGVKNPPITYKEAEKFYDETLAQKNDDVSESQLKKEKEAFLLAFLMLDDYQ
..
..
..

and so on (around 6000 sequences in total).

I need to functionally annotate these sequences. How can I do that?

Best!

Shashank

alignment blast • 2.4k views


Ram · Answer 1 · 2015-05-22

Start with the FASTA input of PICR (http://www.ebi.ac.uk/Tools/picr/) to map your sequences to UniProt IDs. Then use the UniProt interface (http://www.uniprot.org/) to slice and dice the results (e.g. by GO term or anything else you like). For GO you might also consider PANTHER (http://www.pantherdb.org/) for additional options (including working between UniProt and PANTHER).

Note that this answer is about analysis of function, not annotation (sensu adding this information to the records). It also assumes these sequences are "knowns", present at least somewhere in UniProt. If they include a proportion of novel sequences, you have a different problem, for which others may suggest solutions.
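Before submitting anything to an ID-mapping service, it can help to pull the accession and TC number out of each header. The following is only a minimal sketch, assuming every header follows the `>gnl|TC-DB|<accession> <TC number> <description>` pattern seen in the question; the function names `read_fasta` and `parse_tcdb_header` are hypothetical, not part of any tool mentioned here.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Pattern assumed from the headers shown in the question:
# >gnl|TC-DB|<accession> <TC number> <free-text description>
HEADER_RE = re.compile(r"^>gnl\|TC-DB\|(\S+)\s+(\S+)\s*(.*)$")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def parse_tcdb_header(header: str) -> Dict[str, str]:
    """Split a TCDB-style header into accession, TC number and description."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"Unexpected header format: {header!r}")
    accession, tc_number, description = match.groups()
    return {
        "accession": accession,
        "tc_number": tc_number,
        "description": description,
    }
```

Calling `read_fasta(open("proteins.fasta"))` (the file name is a placeholder) and passing each header to `parse_tcdb_header` gives a list of accessions to map. Note that the accessions in the question are mixed: some are numeric identifiers and at least one (`A0CIB0`) has the shape of a UniProt accession, so they may need to be handled separately.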

Ram · Answer 2 · 2015-05-22


8.9 years ago

Juke34 8.5k

InterProScan is another option. See my previous answer here: "Get Associated GO terms to a transcript".

You will get a tab-delimited output file.
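Once you have that file, collecting the GO terms per protein takes only a few lines. This is a rough sketch, assuming the standard InterProScan 5 TSV layout in which the protein accession is in column 1 and the GO annotations (present only when GO lookup was enabled for the run) are in column 14; `go_terms_by_protein` is a hypothetical helper name. Check the column positions against your own output before relying on it.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Matches GO identifiers such as GO:0005515, ignoring any suffix or separator.
GO_RE = re.compile(r"GO:\d{7}")

# 0-based index of column 14, assumed to hold the GO annotations.
GO_COLUMN = 13


def go_terms_by_protein(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each protein accession to the set of GO terms found in its rows."""
    terms = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= GO_COLUMN:
            continue  # row has no GO column
        found = GO_RE.findall(fields[GO_COLUMN])
        if found:
            terms[fields[0]].update(found)
    return dict(terms)
```

Proteins with no GO term in any row are simply absent from the returned dictionary, so comparing its keys with your input accessions shows which sequences remain unannotated.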


Ram · Answer 3 · 2015-05-22


8.9 years ago

Antonio R. Franco ★ 5.1k

You can also use either the free or the Pro version of Blast2GO for this task. With your proteins as the query, run BlastP within Blast2GO and you will end up with a very convenient table containing the Blast hits, their E-values and coverage, and the GO terms, EC numbers, InterPro domains and even KEGG pathways associated with the proteins that are similar to yours. You can then compute statistics, run Fisher enrichment tests, and even plot the results of your searches very easily.

The Pro version lets you work faster because it uses cloud and privileged computing facilities, and it also lets you run BlastP with different algorithms, not just the default one.
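To make the idea of a Fisher enrichment test concrete, here is a small illustrative sketch of a one-sided Fisher's exact test computed as a hypergeometric tail. It is not Blast2GO's implementation, and the function and parameter names are made up for the example: `k` is the number of proteins in your subset carrying a given GO term, `n` the size of the subset, `K` the number of proteins in the whole background set carrying the term, and `N` the size of the background set.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided p-value: probability of seeing at least k annotated
    proteins in a subset of size n drawn from N proteins, K of which
    carry the annotation (hypergeometric upper tail)."""
    upper = min(n, K)  # cannot observe more hits than subset size or K
    total = comb(N, n)  # number of possible subsets of size n
    tail = sum(
        comb(K, i) * comb(N - K, n - i)  # subsets with exactly i hits
        for i in range(k, upper + 1)
    )
    return tail / total
```

For example, `fisher_enrichment_p(8, 50, 40, 6000)` asks how surprising it is to find 8 annotated proteins in a subset of 50 when only 40 of 6000 background proteins carry the term. When many GO terms are tested, the resulting p-values should also be corrected for multiple testing.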



Thank you.

I think Blast2GO will work for me.
