Protein sequence analysis
7.7 years ago

Hi,

I have a multi-FASTA file of protein sequences that looks like this:

>gnl|TC-DB|198412035 1.A.70.1.3 PREDICTED: hypothetical protein [Ciona intestinalis]
>gnl|TC-DB|198415263 1.A.70.1.2 PREDICTED: hypothetical protein [Ciona intestinalis]
MDRKIAFAFLFVILQVTTVSAGWGSLLRVETKAILGTLALRRRTWNENKASQQITPEMEEKLDAEMEKLMQQLAEDQQ
>gnl|TC-DB|210060745 2.A.39.3.6 Chain A, Structure Of Mhp1, A Nucleobase-Cation-Symport-1 Family Transporter
MNSTPIEEARSLLNPSNAPTRYAERSVGPFSLAAIWFAMAIQVAIFIAAGQMTSSFQVWQVIVAIAAGCTIAVILLFFTQ
SAAIRWGINFTVAARMPFGIRGSLIPITLKALLSLFWFGFQTWLGALALDEITRLLTGFTNLPLWIVIFGAIQVVTTFYG
SPAYTLCSTFPRVFTFKTGVIVSAVVGLLMMPWQFAGVLNTFLNLLASALGPLAGIMISDYFLVRRRRISLHDLYRTKGI
>gnl|TC-DB|58177374 1.A.2.2.2 Chain B, Intermediate Gating Structure 2 Of The Inwardly Rectifying K+ Channel Kirbac3.1
MTGGMKPPARKPRILNSDGSSNITRLGLEKRGWLDDHYHDLLTVSWPVFITLITGLYLVTNALFALAYLACGDVIENARP
GSFTDAFFFSVQTMATIGYGKLIPIGPLANTLVTLEALCGMLGLAVAASLIYARFTRPTAGVLFSSRMVISDFEGKPTLM
VLFTGHHEAFAQNVHARHAYSCDEIIWGGHFVDVFTTLPDGRRALDLGKFHEIAQHHHHHH
>gnl|TC-DB|A0CIB0 1.A.17.3.1 Chromosome undetermined scaffold_19, whole genome shotgun sequence OS=Paramecium tetraurelia GN=GSPATT00007662001 PE=4 SV=1
MDDQNQPILQEQPKPKQKKPLLNTKMVKKQKMQNKKEENLREILNFYTNQVDARKFLQKMKAVVDSNQQEKKYQDD
..
..
..


and so on (roughly 6000 sequences in total).

I need to functionally annotate these sequences. How can I do that?

Best!

Shashank

7.7 years ago
cdsouthan ★ 1.9k

Start by submitting the FASTA to http://www.ebi.ac.uk/Tools/picr/ to map the sequences to UniProt IDs. Then use the http://www.uniprot.org/ interface to slice and dice the results (e.g. by GO or anything else you like). For GO you might also consider http://www.pantherdb.org/ for additional options (including working between UniProt and PANTHER).

Note that this answer covers analysis of function, not annotation (in the sense of adding it to the records). It also assumes these are "knowns", present at least somewhere in UniProt. If they include a proportion of novel sequences, you have a different problem, for which others may suggest solutions.
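Before mapping IDs, it can help to pull the accession and TC number out of each header. The following is only a minimal sketch: it assumes every header follows the `>gnl|TC-DB|<accession> <TC number> <description>` layout seen in the question, and the function name `parse_tcdb_headers` is made up for illustration.

```python
import re

# Assumed header layout (taken from the example in the question):
#   >gnl|TC-DB|<accession> <TC number> <free-text description>
HEADER_RE = re.compile(
    r"^>gnl\|TC-DB\|(\S+)\s+(\d+\.[A-Z]\.\d+\.\d+\.\d+)\s*(.*)$"
)

def parse_tcdb_headers(lines):
    """Return one dict per FASTA header line that matches the assumed layout."""
    records = []
    for line in lines:
        match = HEADER_RE.match(line.strip())
        if match:
            records.append({
                "accession": match.group(1),
                "tc_number": match.group(2),
                "description": match.group(3),
            })
    return records

if __name__ == "__main__":
    example = [
        ">gnl|TC-DB|198415263 1.A.70.1.2 PREDICTED: hypothetical protein [Ciona intestinalis]",
        "MDRKIAFAFLFVILQVTTVSAGWGSLLRVETKAILGTLALRRRTWNENKASQQITPEMEEKLDAEMEKLMQQLAEDQQ",
    ]
    for rec in parse_tcdb_headers(example):
        print(rec["accession"], rec["tc_number"], sep="\t")
```

The accessions in the example look like a mix of numeric identifiers and UniProt-style accessions (e.g. A0CIB0), so the two kinds may need to be mapped separately.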

7.7 years ago
Juke34 7.5k

InterProScan is another option. See my previous answer here: Get Associated GO terms to a transcript

You will get a tab-delimited output file.
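As a rough sketch of how that file could be post-processed: the code below collects GO terms per protein, assuming the standard InterProScan TSV layout in which column 1 is the protein accession and column 14 holds the GO annotations (present only when GO lookup was requested). The function name `go_terms_by_protein` is hypothetical.

```python
import re
from collections import defaultdict

# GO identifiers look like GO:0005515; a regex also copes with
# suffixes that some InterProScan versions append to each term.
GO_RE = re.compile(r"GO:\d{7}")

def go_terms_by_protein(tsv_lines):
    """Map protein accession -> sorted list of GO terms found in column 14."""
    terms = defaultdict(set)
    for line in tsv_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 14:
            # Row has no GO column (optional columns were not written).
            continue
        terms[cols[0]].update(GO_RE.findall(cols[13]))
    # Drop proteins whose GO column was empty or "-".
    return {acc: sorted(found) for acc, found in terms.items() if found}
```

Typical use would be `go_terms_by_protein(open("results.tsv"))`, where `results.tsv` is a placeholder for your InterProScan output file.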

7.7 years ago

You can also use either the free or the Pro version of Blast2GO for this task. With your proteins as the query, run BlastP from within Blast2GO, and you will end up with a very convenient table containing the BLAST hits, their E-values and coverage, and the GO terms, EC numbers, InterPro domains, and even KEGG pathways associated with the proteins similar to yours. You can then compute statistics, run Fisher enrichment tests, and plot the results of your searches quite easily.

The Pro version lets you work faster because it uses cloud and privileged computing facilities, and it also lets you run BlastP with different algorithms, not just the default one.
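For readers unfamiliar with the Fisher enrichment test mentioned above, here is a minimal, standard-library sketch of a one-sided Fisher's exact test (the hypergeometric upper tail). It illustrates the general idea only and is not Blast2GO's implementation; the function and parameter names are assumptions.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided p-value P(X >= k) for GO term enrichment.

    k: proteins in the test set annotated with the term
    n: size of the test set
    K: proteins in the reference set annotated with the term
    N: size of the reference set (which includes the test set)
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

For example, `fisher_enrichment_p(8, 50, 40, 6000)` asks how surprising it is to see 8 annotated proteins in a test set of 50 when only 40 of 6000 reference proteins carry the term. With many GO terms tested at once, the p-values would also need a multiple-testing correction.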

Thank you.

I think Blast2GO works for me.