Protein sequence analysis
8.9 years ago

Hi,

I have a multi-FASTA file of protein sequences that looks like this:

>gnl|TC-DB|198412035 1.A.70.1.3 PREDICTED: hypothetical protein [Ciona intestinalis]
MDQKIVFILLLVVLLVSQATADDGWVRTGLAVARLVVGRRRRRWNEANGLEKLSSDAEETLSAAEMEEVMQKIMDHQ
>gnl|TC-DB|198415263 1.A.70.1.2 PREDICTED: hypothetical protein [Ciona intestinalis]
MDRKIAFAFLFVILQVTTVSAGWGSLLRVETKAILGTLALRRRTWNENKASQQITPEMEEKLDAEMEKLMQQLAEDQQ
>gnl|TC-DB|210060745 2.A.39.3.6 Chain A, Structure Of Mhp1, A Nucleobase-Cation-Symport-1 Family Transporter
MNSTPIEEARSLLNPSNAPTRYAERSVGPFSLAAIWFAMAIQVAIFIAAGQMTSSFQVWQVIVAIAAGCTIAVILLFFTQ
SAAIRWGINFTVAARMPFGIRGSLIPITLKALLSLFWFGFQTWLGALALDEITRLLTGFTNLPLWIVIFGAIQVVTTFYG
ITFIRWMNVFASPVLLAMGVYMVYLMLDGADVSLGEVMSMGGENPGMPFSTAIMIFVGGWIAVVVSIHDIVKECKVDP
SREGQTKADARYATAQWLGMVPASIIFGFIGAASMVLVGEWNPVIAITEVVGGVSIPMAILFQVFVLLATWSTNPAANL
SPAYTLCSTFPRVFTFKTGVIVSAVVGLLMMPWQFAGVLNTFLNLLASALGPLAGIMISDYFLVRRRRISLHDLYRTKGI
>gnl|TC-DB|58177374 1.A.2.2.2 Chain B, Intermediate Gating Structure 2 Of The Inwardly Rectifying K+ Channel Kirbac3.1
MTGGMKPPARKPRILNSDGSSNITRLGLEKRGWLDDHYHDLLTVSWPVFITLITGLYLVTNALFALAYLACGDVIENARP
GSFTDAFFFSVQTMATIGYGKLIPIGPLANTLVTLEALCGMLGLAVAASLIYARFTRPTAGVLFSSRMVISDFEGKPTLM
MRLANLRIEQIIEADVHLVLVRSEISQEGMVFRRFHDLTLTRSRSPIFSLSWTVMHPIDHHSPIYGETDETLRNSHSEFL
VLFTGHHEAFAQNVHARHAYSCDEIIWGGHFVDVFTTLPDGRRALDLGKFHEIAQHHHHHH
>gnl|TC-DB|A0CIB0 1.A.17.3.1 Chromosome undetermined scaffold_19, whole genome shotgun sequence OS=Paramecium tetraurelia GN=GSPATT00007662001 PE=4 SV=1
MDDQNQPILQEQPKPKQKKPLLNTKMVKKQKMQNKKEENLREILNFYTNQVDARKFLQKMKAVVDSNQQEKKYQDD
NEYNEMQDIYEDYNMGDLVIVFPNPDADGVKNPPITYKEAEKFYDETLAQKNDDVSESQLKKEKEAFLLAFLMLDDYQ
..
..
..

and so on (about 6000 sequences in total).
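A minimal Python sketch for loading a file like this, assuming every header follows the pattern shown above: `gnl|TC-DB|<id>`, then what appears to be a TC-DB classification (e.g. `1.A.70.1.3`), then a free-text description. `read_fasta` and `parse_tcdb_header` are hypothetical helper names, not part of any library, and the sequences in the demo are truncated fragments for illustration only.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if header is not None:
        yield header, "".join(chunks)  # last record in the file


def parse_tcdb_header(header: str) -> Tuple[str, str, str]:
    """Split 'gnl|TC-DB|<id> <tc_class> <description>' into its three parts.

    Assumption: the layout matches the example headers in this post.
    """
    parts = header.split(None, 2)
    seq_id = parts[0].split("|")[-1]
    tc_class = parts[1] if len(parts) > 1 else ""
    description = parts[2] if len(parts) > 2 else ""
    return seq_id, tc_class, description


if __name__ == "__main__":
    # Truncated fragments of two records from the post, for illustration only.
    example = [
        ">gnl|TC-DB|58177374 1.A.2.2.2 Chain B, Intermediate Gating Structure 2",
        "MTGGMKPPARKPRILNSDGS",
        "GSFTDAFFFSVQTMATIGYG",
        ">gnl|TC-DB|A0CIB0 1.A.17.3.1 Chromosome undetermined scaffold_19",
        "MDDQNQPILQEQPKPKQKKP",
    ]
    counts = Counter()
    for header, seq in read_fasta(example):
        seq_id, tc_class, _ = parse_tcdb_header(header)
        counts[tc_class.split(".")[0]] += 1  # tally by top-level TC class
        print(seq_id, tc_class, len(seq))
    print(counts)
```

For the real file you would pass an open file handle instead, e.g. `with open("proteins.fasta") as fh:` (the filename is hypothetical).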

I need to functionally annotate these sequences. How can I do that?

Best!

Shashank

Tags: alignment, blast

8.9 years ago
cdsouthan

Start with PICR (http://www.ebi.ac.uk/Tools/picr/), submitting your sequences as FASTA, to map them to UniProt IDs. Then use the UniProt interface (http://www.uniprot.org/) to slice and dice the results (e.g. by GO terms or any other field you like). For GO, you might also consider PANTHER (http://www.pantherdb.org/), which offers additional options, including working between UniProt and PANTHER identifiers.
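Once the sequences are mapped to UniProt accessions, slicing by GO amounts to inverting a protein-to-GO table. A minimal sketch, assuming you have downloaded a tab-separated table from UniProt with a header row, one column of accessions and one column of GO IDs separated by semicolons; the column names `Entry` and `Gene Ontology IDs` are assumptions, so check them against the header row of your own download. The accessions in the demo are made up.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def proteins_by_go(tsv: TextIO,
                   id_col: str = "Entry",
                   go_col: str = "Gene Ontology IDs") -> Dict[str, List[str]]:
    """Invert a protein -> GO table into GO ID -> list of accessions.

    Assumes a tab-separated file with a header row, and GO IDs in `go_col`
    separated by semicolons. The default column names are assumptions.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    # QUOTE_NONE: treat quote characters in free-text columns literally.
    for row in csv.DictReader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE):
        for go_id in (row.get(go_col) or "").split(";"):
            go_id = go_id.strip()
            if go_id:
                groups[go_id].append(row[id_col])
    return dict(groups)


if __name__ == "__main__":
    # Made-up accessions, for illustration only.
    table = io.StringIO(
        "Entry\tGene Ontology IDs\n"
        "P00001\tGO:0005886; GO:0016021\n"
        "P00002\tGO:0016021\n"
    )
    for go_id, accessions in sorted(proteins_by_go(table).items()):
        print(go_id, len(accessions), accessions)
```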

Note that this answer covers functional analysis, not annotation in the sense of adding this information to the records. It also assumes these sequences are "knowns", present at least somewhere in UniProt. If a proportion of them are novel, you have a different problem, for which others may suggest solutions.

8.9 years ago
Juke34

InterProScan is another option. See my previous answer here: Get Associated GO terms to a transcript

It produces a tab-delimited output file.
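A sketch for collecting GO terms per protein from that output, assuming InterProScan 5 was run with TSV output and GO lookup enabled (`--goterms`), and assuming the documented TSV layout: no header row, the protein accession in column 1, and `|`-separated GO IDs in column 14, with `-` marking an empty field. Some releases append a source tag such as `(InterPro)` to each GO ID; the sketch strips it. `go_terms_per_protein` is a hypothetical helper and the demo rows are synthetic.

```python
import io
from collections import defaultdict
from typing import Dict, Iterable, Set

GO_COLUMN = 13  # 0-based index of column 14 (GO annotations) in the assumed layout


def go_terms_per_protein(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map protein accession -> set of GO IDs from InterProScan TSV lines."""
    terms: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= GO_COLUMN:
            continue  # no GO column on this row
        protein, go_field = fields[0], fields[GO_COLUMN].strip()
        if go_field in ("", "-"):
            continue  # '-' marks an empty field
        for item in go_field.split("|"):
            go_id = item.split("(", 1)[0].strip()  # drop a suffix like '(InterPro)'
            if go_id.startswith("GO:"):
                terms[protein].add(go_id)
    return dict(terms)


if __name__ == "__main__":
    # Two synthetic rows in the assumed 14-column layout.
    demo = io.StringIO(
        "P1\tmd5\t100\tPfam\tPF00001\tdesc\t1\t50\t1e-5\tT\t01-01-2020\t"
        "IPR000001\tIPR desc\tGO:0000001|GO:0000002(InterPro)\n"
        "P2\tmd5\t80\tPfam\tPF00002\tdesc\t1\t40\t1e-5\tT\t01-01-2020\t-\t-\t-\n"
    )
    for protein, gos in go_terms_per_protein(demo).items():
        print(protein, sorted(gos))
```

For a real run, pass an open file handle, e.g. `with open("interproscan_output.tsv") as fh:` (the filename is hypothetical). Proteins without any GO annotation simply do not appear in the result.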

8.9 years ago

You can also use either the free or the Pro version of Blast2GO for this task. With proteins as queries, run BLASTP within Blast2GO, and you will end up with a very convenient table containing the BLAST hits, their E-values and coverage, and the GO terms, EC numbers, InterPro domains and even KEGG pathways associated with the proteins similar to yours. You can then compute statistics, run Fisher's exact tests for enrichment, and produce graphics of your results in a very easy and convenient way.
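The Fisher enrichment test mentioned above asks whether a GO term is over-represented in a study set of proteins relative to a reference set. Below is a minimal sketch of the one-sided test using only the standard library; it is not Blast2GO's own implementation. With N reference proteins, K of them annotated with the term, and a study set of n proteins of which k carry the term, the p-value is the hypergeometric upper tail P(X >= k). It assumes the study set is a subset of the reference set; `enrichment_p_value` is a hypothetical name.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail).

    N: proteins in the reference (population) set
    K: reference proteins annotated with the GO term
    n: proteins in the study set (assumed to be a subset of the reference)
    k: study proteins annotated with the GO term
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)  # number of ways to draw the study set
    # Sum the ways to draw i annotated and n - i unannotated proteins, i >= k.
    # math.comb returns 0 when asked for more items than exist.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total  # exact integer arithmetic until this final division


if __name__ == "__main__":
    # Toy counts for illustration: 3 of 4 study proteins carry a term
    # that 4 of 8 reference proteins carry.
    print(enrichment_p_value(k=3, n=4, K=4, N=8))
```

In practice you would run this for every GO term and correct the resulting p-values for multiple testing (e.g. Benjamini-Hochberg). If SciPy is available, `scipy.stats.fisher_exact([[k, n - k], [K - k, N - K - n + k]], alternative="greater")` should give the same one-sided p-value.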

The Pro version lets you work faster because it uses cloud and privileged computing facilities, and it also lets you run BLASTP with different algorithms, not just the default one.

Thank you.

I think Blast2GO will work for me.
