Question

How To Extract Conserved Regions From A Large Number Of Sequences

0

11.3 years ago

Ali Habib ▴ 10

I have 1061 sequences and I want to extract the conserved regions.

Most servers accept only a few sequences. Is there a site that can do this? I can use BioEdit, but its accuracy is not that good.

They are GPCR protein sequences, for human.

I want to analyze them: the conserved regions, the 3D structure, and so on.

It's my first task in a Bioinformatics diploma: analyze a protein, get the conserved regions, do the alignment, the 3D structure, the unknown regions, the folding, and so on, then submit the output results.

sequence protein • 11k views

ADD COMMENT • link updated 11.2 years ago by Medhat 9.7k • written 11.3 years ago by Ali Habib ▴ 10

3

Are they protein sequences? Can you explain which services you have tried? And explain a bit more about what you are trying to do?

ADD REPLY • link 11.3 years ago by Niallhaslam 2.3k

2

Are there 1061 different proteins? Or one protein with 1061 orthologs?

ADD REPLY • link 11.3 years ago by Zev.Kronenberg 12k

1

one protein with 1061 orthologs

ADD REPLY • link 11.3 years ago by Ali Habib ▴ 10

2

As Niall says above, we need to know more about what would be an interesting question for you to address with this dataset before we can make suggestions or offer help. "Analysing the 3D structure" is too general: there are many different kinds of analyses associated with 3D structure that one could do with this data on this protein family.

ADD REPLY • link 11.3 years ago by aidan-budd 1.9k

score 3 · Answer 1 · 2013-01-18

There is a webserver for GBLOCKS that you can upload protein sequences to and get out conserved blocks from. This might be useful for you. You can also relatively easily install GBLOCKS locally on your machine and run it there, if you have too many sequences for the server to accept.
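
To give a feel for what "conserved blocks" means in practice, here is a minimal Python sketch. It is not the Gblocks algorithm, only a simplified stand-in: it scores each alignment column by the fraction of sequences sharing the most common residue and reports runs of columns above a cutoff. The function names, the `threshold` and `min_length` parameters, and the toy sequences are all made up for illustration, and it assumes the sequences are already aligned to the same length.

```python
from collections import Counter


def column_conservation(alignment):
    """Per column: fraction of all sequences that carry the most common residue.

    Gaps ('-') are never counted as the most common residue, so gappy
    columns score low.
    """
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / n_seqs)
    return scores


def conserved_blocks(alignment, threshold=0.8, min_length=3):
    """Return (start, end) column ranges (0-based, end exclusive) of conserved runs."""
    scores = column_conservation(alignment)
    blocks, start = [], None
    # A trailing 0.0 sentinel closes a run that reaches the last column.
    for i, score in enumerate(scores + [0.0]):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_length:
                blocks.append((start, i))
            start = None
    return blocks


# Toy alignment (hypothetical sequences, all the same length).
toy = [
    "MKTDRYLAV-GNPLIY",
    "MRSDRYLSVAGNPIIY",
    "MKADRYLAI-GNPLVY",
    "MQTDRYLAVSGNPLIY",
]
blocks = conserved_blocks(toy, threshold=1.0, min_length=3)
for start, end in blocks:
    print(start, end, toy[0][start:end])
```

With 1061 real sequences you would read the aligned sequences from a file and probably relax the threshold, since requiring 100% identity in every column is very strict.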

However, as Niall mentions above, it's hard to know what to recommend you try out, without a better idea of what you want to do and why.

score 3 · Answer 2 · 2013-01-20

3

11.2 years ago

Medhat 9.7k

SA Ali, can you please check the links below:

Rapid detection of conserved regions in protein sequences using wavelets.

List of sequence alignment software

Identification and Characterization of Multi-Species Conserved Sequences

ADD COMMENT • link 11.2 years ago by Medhat 9.7k

0

Is there any site for detecting these regions?

ADD REPLY • link 11.2 years ago by Ali Habib ▴ 10

0

Hi Ali,

What makes this hard to answer is that, yes, there are different tools out there that will separate the columns of an alignment into those that are better conserved and those that are less conserved. However, a major part of the reason why there are (many) different tools for this is that they are designed with different purposes in mind, and thus "choose" their more/less conserved columns/regions in different ways.
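
As a small illustration of how two reasonable definitions of "conserved" can disagree, here is a Python sketch comparing two column scores: the fraction of the most common residue, and the Shannon entropy of the residue distribution. These two measures and the toy columns are chosen just for illustration; they are not taken from any particular tool.

```python
import math
from collections import Counter


def majority_fraction(column):
    """Fraction of the column taken by its most common residue (higher = more conserved)."""
    counts = Counter(column)
    return counts.most_common(1)[0][1] / len(column)


def shannon_entropy(column):
    """Shannon entropy in bits of the residue distribution (lower = more conserved)."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Two hypothetical alignment columns, eight sequences each.
col_x = "AAAAAABC"  # one dominant residue plus two rare ones
col_y = "AAAAABBB"  # only two residue types, more evenly split

for name, col in (("x", col_x), ("y", col_y)):
    print(name, round(majority_fraction(col), 3), round(shannon_entropy(col), 3))
```

By majority fraction, column x looks more conserved than column y, while by entropy column y looks more conserved than column x. Which answer is "right" depends on what you want to say about the protein afterwards.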

Thus, I don't know which direction to point you in without knowing what use you want to put this selection of conserved regions to. In other words: what statements would you like to make about this protein family (presumably about its structure/function, perhaps linking particular kinds of structure/function to different regions) after identifying these regions, that you couldn't make without having done this analysis?

Working this out is a crucial element of any bioinformatic analysis you do, and it's something that beginners to the field very often have trouble with. So please don't worry that you may be in this situation, it's normal!

My recommendation is that you take some time to identify the statements you're hoping to be able to make as a result of running your analysis, e.g. "Due to their conservation, physicochemical properties, and the results of XXX prediction tool, these regions of the query protein are the ones most likely to be the transmembrane regions of the protein family". Once you've done that, let us know, and we may be better able to help you.

ADD REPLY • link 11.2 years ago by aidan-budd 1.9k

score 1 · Answer 3 · 2013-01-18

1

11.3 years ago

Pappu ★ 2.1k

Each helix of a GPCR has a conserved residue or motif, such as the DRY and NPxxY motifs. Look at this review for details: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3343417/
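
A quick way to check whether a sequence contains these motifs is a regular expression search. Below is a minimal Python sketch; the `MOTIFS` table, the `find_motifs` helper and the toy sequence are made up for illustration, and the patterns are the literal motif names from above (with `x` meaning any residue), not a curated motif definition.

```python
import re

# Literal motif patterns; "." stands for the "x" (any residue) positions.
MOTIFS = {
    "DRY": re.compile(r"DRY"),
    "NPxxY": re.compile(r"NP..Y"),
}


def find_motifs(sequence):
    """Return {motif name: [(0-based start, matched text), ...]} for one sequence."""
    return {
        name: [(m.start(), m.group()) for m in pattern.finditer(sequence)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical toy sequence, not a real GPCR.
toy_seq = "MKTAIDRYLAVVHPLGGSNPIIYAFR"
hits = find_motifs(toy_seq)
print(hits)
```

A plain pattern match will also report chance occurrences outside the helices, so the positions should be checked against the alignment or the literature.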

ADD COMMENT • link 11.3 years ago by Pappu ★ 2.1k

0

I need to do all of that by myself; it's my level 1 exam in the Bioinformatics Diploma.

ADD REPLY • link 11.2 years ago by Ali Habib ▴ 10

0

Just use ClustalW/T-Coffee/MUSCLE to align those sequences, and then manually edit the alignment if the conserved sequences from the literature are not in the same columns.
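
To find which sequences need manual editing, you can check where a known motif lands in each aligned sequence. Here is a rough Python sketch that works on an aligned FASTA file as written by any of those programs; `parse_fasta`, `motif_column` and the toy alignment are made up for illustration, and it only looks at the first occurrence of the motif in each sequence.

```python
from collections import Counter


def parse_fasta(text):
    """Parse FASTA-formatted text into {name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def motif_column(aligned_seq, motif):
    """Alignment column (0-based) where the motif starts, or None if it is absent.

    The motif is searched in the ungapped sequence, then the hit is mapped
    back to its column in the gapped alignment.
    """
    residue_columns = [i for i, c in enumerate(aligned_seq) if c != "-"]
    pos = aligned_seq.replace("-", "").find(motif)
    return residue_columns[pos] if pos != -1 else None


# Hypothetical toy alignment; seq3 has its DRY one column off.
aligned = """>seq1
MKT-DRYLAV
>seq2
MK-TDRYLAV
>seq3
MKTDRY-LAV
"""
columns = {name: motif_column(seq, "DRY") for name, seq in parse_fasta(aligned).items()}
# The column used by most sequences is taken as the expected one.
expected = Counter(c for c in columns.values() if c is not None).most_common(1)[0][0]
misaligned = {name: col for name, col in columns.items() if col != expected}
print(expected, misaligned)
```

Sequences listed in `misaligned` are the ones to look at by eye in the alignment editor.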

ADD REPLY • link 11.2 years ago by Pappu ★ 2.1k

0

I used MEGA 5; because I have 1061 sequences, I was thinking about doing it from scratch.

ADD REPLY • link 11.2 years ago by Ali Habib ▴ 10

0

What you are trying to do has already been published (see the review I posted before). I am not sure whether your university accepts work that repeats what has already been published. In my experience, what a lot of diploma students do is build a homology model of one or more GPCRs with no known crystal structure, dock a few ligands, and write a thesis on the binding site.

ADD REPLY • link 11.2 years ago by Pappu ★ 2.1k

0

No, it's just an introductory course in bioinformatics, to get familiar with the basic tools.

ADD REPLY • link 11.2 years ago by Ali Habib ▴ 10