Question: How To Extract Conserved Regions From A Large Number Of Sequences
6.9 years ago, Ali Habib wrote:

I have 1061 sequences and I want to extract the conserved regions.

Most servers accept only a few sequences. Is there a site that can do this? I can use BioEdit, but its accuracy is not that good.

They are human GPCR protein sequences.

I want to analyze them: the conserved regions, the 3D structure, and so on.

It's my first task in a bioinformatics diploma: analyze a protein, get the conserved regions, do an alignment, look at the 3D structure, the unknown regions, the folding, and so on, and submit the output results.

sequence protein • 7.4k views
modified 6.9 years ago by Medhat • written 6.9 years ago by Ali Habib

Are they protein sequences? Can you explain which services you have tried? And explain a bit more about what you are trying to do?

written 6.9 years ago by Niallhaslam

Are there 1061 different proteins? Or one protein with 1061 orthologs?

written 6.9 years ago by Zev.Kronenberg

one protein with 1061 orthologs

written 6.9 years ago by Ali Habib

As Niall says above, we need to know more about what question you would like to address with this dataset before we can make suggestions or offer help. "Analysing the 3D structure" is too general: there are many different kinds of analysis one could do with data on this protein family that relate to its 3D structure.

written 6.9 years ago by aidan-budd

6.9 years ago, aidan-budd (Germany) wrote:

There is a web server for GBLOCKS: you upload an alignment of your protein sequences and get back the conserved blocks. This might be useful for you. You can also install GBLOCKS locally fairly easily and run it on your own machine if you have too many sequences for the server to accept.

However, as Niall mentions above, it's hard to know what to recommend you try out, without a better idea of what you want to do and why.
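
To make the idea of "conserved blocks" concrete, here is a minimal Python sketch that marks alignment columns as conserved and reports contiguous runs of them. It is not the GBLOCKS algorithm, only a simplified stand-in; the function name `conserved_blocks`, the thresholds, and the toy alignment are all assumptions chosen for illustration.

```python
from collections import Counter

def conserved_blocks(alignment, min_identity=0.8, max_gap_fraction=0.2, min_length=3):
    """Return (start, end) column ranges (0-based, end exclusive) of conserved runs.

    A column counts as conserved when few sequences have a gap there and the
    most common residue is present in at least `min_identity` of all sequences.
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for col in range(n_cols):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        gap_fraction = 1 - len(residues) / n_seqs
        if not residues or gap_fraction > max_gap_fraction:
            conserved.append(False)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        conserved.append(top_count / n_seqs >= min_identity)

    # Collect runs of consecutive conserved columns that are long enough.
    blocks, start = [], None
    for col, flag in enumerate(conserved + [False]):
        if flag and start is None:
            start = col
        elif not flag and start is not None:
            if col - start >= min_length:
                blocks.append((start, col))
            start = None
    return blocks

# Toy alignment (made up, not real GPCR data).
toy_alignment = [
    "MKDRYLAV-T",
    "MRDRYLSVGT",
    "MKDRYIAV-S",
    "M-DRYLAVGT",
]
blocks = conserved_blocks(toy_alignment, min_identity=0.75, min_length=3)
print(blocks)                                        # [(2, 8)]
print([toy_alignment[0][s:e] for s, e in blocks])    # ['DRYLAV']
```

Changing the thresholds changes which blocks come out, which is one reason the purpose of the analysis matters when choosing a tool.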

6.9 years ago, Medhat (Texas) wrote:

SA Ali, can you please check the links below:

Rapid detection of conserved regions in protein sequences using wavelets.

List of sequence alignment software

Identification and Characterization of Multi-Species Conserved Sequences

modified 6.9 years ago • written 6.9 years ago by Medhat

Is there any site for detecting these regions?

written 6.9 years ago by Ali Habib

Hi Ali,

What makes this hard to answer is that, yes, there are different tools out there that will divide the columns of an alignment into those that are better conserved and those that are less conserved. However, a major part of the reason why there are (many) different tools for this is that they are designed with different purposes in mind, and thus "choose" their more/less conserved columns/regions in different ways.

Thus, without knowing what you want to use this selection of conserved regions for, I don't know which direction to point you in. That is: what statements would you like to make about this protein family (presumably about its structure/function, perhaps linking particular kinds of structure/function to different regions) after identifying these regions, that you can't make without having done this analysis?

Working this out is a crucial element of any bioinformatic analysis you do, and it's something that beginners to the field very often have trouble with. So please don't worry that you may be in this situation, it's normal!

My recommendation is that you take some time to identify this, i.e. identify the statements you're hoping to be able to make as a result of running your analysis, e.g. "Due to their conservation, physicochemical properties, and the results of XXX prediction tool, these regions of the query protein are the ones most likely to be the transmembrane regions of the protein family". Once you've done that, let us know, and we may be better able to help you.
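
As a small illustration of how different conservation measures can "choose" different columns, the Python sketch below scores two made-up alignment columns in two ways: the fraction of the most common residue, and Shannon entropy. Both function names and both columns are assumptions for illustration; real tools use more elaborate scores (substitution matrices, gap handling, sequence weighting).

```python
import math
from collections import Counter

def majority_fraction(column):
    """Fraction of sequences carrying the most common residue (higher = more conserved)."""
    counts = Counter(column)
    return counts.most_common(1)[0][1] / len(column)

def shannon_entropy(column):
    """Shannon entropy in bits of the residue distribution (lower = more conserved)."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Two made-up columns, one character per sequence.
col_a = "LLLLLLIVMF"  # 6 x L plus four different residues
col_b = "LLLLLIIIII"  # 5 x L and 5 x I

for name, col in (("col_a", col_a), ("col_b", col_b)):
    print(name, round(majority_fraction(col), 2), round(shannon_entropy(col), 2))
```

By majority fraction `col_a` looks more conserved (0.6 versus 0.5), while by entropy `col_b` does (1.0 bit versus roughly 1.77 bits), so the two measures rank the same columns in opposite order.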

written 6.9 years ago by aidan-budd

6.9 years ago, Pappu wrote:

Each helix of a GPCR has one conserved residue, and there are conserved motifs such as DRY and NPxxY. See this review for details: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3343417/
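
A quick way to check whether such motifs are present in your own sequences is a pattern search. The Python sketch below looks for the two motifs named above, reading the `x` in NPxxY as "any residue"; the function name `find_motifs` and the toy sequence are made up for illustration, and gap characters would need to be removed from aligned sequences first.

```python
import re

# Motif patterns taken from the answer above; "." matches any single residue.
MOTIFS = {
    "DRY": re.compile(r"DRY"),
    "NPxxY": re.compile(r"NP..Y"),
}

def find_motifs(sequence):
    """Return the 0-based start positions of each motif in an ungapped sequence."""
    return {
        name: [match.start() for match in pattern.finditer(sequence)]
        for name, pattern in MOTIFS.items()
    }

# Toy sequence (not a real GPCR).
toy_sequence = "MAAVLDRYLAIVNPLIYAF"
hits = find_motifs(toy_sequence)
print(hits)  # {'DRY': [5], 'NPxxY': [12]}
```

Short patterns like these can also match by chance elsewhere in a protein, so a hit is only a candidate until it is checked against the helix positions described in the literature.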


I need to do all of that by myself; it's my level 1 exam in the Bioinformatics Diploma.

written 6.9 years ago by Ali Habib

Just use ClustalW/T-Coffee/MUSCLE to align the sequences, then manually edit the alignment if the conserved sequences reported in the literature do not end up in the same columns.
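
To decide which sequences need manual editing, one option is to check in which alignment column a known motif starts in each sequence. The Python sketch below assumes the aligner wrote its output as aligned FASTA with `-` for gaps; the helper names `read_fasta` and `motif_columns` and the toy alignment are hypothetical and only illustrate the check.

```python
import io
from collections import Counter

def read_fasta(handle):
    """Read FASTA records from a file-like object into a dict: header -> sequence."""
    records, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records

def motif_columns(aligned, motif):
    """Map each sequence name to the alignment column where `motif` starts, or None."""
    columns = {}
    for name, row in aligned.items():
        col_of_residue = [i for i, c in enumerate(row) if c != "-"]
        pos = row.replace("-", "").find(motif)  # first occurrence, gaps ignored
        columns[name] = col_of_residue[pos] if pos >= 0 else None
    return columns

# Toy aligned FASTA (made up); in practice, open the aligner's output file instead.
toy_alignment = io.StringIO(""">seq1
MK-DRYLA
>seq2
MKADRYLA
>seq3
MKDRY-LA
""")
aligned = read_fasta(toy_alignment)
columns = motif_columns(aligned, "DRY")
consensus = Counter(c for c in columns.values() if c is not None).most_common(1)[0][0]
needs_review = sorted(name for name, c in columns.items() if c != consensus)
print(columns)       # {'seq1': 3, 'seq2': 3, 'seq3': 2}
print(needs_review)  # ['seq3']
```

Sequences listed in `needs_review` either lack the motif or carry it in a different column from the majority, so they are the ones to inspect by eye in an alignment editor.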

written 6.9 years ago by Pappu

I used MEGA 5 because I have 1061 sequences. I was thinking about doing it from scratch.

written 6.9 years ago by Ali Habib

What you are trying to do has already been published (see the review I posted before). I am not sure whether your university accepts work that repeats what has already been published. In my experience, what a lot of diploma students do is build a homology model of one or more GPCRs with no known crystal structure, dock a few ligands, and write a thesis on the binding site.

written 6.9 years ago by Pappu

No, it's just an introductory bioinformatics course, to become familiar with the basic tools.

written 6.9 years ago by Ali Habib