Question

Building A Consensus Sequence From A Set Of Sequences

2

12.8 years ago

Rishika Sengupta ▴ 40

I have a list of around 50 PDB files/FASTA sequences (they do not belong to any single family). I badly need to build 10-15 consensus sequences from them, each representing a set of PDB files. I have tried the ClustalW and Consensus web servers but could not really understand the results. Please help.

Rishika CARLBio group

multiple clustalw consensus bioperl • 15k views

ADD COMMENT • link updated 12.8 years ago by brentp 24k • written 12.8 years ago by Rishika Sengupta ▴ 40

score 3 · Answer 1 · 2011-07-28

3

12.8 years ago

Pals ★ 1.3k

I once found the program PAGAN useful in this sort of case. You could go through the manuscript first. CodonCode Aligner can also address your problem.

ADD COMMENT • link 12.8 years ago by Pals ★ 1.3k

0

Thanks, but I can't install PAGAN.

ADD REPLY • link 12.8 years ago by Rishika Sengupta ▴ 40

0

Probably because it is targeted at Debian-based Linux environments. It would be really difficult, if not impossible, to install on other Linux distributions.

ADD REPLY • link 12.8 years ago by Pals ★ 1.3k

0

You need libboost libraries installed: sudo apt-get install libboost-dev libboost-program-options1.42-dev libboost-regex1.42-dev

ADD REPLY • link 12.8 years ago by 2184687-1231-83- ★ 5.1k

0

If you have problems installing PAGAN, please contact the author and ask for help. You can find the contact details on the program web site. PAGAN requires two Boost packages (available for all platforms, including OSX and Windows) but should then compile fine.

ADD REPLY • link 12.8 years ago by Ari ▴ 90

score 1 · Answer 2 · 2011-07-28

1

12.8 years ago

Neilfws 49k

If what you want to do is cluster the sequences into groups and choose a representative of each group, look no further than CD-HIT. It has a web server too, if you don't want to run it locally.
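To make "cluster, then pick a representative" concrete, here is a minimal Python sketch of greedy clustering in the spirit of CD-HIT (longest sequence first, each new sequence either joins an existing representative or starts its own cluster). It is not CD-HIT's actual code: `similarity`, `greedy_cluster`, the 0.9 threshold and the toy sequences are all made up for illustration, and `difflib.SequenceMatcher` is only a crude stand-in for a real alignment-based identity.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; an assumed stand-in for alignment identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs: dict, threshold: float = 0.9) -> dict:
    """Map each representative ID to the list of IDs in its cluster."""
    clusters = {}
    # Process the longest sequences first, so every representative is the
    # longest member of its cluster.
    for seq_id in sorted(seqs, key=lambda s: len(seqs[s]), reverse=True):
        for rep_id in clusters:
            if similarity(seqs[seq_id], seqs[rep_id]) >= threshold:
                clusters[rep_id].append(seq_id)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters[seq_id] = [seq_id]
    return clusters


# Toy input (hypothetical sequences, not real PDB chains).
toy = {
    "a1": "MKTAYIAKQRQISFVKSHFSRQ",
    "a2": "MKTAYIAKQRQISFVKSHFSR",
    "b1": "GSSGSSGWWWWPPPPDDDD",
}
clusters = greedy_cluster(toy, threshold=0.9)
print(clusters)
```

With the toy input, `a1` and `a2` end up together with `a1` as the representative, and `b1` forms its own cluster. For real data the actual CD-HIT program is the better choice; this only shows the shape of the result.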

ADD COMMENT • link 12.8 years ago by Neilfws 49k

0

I have just tried the CD-HIT web server. What I want is to input my sequences in FASTA format, cluster them into groups and choose a representative of each group, exactly as you described. Can I do that with CD-HIT? Please reply. Thank you so much for your help.

ADD REPLY • link 12.8 years ago by Rishika Sengupta ▴ 40

0

Well yes, you can - as I said in the answer. I've used the standalone program to do exactly that; I have not used the server, but assume it can do the same.
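If you run the standalone program, the cluster membership ends up in a `.clstr` file next to the FASTA file of representatives. A small Python sketch for pulling the representative and members out of that file is below. It assumes the usual `.clstr` layout (a `>Cluster N` header, then one line per member, with the representative marked by a trailing `*`); the function name `parse_clstr` and the sample text are made up for illustration.

```python
import re

# Member lines look like "0<TAB>22aa, >a1... *" or "1<TAB>21aa, >a2... at 95.45%".
ENTRY = re.compile(r">(?P<id>.+?)\.\.\. (?P<tail>.*)$")


def parse_clstr(text: str) -> dict:
    """Map each representative ID to the IDs of all members of its cluster."""
    clusters = {}
    members, representative = [], None
    # The extra ">end" line is a sentinel that flushes the last cluster.
    for line in text.splitlines() + [">end"]:
        if line.startswith(">"):
            if representative is not None:
                clusters[representative] = members
            members, representative = [], None
            continue
        match = ENTRY.search(line)
        if match is None:
            continue  # skip blank or unexpected lines
        members.append(match["id"])
        if match["tail"].strip() == "*":
            representative = match["id"]
    return clusters


# Hypothetical sample in the assumed layout, not output from a real run.
sample = """>Cluster 0
0\t22aa, >a1... *
1\t21aa, >a2... at 95.45%
>Cluster 1
0\t19aa, >b1... *
"""
parsed = parse_clstr(sample)
print(parsed)
```

In practice you would read the file with `open(path).read()` and pass the text in. Sequence IDs that themselves contain `...` would confuse this simple pattern.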

ADD REPLY • link 12.8 years ago by Neilfws 49k

0

Thanks a tonne... I could do that after installing the g++ compiler. :)

ADD REPLY • link 12.8 years ago by Rishika Sengupta ▴ 40

score 1 · Answer 3 · 2011-07-28

1

12.8 years ago

brentp 24k

After generating your clusters, look here for how to generate a consensus sequence.

Every time someone asks about consensus sequences, I recommend motility. It has both a C++ and a Python interface.
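For a rough idea of what a consensus is once the sequences of one cluster are aligned, here is a plain-Python majority-vote sketch. It does not use motility or reflect its API; the function `consensus`, the `min_fraction` cutoff, the `X` placeholder and the toy alignment are assumptions for illustration. It treats every character (including `-` gaps) as just another symbol, so it works on protein as well as nucleotide alignments.

```python
from collections import Counter


def consensus(aligned: list, min_fraction: float = 0.5, ambiguous: str = "X") -> str:
    """Majority-vote consensus of equal-length, already aligned sequences."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        # Keep the residue only if strictly more than min_fraction agree on it.
        out.append(residue if count / len(aligned) > min_fraction else ambiguous)
    return "".join(out)


# Toy alignment (hypothetical), e.g. what a multiple aligner might give you.
alignment = [
    "MKT-YIAK",
    "MKTAYIAR",
    "MRTAYLAK",
    "MKSAYIAQ",
]
result = consensus(alignment)
print(result)  # MKTAYIAX
```

The last column has no residue above the 50% cutoff (K appears in only 2 of 4 sequences), so it is reported as `X`. A position weight matrix or an IUPAC-style ambiguity code would keep more information than a single letter per column.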

ADD COMMENT • link updated 4.6 years ago by Ram 43k • written 12.8 years ago by brentp 24k

0

Will this program consider protein sequences also?

ADD REPLY • link 12.8 years ago by Pals ★ 1.3k

0

The CD-HIT worked out....

ADD REPLY • link 12.8 years ago by Rishika Sengupta ▴ 40

0

Thanks for the response.

ADD REPLY • link 12.8 years ago by Rishika Sengupta ▴ 40

0

@Kisun, good point. You'd have to convert to nucleotide sequence to use motility.

ADD REPLY • link 12.8 years ago by brentp 24k

score 0 · Answer 4 · 2011-07-28

0

12.8 years ago

Assa Yeroslaviz ★ 1.8k

How about doing it with R?

The Biostrings package is very helpful and easy to understand. Have a look at the Biostrings vignette.

ADD COMMENT • link 12.8 years ago by Assa Yeroslaviz ★ 1.8k