Clustering of ncRNA
11.1 years ago

I want to cluster a set of human ncRNAs into different families based on sequence similarity. How should I do it, and which software might be suitable? Thanks very much!

11.1 years ago
Ryan Thompson ★ 3.6k

You might want to map your ncRNAs to the genome and then cluster based on overlap, using something like blockbuster.
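To illustrate what "cluster based on overlap" means once the ncRNAs have been mapped, here is a minimal Python sketch. It is not blockbuster's algorithm, just a plain merge of overlapping intervals. The tuple layout (chrom, start, end, name), the half-open coordinates, and the function name are assumptions made for the example, and strand is ignored.

```python
from typing import List, Tuple

# Hypothetical record layout: (chrom, start, end, name), half-open coordinates.
Hit = Tuple[str, int, int, str]


def cluster_by_overlap(hits: List[Hit]) -> List[List[str]]:
    """Group mapped sequences whose genomic intervals overlap on the same chromosome."""
    clusters: List[List[str]] = []
    current: List[str] = []
    cur_chrom, cur_end = None, -1

    # Sorting by (chrom, start) lets a single pass find all overlaps.
    for chrom, start, end, name in sorted(hits):
        if chrom == cur_chrom and start < cur_end:
            # Overlaps the running cluster: extend it.
            current.append(name)
            cur_end = max(cur_end, end)
        else:
            # No overlap: close the previous cluster and start a new one.
            if current:
                clusters.append(current)
            current = [name]
            cur_chrom, cur_end = chrom, end

    if current:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    example = [
        ("chr1", 100, 200, "a"),
        ("chr1", 150, 250, "b"),
        ("chr1", 300, 400, "c"),
        ("chr2", 120, 180, "d"),
    ]
    print(cluster_by_overlap(example))
```

Note that this groups by genomic position, not by family membership, so paralogous copies at different loci end up in different clusters.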

Thanks! Actually, what I meant was that I want to cluster the RNAs into families. Sorry for not being clear.

11.1 years ago
Paul Gardner ▴ 190

I'd build HMMs/CMs from alignments of the RNAs. Then iteratively add similar sequences to the largest clusters until you're done.

E.g. run hmmbuild -> hmmsearch -> hmmalign -> (predict a secondary structure w/ e.g. RNAalifold) -> make a stockholm alignment w/ secondary structure annotation -> cmbuild -> cmsearch -> cmalign. Repeat last 3 stages until convergence. Repeat with unaligned sequences.
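The "repeat until convergence" part can be sketched in Python as a loop that stops once a search round adds no new members. This is only an outline under stated assumptions: `grow_family` and `toy_search` are hypothetical names, and `toy_search` is a stand-in for a real round of building a model from the current members, searching, and realigning (the cmbuild -> cmsearch -> cmalign stages above), which are not invoked here.

```python
from typing import Callable, Iterable, Set


def grow_family(
    seed: Iterable[str],
    search: Callable[[Set[str]], Iterable[str]],
    max_rounds: int = 20,
) -> Set[str]:
    """Iteratively add hits to a family until the member set stops changing.

    `search` takes the current member IDs and returns the IDs found by a
    model built from them. `max_rounds` is a safety cap (an assumption).
    """
    members = set(seed)
    for _ in range(max_rounds):
        # Keep existing members even if a later model no longer finds them.
        updated = set(search(members)) | members
        if updated == members:
            break  # converged: no new sequences were recruited
        members = updated
    return members


# Toy stand-in for one build/search round: each ID "hits" a fixed set of neighbours.
_NEIGHBOURS = {"a": {"b"}, "b": {"c"}, "c": set(), "x": {"y"}, "y": set()}


def toy_search(members: Set[str]) -> Set[str]:
    found: Set[str] = set()
    for m in members:
        found |= _NEIGHBOURS.get(m, set())
    return found


if __name__ == "__main__":
    print(sorted(grow_family({"a"}, toy_search)))
```

In a real run, `search` would wrap the command-line tools and parse their hit lists; sequences never recruited by any family are then the pool for the next seed.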

Thanks very much! So if I have a lot of RNA sequences and do not know which families they belong to, how can I build an HMM? Thanks!

Actually, my question is which sequences I should choose to start making alignments and building the HMM. Can I use other software such as CD-HIT to get an initial clustering and then use the largest cluster to build the HMM? Thanks!

11.1 years ago
Neilfws 49k

In general, the answer to "how do I cluster sequences?" is CD-HIT.

In this case, specifically CD-HIT-EST. From the applications page:

CD-HIT-EST has been used in clustering many types of sequences such as Expressed Sequence Tags (ESTs), MicroRNAs (miRNAs) (RNA, 2007 13:170-187), oligonucleotide probes (Bioinformatics, 2007 23:1195), 16S rRNA sequences (Nature, 2009, 457:480).
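As a rough illustration of working with the output, a run along the lines of `cd-hit-est -i ncrna.fa -o ncrna_c90 -c 0.9 -n 8` (file names and the 0.9 threshold are placeholders, not recommendations) writes a cluster listing to `ncrna_c90.clstr`. The Python sketch below parses that listing and picks the largest cluster, which is one way to choose a seed set for model building as asked earlier in the thread. The function names are hypothetical, and the parser assumes the usual `.clstr` layout of `>Cluster N` headers followed by member lines containing `>name...`.

```python
import re
from typing import Dict, List

# Member lines look like: "0\t76nt, >seq1... *" or "1\t75nt, >seq2... at +/97.33%"
_MEMBER = re.compile(r">(.+?)\.\.\.")


def parse_clstr(text: str) -> Dict[str, List[str]]:
    """Parse the text of a CD-HIT .clstr file into {cluster label: [sequence names]}."""
    clusters: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]          # e.g. "Cluster 0"
            clusters[current] = []
        elif current is not None:
            match = _MEMBER.search(line)
            if match:
                clusters[current].append(match.group(1))
    return clusters


def largest_cluster(clusters: Dict[str, List[str]]) -> List[str]:
    """Return the member names of the biggest cluster (empty list if none)."""
    return max(clusters.values(), key=len, default=[])


if __name__ == "__main__":
    sample = (
        ">Cluster 0\n"
        "0\t76nt, >seq1... *\n"
        "1\t75nt, >seq2... at +/97.33%\n"
        ">Cluster 1\n"
        "0\t80nt, >seq3... *\n"
    )
    print(largest_cluster(parse_clstr(sample)))
```

Keep in mind that CD-HIT-EST groups by sequence identity alone, so structurally related but divergent ncRNAs may land in separate clusters.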
