Question: Clustering FASTA sequences (GSS, EST, transcripts)
Asked 7 weeks ago by Annie (ICGEB, India):

I want to cluster (k-means) my FASTA sequences, which are mainly GSS, EST and assembled transcripts, and remove redundancy from them to create a reference set for my short query sequences. My short query sequences can target either DNA or RNA, so I need some expert guidance. Also, should I convert lower-case bases to upper case for this task? Any suggestions would be highly appreciated.
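On the upper-case question, here is a minimal Python sketch of a pre-processing step, assuming plain nucleotide FASTA input. It upper-cases every sequence, writes RNA uracil (U) as T so DNA- and RNA-derived records compare equal, and drops exact duplicates before any clustering. Lower-case bases in EST/GSS files often mark soft-masked regions (repeats, vector or low-quality stretches), so upper-casing keeps those bases but discards the mask; decide whether you want them in the reference first. The function names and the `both_strands` switch are hypothetical, not taken from any tool.

```python
"""Pre-process FASTA records: normalize case/alphabet, drop exact duplicates.

Illustrative sketch; all names here are hypothetical, not from any tool.
"""
from typing import Iterable, Iterator, List, Set, Tuple

# IUPAC nucleotide complements (S, W and N are self-complementary).
COMPLEMENT = str.maketrans("ACGTRYKMBDHVSWN", "TGCAYRMKVHDBSWN")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def normalize(seq: str) -> str:
    """Upper-case the sequence (this drops soft-masking) and write U as T."""
    return seq.upper().replace("U", "T")


def dedupe(records: Iterable[Tuple[str, str]],
           both_strands: bool = True) -> List[Tuple[str, str]]:
    """Keep the first record of each distinct normalized sequence.

    With both_strands=True a sequence and its reverse complement count as
    duplicates; pass False if strand matters for your queries.
    """
    seen: Set[str] = set()
    kept: List[Tuple[str, str]] = []
    for header, seq in records:
        seq = normalize(seq)
        key = seq
        if both_strands:
            # Canonical key: the smaller of the sequence and its reverse complement.
            key = min(seq, seq.translate(COMPLEMENT)[::-1])
        if key not in seen:
            seen.add(key)
            kept.append((header, seq))
    return kept


if __name__ == "__main__":
    fasta = [">est1", "acgtAC", ">est2", "ACGTAC", ">gss1", "GTACGT", ">tx1", "ACGUAA"]
    print(dedupe(read_fasta(fasta)))  # [('est1', 'ACGTAC'), ('tx1', 'ACGTAA')]
```

Here `est2` differs from `est1` only in case and `gss1` is its reverse complement, so both are dropped; with `both_strands=False`, `gss1` would be kept. If your short queries are strand-specific (for example, designed against RNA), the single-strand setting is probably the safer choice.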

modified 4 weeks ago by Biostar • written 7 weeks ago by Annie

You should also look at CD-HIT, which is specifically tailored to this type of application and includes dedicated subprograms (e.g. cd-hit-est for nucleotide sequences such as ESTs).
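CD-HIT clusters greedily rather than by k-means: sequences are sorted by decreasing length, and each one either joins an existing representative it matches above an identity threshold or becomes a new representative, so the number of clusters does not have to be chosen in advance. The Python sketch below illustrates that greedy incremental idea only. It is a toy under stated assumptions: similarity is a k-mer containment score (the fraction of the shorter sequence's k-mers found in the representative), not the alignment-based identity CD-HIT reports, so its `threshold` is not comparable to CD-HIT's `-c` value. All names are hypothetical.

```python
"""Greedy incremental clustering in the spirit of CD-HIT (toy sketch).

Assumption: similarity is k-mer containment, NOT alignment identity,
so the threshold is not comparable to CD-HIT's -c parameter.
"""
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Set of overlapping k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query: Set[str], rep: Set[str]) -> float:
    """Fraction of the query's k-mers that also occur in the representative."""
    if not query:
        return 0.0
    return len(query & rep) / len(query)


def greedy_cluster(records: List[Tuple[str, str]], k: int = 5,
                   threshold: float = 0.9) -> Dict[str, List[str]]:
    """Map each representative id to the ids of its cluster members.

    Sequences are processed longest first; each joins the first existing
    representative it matches at or above threshold, otherwise it
    becomes a new representative.
    """
    ordered = sorted(records, key=lambda r: len(r[1]), reverse=True)
    reps: List[Tuple[str, Set[str]]] = []
    clusters: Dict[str, List[str]] = {}
    for name, seq in ordered:
        profile = kmers(seq, k)
        for rep_name, rep_profile in reps:
            if containment(profile, rep_profile) >= threshold:
                clusters[rep_name].append(name)
                break
        else:  # no representative matched: start a new cluster
            reps.append((name, profile))
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    seqs = [("long", "ACGTACGTTTGACCA"), ("sub", "ACGTACGTTTGA"),
            ("other", "GGGGCCCCAAAATTTT")]
    print(greedy_cluster(seqs))  # {'other': ['other'], 'long': ['long', 'sub']}
```

Because `sub` is a prefix of `long`, every one of its 5-mers occurs in `long` and it is absorbed into that cluster; the representatives (`other`, `long`) form the non-redundant reference set. Processing longest first means each representative is at least as long as its members, which is why containment is measured from the member's side.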

written 7 weeks ago by genomax

Thanks for your answer, genomax, but I have found uclust to be better than CD-HIT.

written 7 weeks ago by Annie

You might look at from BBTools.

written 7 weeks ago by jean.elbers


Powered by Biostar version 2.3.0