Question: Clustering FASTA sequences (GSS, EST, transcripts)
12 months ago, Annie (India, ICGEB) wrote:

I want to cluster (k-means) my FASTA sequences and remove redundancy from them. The sequences are mainly GSS, EST, and assembled transcripts, and the goal is to create a reference set for my short query sequences, which can target either DNA or RNA. I would appreciate some expert guidance. Also, should I convert lowercase bases to uppercase before doing this? Any suggestions would be highly appreciated.

modified 11 months ago by Biostar ♦♦ 20 • written 12 months ago by Annie 0

You should also look at CD-HIT, which is tailored to this type of application and has dedicated subprograms (for example, cd-hit-est for nucleotide sequences such as ESTs).

written 12 months ago by genomax 87k

Thanks for your answer, genomax, but I have found uclust to be better than CD-HIT.

written 12 months ago by Annie 0

You might look at from BBTools.

written 12 months ago by jean.elbers 1.4k
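
On the upper/lower case part of the question: lowercase bases in FASTA files commonly mark soft-masked regions, so uppercasing discards that information, and whether a clustering tool treats case specially is worth checking in its documentation. As a minimal sketch (not a substitute for CD-HIT, uclust, or BBTools, and not something proposed in the thread), the Python below uppercases every sequence and drops exact duplicates only. The function names `read_fasta` and `dedupe_exact` and the example records are hypothetical, and it does not catch reverse complements or near-identical sequences.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_exact(records):
    """Uppercase each sequence and keep the first record of each distinct sequence."""
    seen = set()
    kept = []
    for header, seq in records:
        seq = seq.upper()  # case-insensitive comparison; discards soft-masking
        if seq not in seen:
            seen.add(seq)
            kept.append((header, seq))
    return kept


# Hypothetical toy input: est1 and gss1 differ only in case.
example = StringIO(">est1\nacgtACGT\n>gss1\nACGTACGT\n>tx1\nTTGACA\n")
unique = dedupe_exact(read_fasta(example))
for header, seq in unique:
    print(f">{header}\n{seq}")
```

With a real file, `open("sequences.fasta")` (a placeholder path) would replace the `StringIO` object. Similarity-based clustering at an identity threshold still needs one of the dedicated tools mentioned above.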