Clustering FASTA sequences: GSS, EST, transcripts
4.7 years ago
Annie • 0

I want to cluster (k-means) my FASTA sequences and remove redundancy from them. They are mainly GSS, EST and assembled transcripts, and the goal is to create a reference set for my short query sequences, which can target either DNA or RNA. I would appreciate some expert guidance. Also, should I convert lowercase bases to uppercase before doing this? Any suggestions would be highly appreciated.

genome assembly sequence next-gen alignment • 959 views

You should also look at CD-HIT, which is specifically tailored for this type of application and has dedicated subprograms (for example, cd-hit-est for nucleotide sequences such as ESTs).
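
On the upper/lower case question, here is a minimal Python sketch of one possible preprocessing step: normalize every sequence to upper case and drop exact duplicates before handing the file to a clustering tool. The helper names `read_fasta` and `dedupe_exact` are made up for this illustration and are not part of CD-HIT or any other tool. The sketch assumes that case carries no meaning in your data; in many FASTA files lowercase marks soft-masked regions, so uppercasing discards that information. It also only catches identical sequences, not reverse complements or near-duplicates, which is what the clustering step is for.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def dedupe_exact(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Uppercase each sequence and keep the first record per distinct sequence."""
    seen = {}  # uppercased sequence -> header of first occurrence
    for header, seq in records:
        key = seq.upper()  # assumption: case is not meaningful here
        if key not in seen:
            seen[key] = header
    # dicts preserve insertion order, so the input order is kept
    return [(header, seq) for seq, header in seen.items()]


if __name__ == "__main__":
    example = [">a", "acgt", "ACGT", ">b", "ACGTacgt", ">d", "TTTT"]
    for header, seq in dedupe_exact(read_fasta(example)):
        print(f">{header}\n{seq}")
```

For a real file you could pass an open file handle to `read_fasta` instead of the example list.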

Thanks for your answer, genomax, but I have found uclust to be better than CD-HIT.
