Visualize CD-HIT output file
10 weeks ago
m90

Hello there,

This is my first time using CD-HIT for clustering, and my output file looks like the one below. Is there a script or tool I can use on Linux to view the output as a graph?

>Cluster 0
0       15679nt, >SpecA_Contig35475... at +/99.99%
1       15436nt, >SpecA_Contig35476... at +/99.62%
2       15764nt, >SpecB_Contig18540... *
3       15438nt, >SpecA_Contig39392... at +/99.69%
4       15679nt, >SpecC_comp263440_c8_seq4... at -/99.99%
>Cluster 1
0       15684nt, >SpecC_SB1234_Contig35474... at +/99.98%
1       15685nt, >SpecC_Contig11682... *
>Cluster 2
0       15684nt, >SpecA_comp263440_c8_seq3... at -/99.98%
1       15672nt, >SpecB_comp263440_c8_seq5... at -/99.97%
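For reference, each block in a .clstr file starts with a ">Cluster N" line. It is followed by one line per member giving the member's index within the cluster, its length, its ID (followed by "..."), and either "*" for the cluster representative or "at <strand>/<identity>%" for its orientation and percent identity to the representative. Below is a minimal Python sketch that reads that layout into per-cluster records. It assumes the line format shown above. Accepting an "aa" length suffix and a strand-less "at 99.00%" form is an assumption about how protein runs are printed, and the names parse_clstr and Member are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One member line, e.g. "0       15679nt, >SpecA_Contig35475... at +/99.99%"
# Assumption: protein runs use "aa" and may omit the "+/" strand prefix.
MEMBER_RE = re.compile(
    r"^\d+\s+(\d+)(?:nt|aa),\s+>(.+?)\.\.\.\s+"
    r"(?:(\*)|at\s+(?:([+-])/)?([\d.]+)%)\s*$"
)


@dataclass
class Member:
    name: str                  # sequence ID as printed in the .clstr file
    length: int                # sequence length (nt or aa)
    is_representative: bool    # True for the line marked with '*'
    strand: Optional[str]      # '+' or '-' relative to the representative
    identity: Optional[float]  # % identity to the representative


def parse_clstr(lines):
    """Return a dict mapping cluster number -> list of Member objects."""
    clusters = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = []
            continue
        m = MEMBER_RE.match(line)
        if m is None or current is None:
            raise ValueError(f"unrecognised line: {line!r}")
        length, name, star, strand, ident = m.groups()
        clusters[current].append(Member(
            name=name,
            length=int(length),
            is_representative=star is not None,
            strand=strand,
            identity=float(ident) if ident else None,
        ))
    return clusters
```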

10 weeks ago
Mensur Dlakic

What type of graph do you have in mind? Something showing the number of clusters? Average number of cluster members? Average length of cluster sequences?
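Once the file is parsed, any of those three numbers is easy to compute. Here is a small stdlib-only Python sketch that assumes the .clstr layout shown in the question. It reports the number of clusters, the mean number of members per cluster, the mean sequence length, and a rough text histogram of cluster sizes. cluster_lengths, summarize and text_histogram are hypothetical helper names, not part of CD-HIT.

```python
import re
from collections import Counter
from statistics import mean

# Captures the length from lines like "0       15679nt, >SpecA_Contig35475... at +/99.99%"
LENGTH_RE = re.compile(r"^\d+\s+(\d+)(?:nt|aa),")


def cluster_lengths(lines):
    """Return one list of member lengths per cluster, in file order."""
    clusters = []
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            clusters.append([])
        elif line:
            m = LENGTH_RE.match(line)
            if m is None or not clusters:
                raise ValueError(f"unrecognised line: {line!r}")
            clusters[-1].append(int(m.group(1)))
    return clusters


def summarize(clusters):
    """Cluster count, mean members per cluster, mean length, size distribution."""
    sizes = [len(c) for c in clusters]
    return {
        "n_clusters": len(clusters),
        "mean_members": mean(sizes),
        "mean_length": mean(length for c in clusters for length in c),
        # cluster size -> how many clusters have that size, smallest size first
        "size_distribution": dict(sorted(Counter(sizes).items())),
    }


def text_histogram(size_distribution):
    """One line per cluster size, with one '#' per cluster of that size."""
    return "\n".join(
        f"{size:>4} members | {'#' * count} ({count})"
        for size, count in size_distribution.items()
    )
```

The size_distribution dict can just as well be passed to a plotting library such as matplotlib if you want a proper bar chart instead of text.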

Whatever it is, you may want to start from this script, which will convert the CD-HIT output into a clustering solution. It should be easier to create a graph of any kind from it.

https://github.com/jrjhealey/bioinfo-tools/blob/master/ParseCDHIT.py
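If you want a literal graph of which sequences cluster together, one option is to draw each cluster as a star: an edge from every member to its representative, written in Graphviz DOT format. This is separate from the linked script and is not what it does. Below is a rough Python sketch that assumes the .clstr layout from the question. The function names and the "cluster_N" placeholder node (used when a cluster has no "*" line, as with Cluster 2 in the excerpt above) are hypothetical choices made for illustration.

```python
import re

# Name, '*' marker and % identity from a member line (strand prefix optional)
MEMBER_RE = re.compile(
    r"^\d+\s+\d+(?:nt|aa),\s+>(.+?)\.\.\.\s+(?:(\*)|at\s+(?:[+-]/)?([\d.]+)%)\s*$"
)


def clstr_to_table(lines):
    """Yield (cluster_number, sequence_name, is_representative, identity) rows."""
    cluster_id = None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            cluster_id = int(line.split()[1])
        elif line:
            m = MEMBER_RE.match(line)
            if m is None or cluster_id is None:
                raise ValueError(f"unrecognised line: {line!r}")
            name, star, identity = m.groups()
            yield cluster_id, name, star is not None, identity


def to_dot(rows):
    """Build an undirected DOT graph linking each member to its representative."""
    by_cluster = {}
    for cid, name, is_rep, identity in rows:
        by_cluster.setdefault(cid, []).append((name, is_rep, identity))
    out = ["graph cdhit {", "  node [shape=box];"]
    for cid, members in by_cluster.items():
        reps = [name for name, is_rep, _ in members if is_rep]
        hub = reps[0] if reps else f"cluster_{cid}"  # placeholder if no '*' line
        out.append(f'  "{hub}" [style=filled];')
        for name, is_rep, identity in members:
            if not is_rep:
                out.append(f'  "{hub}" -- "{name}" [label="{identity}%"];')
    out.append("}")
    return "\n".join(out)

# Example usage (file names are hypothetical):
#   with open("output.clstr") as fh, open("clusters.dot", "w") as out:
#       out.write(to_dot(clstr_to_table(fh)) + "\n")
# then render with Graphviz, e.g.:  dot -Tpng clusters.dot -o clusters.png
```

With thousands of clusters, a single rendered image will get crowded. In that case the rows from clstr_to_table are probably more useful as a plain table of cluster assignments.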