visualize CD-HIT output file
10 weeks ago
m90 ▴ 30

Hello there,

This is my first time using the CD-HIT tool for clustering, and my output file looks like the example below. Is there a script or tool I can use on Linux to view this output as a graph?

>Cluster 0
0       15679nt, >SpecA_Contig35475... at +/99.99%
1       15436nt, >SpecA_Contig35476... at +/99.62%
2       15764nt, >SpecB_Contig18540... *
3       15438nt, >SpecA_Contig39392... at +/99.69%
4       15679nt, >SpecC_comp263440_c8_seq4... at -/99.99%
>Cluster 1
0       15684nt, >SpecC_SB1234_Contig35474... at +/99.98%
1       15685nt, >SpecC_Contig11682... *
>Cluster 2
0       15684nt, >SpecA_comp263440_c8_seq3... at -/99.98%
1       15672nt, >SpecB_comp263440_c8_seq5... at -/99.97%

CD-HIT visualization • 210 views

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text rendered as code), or select a chunk of text and use the highlighted button to format it as a code block. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.

10 weeks ago
Mensur Dlakic ★ 15k

What type of graph do you have in mind? Something showing the number of clusters? Average number of cluster members? Average length of cluster sequences?

Whatever it is, you may want to start from this script, which will convert the CD-HIT output into a clustering solution. It should be easier to create a graph of any kind from it.

https://github.com/jrjhealey/bioinfo-tools/blob/master/ParseCDHIT.py
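If you would rather see the idea in a few lines first, here is a minimal Python sketch of my own (it is not the linked script). It assumes the .clstr layout shown in the question: a ">Cluster N" header followed by member lines, with the representative marked by "*". The names parse_clstr, summarize, and print_summary are made up for illustration. It gives you the number of members and the mean sequence length per cluster, which you can print as a crude text bar chart or pass to any plotting library.

```python
import re

# One member line, e.g. "0       15679nt, >SpecA_Contig35475... at +/99.99%"
# Groups: sequence length, sequence name, status ("*" or "at ...").
MEMBER_RE = re.compile(r"^\d+\s+(\d+)(?:nt|aa),\s+>(.+?)\.\.\.\s+(\*|at\s+.+)$")


def parse_clstr(lines):
    """Return {cluster_id: [member dicts]} from the lines of a .clstr file."""
    clusters = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = []
            continue
        match = MEMBER_RE.match(line)
        if match is None or current is None:
            continue  # skip anything that does not look like a member line
        length, name, status = match.groups()
        clusters[current].append(
            {"name": name, "length": int(length), "representative": status == "*"}
        )
    return clusters


def summarize(clusters):
    """Return (cluster_id, member_count, mean_length) for each non-empty cluster."""
    rows = []
    for cid in sorted(clusters):
        lengths = [member["length"] for member in clusters[cid]]
        if lengths:
            rows.append((cid, len(lengths), sum(lengths) / len(lengths)))
    return rows


def print_summary(path):
    """Print one row per cluster with a text bar showing the member count."""
    with open(path) as handle:
        for cid, size, mean_len in summarize(parse_clstr(handle)):
            print(f"Cluster {cid}\t{size}\t{mean_len:.1f}\t{'#' * size}")
```

Call print_summary("your_output.clstr") with the path to your own file (the file name here is just a placeholder). The rows returned by summarize are plain tuples, so they should be easy to feed into a histogram or bar plot of cluster sizes.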