Visualise CD-HIT clusters?
8 months ago
jamie.pike ▴ 60

I have recently clustered a set of proteins using CD-HIT and I was wondering if anyone could recommend a nice way to visualise the clusters (there are 61 clusters in total)?

Example of clusters from CD-HIT:

>Cluster 0
0   574aa, >GCA_000350365.1_Foc4_1.0_B2_genomic.fna_Candidate_Sequence_39024-44226_26... *
>Cluster 1
0   401aa, >GCA_000260195.2_FO_II5_V1_genomic.fna_Candidate_Sequence_g6.t1... *
1   108aa, >GCA_000260195.2_FO_II5_V1_genomic.fna_Candidate_Sequence_59093-64307_60... at 93.52%
2   401aa, >GCA_000350365.1_Foc4_1.0_B2_genomic.fna_Candidate_Sequence_g5.t1... at 100.00%
3   108aa, >GCA_000350365.1_Foc4_1.0_B2_genomic.fna_Candidate_Sequence_16327-21541_28... at 93.52%
4   401aa, >GCA_001696625.1_C1HIR_9889_genomic.fna_Candidate_Sequence_g20.t1... at 99.75%
5   108aa, >GCA_001696625.1_C1HIR_9889_genomic.fna_Candidate_Sequence_25103-30317_60... at 92.59%
6   401aa, >GCA_007994515.1_UK0001_genomic.fna_Candidate_Sequence_g13.t1... at 100.00%
7   108aa, >GCA_007994515.1_UK0001_genomic.fna_Candidate_Sequence_2796127-2801341_60... at 93.52%
8   401aa, >GWHAASU00000000_FocTR4_58.genomic.fna_Candidate_Sequence_g6.t1... at 100.00%
9   108aa, >GWHAASU00000000_FocTR4_58.genomic.fna_Candidate_Sequence_5739325-5744539_60... at 93.52%
>Cluster 2
0   373aa, >GCA_005930515.1_160527_genomic.fna_Candidate_Sequence_g6.t1... *
>Cluster 3
0   371aa, >GCA_000350365.1_Foc4_1.0_B2_genomic.fna_Candidate_Sequence_g2.t1... *
1   79aa, >GCA_000350365.1_Foc4_1.0_B2_genomic.fna_Candidate_Sequence_26225-31435_21... at 98.73%
2   30aa, >GCA_001696625.1_C1HIR_9889_genomic.fna_Candidate_Sequence_760-3765_21... at 100.00%
3   371aa, >GCA_007994515.1_UK0001_genomic.fna_Candidate_Sequence_g18.t1... at 99.73%
4   79aa, >GCA_007994515.1_UK0001_genomic.fna_Candidate_Sequence_2897725-2902935_56... at 97.47%
5   371aa, >GWHAASU00000000_FocTR4_58.genomic.fna_Candidate_Sequence_g13.t1... at 99.73%
6   79aa, >GWHAASU00000000_FocTR4_58.genomic.fna_Candidate_Sequence_5872866-5878075_56... at 97.47%

clustering data visualisation CD-HIT • 365 views

Visualize in what way? You could build multiple sequence alignments for each of these clusters.


I was thinking of something like a binary heatmap, if that makes sense. The protein clusters come from 9 genomes: 5 from one strain and 4 from another. I was hoping to create something that shows a protein's presence in one strain and its absence in the other. I wasn't sure whether a tool already exists that can do that. If not, I could pick some clusters for MSA and present those. I suppose there's always a simple table indicating presence or absence. I just thought I'd see if anyone had any ideas.
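
For what it's worth, a binary presence/absence heatmap like this can probably be built straight from the .clstr file. Below is a minimal Python sketch, not a tested pipeline. It assumes the genome name is everything before "_Candidate_Sequence" in each header (true for the example above, but check it against your own headers). The function names and the output filename are made up for illustration, and the plotting step assumes matplotlib is installed.

```python
import re
from collections import defaultdict

# Assumption: the genome name is the part of each header that precedes
# "_Candidate_Sequence", as in the example clusters in this thread.
MEMBER_RE = re.compile(r">(.+?)_Candidate_Sequence")


def parse_clstr(lines):
    """Map each cluster name to the set of genomes that have a member in it."""
    clusters = defaultdict(set)
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            current = line[1:]   # e.g. "Cluster 0"
            clusters[current]    # register the cluster even before members are seen
        elif line and current is not None:
            match = MEMBER_RE.search(line)
            if match:
                clusters[current].add(match.group(1))
    return dict(clusters)


def presence_matrix(clusters):
    """Return (cluster names, genome names, 0/1 matrix with clusters as rows)."""
    genomes = sorted({g for members in clusters.values() for g in members})
    names = list(clusters)  # keeps the order in which clusters appear in the file
    matrix = [[1 if g in clusters[c] else 0 for g in genomes] for c in names]
    return names, genomes, matrix


def plot_heatmap(names, genomes, matrix, out_png="cdhit_presence_absence.png"):
    """Save a binary heatmap: dark cell = present, white cell = absent."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, max(2, 0.25 * len(names))))
    ax.imshow(matrix, cmap="Greys", aspect="auto", vmin=0, vmax=1)
    ax.set_xticks(range(len(genomes)))
    ax.set_xticklabels(genomes, rotation=90)
    ax.set_yticks(range(len(names)))
    ax.set_yticklabels(names)
    fig.tight_layout()
    fig.savefig(out_png, dpi=150)
```

To use it, open your .clstr file, pass the file handle to parse_clstr, feed the result to presence_matrix, and then call plot_heatmap on the three returned values. The columns come out in alphabetical order, so to put the 5 genomes of one strain next to the 4 of the other you would need to reorder them yourself using your own genome-to-strain mapping.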