Visualise CD-HIT clusters?
3.7 years ago
jamie.pike

I have recently clustered a set of proteins using CD-HIT and was wondering whether anyone could recommend a good way to visualise the clusters (there are 61 clusters in total).

Example of clusters from CD-HIT:

>Cluster 0
0   574aa, >GCA_000350365.1_Foc4_1.0_B2_genomic.fna_Candidate_Sequence_39024-44226_26... *
>Cluster 1
0   401aa, >GCA_000260195.2_FO_II5_V1_genomic.fna_Candidate_Sequence_g6.t1... *
1   108aa, >GCA_000260195.2_FO_II5_V1_genomic.fna_Candidate_Sequence_59093-64307_60... at 93.52%
2   401aa, >GCA_000350365.1_Foc4_1.0_B2_genomic.fna_Candidate_Sequence_g5.t1... at 100.00%
3   108aa, >GCA_000350365.1_Foc4_1.0_B2_genomic.fna_Candidate_Sequence_16327-21541_28... at 93.52%
4   401aa, >GCA_001696625.1_C1HIR_9889_genomic.fna_Candidate_Sequence_g20.t1... at 99.75%
5   108aa, >GCA_001696625.1_C1HIR_9889_genomic.fna_Candidate_Sequence_25103-30317_60... at 92.59%
6   401aa, >GCA_007994515.1_UK0001_genomic.fna_Candidate_Sequence_g13.t1... at 100.00%
7   108aa, >GCA_007994515.1_UK0001_genomic.fna_Candidate_Sequence_2796127-2801341_60... at 93.52%
8   401aa, >GWHAASU00000000_FocTR4_58.genomic.fna_Candidate_Sequence_g6.t1... at 100.00%
9   108aa, >GWHAASU00000000_FocTR4_58.genomic.fna_Candidate_Sequence_5739325-5744539_60... at 93.52%
>Cluster 2
0   373aa, >GCA_005930515.1_160527_genomic.fna_Candidate_Sequence_g6.t1... *
>Cluster 3
0   371aa, >GCA_000350365.1_Foc4_1.0_B2_genomic.fna_Candidate_Sequence_g2.t1... *
1   79aa, >GCA_000350365.1_Foc4_1.0_B2_genomic.fna_Candidate_Sequence_26225-31435_21... at 98.73%
2   30aa, >GCA_001696625.1_C1HIR_9889_genomic.fna_Candidate_Sequence_760-3765_21... at 100.00%
3   371aa, >GCA_007994515.1_UK0001_genomic.fna_Candidate_Sequence_g18.t1... at 99.73%
4   79aa, >GCA_007994515.1_UK0001_genomic.fna_Candidate_Sequence_2897725-2902935_56... at 97.47%
5   371aa, >GWHAASU00000000_FocTR4_58.genomic.fna_Candidate_Sequence_g13.t1... at 99.73%
6   79aa, >GWHAASU00000000_FocTR4_58.genomic.fna_Candidate_Sequence_5872866-5878075_56... at 97.47%
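Below is a minimal Python sketch for reading a `.clstr` file like the one above into a dictionary. It assumes each member line has the shape shown here: index, length in `aa`, `>ID...`, then `*` for the representative or `at NN.NN%` for other members. `parse_clstr` is a hypothetical helper and is not part of CD-HIT.

```python
import re

# One member line of a CD-HIT .clstr file, for example:
#   0   574aa, >SEQ_ID... *            (representative)
#   1   108aa, >SEQ_ID... at 93.52%    (member, identity to representative)
MEMBER_RE = re.compile(
    r"^\d+\s+(?P<length>\d+)aa,\s+>(?P<seq_id>.+?)\.\.\.\s+(?P<tail>\*|at\s+\S+)$"
)


def parse_clstr(lines):
    """Return {cluster_name: [(seq_id, length, is_representative, identity), ...]}.

    identity is None for the representative sequence.
    """
    clusters = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 0"
            clusters[current] = []
            continue
        match = MEMBER_RE.match(line)
        if match is None or current is None:
            raise ValueError(f"unrecognised line: {line!r}")
        tail = match.group("tail")
        is_rep = tail == "*"
        identity = None
        if not is_rep:
            # Keep only the text after the last "/" so variants such as
            # "at +/93.52%" are tolerated, then drop the trailing "%".
            identity = float(tail.split()[-1].split("/")[-1].rstrip("%"))
        clusters[current].append(
            (match.group("seq_id"), int(match.group("length")), is_rep, identity)
        )
    return clusters
```

For example, `clusters = parse_clstr(open("proteins.clstr"))`, where `proteins.clstr` is a placeholder for the actual CD-HIT output file. `len(clusters)` should then equal the number of clusters CD-HIT reported.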

Visualize in what way? You could build multiple sequence alignments for each of these clusters.
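To align each cluster, its member sequences first need to be collected into one FASTA file per cluster. The following is a minimal sketch. It assumes the clusters are held as `{cluster_name: [seq_id, ...]}` and that the IDs in the `.clstr` file match the FASTA IDs (header up to the first space). CD-HIT truncates IDs in `.clstr` unless run with a suitable `-d` value, so check this first. All function names are hypothetical.

```python
from pathlib import Path


def read_fasta(lines):
    """Return {seq_id: sequence}, where seq_id is the header up to the first whitespace."""
    seqs, current, chunks = {}, None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if current is not None:
                seqs[current] = "".join(chunks)
            current, chunks = line[1:].split()[0], []
        elif line and current is not None:
            chunks.append(line)
    if current is not None:
        seqs[current] = "".join(chunks)
    return seqs


def cluster_fastas(clusters, seqs, min_members=2):
    """Build {file_name: fasta_text}, one FASTA per cluster with >= min_members sequences."""
    files = {}
    for name, members in clusters.items():
        if len(members) < min_members:
            continue  # a singleton cluster has nothing to align
        records = "".join(f">{seq_id}\n{seqs[seq_id]}\n" for seq_id in members)
        files[name.replace(" ", "_") + ".fasta"] = records
    return files


def write_files(files, out_dir):
    """Write each FASTA text into out_dir, creating the directory if needed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for file_name, text in files.items():
        (out / file_name).write_text(text)
```

Each resulting file can then be passed to an aligner such as MAFFT or Clustal Omega.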


I was thinking of something along the lines of a binary heatmap, if that makes sense. The protein clusters come from 9 genomes: 5 from one strain and 4 from another. I was hoping to create something that shows a protein's presence in one strain and its absence in the other. I wasn't sure whether an existing tool could already do that. If not, I could pick some clusters for MSA and present those. I suppose there's always a simple table indicating presence or absence. I just thought I'd see if anyone had any ideas.
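The binary presence/absence idea can be sketched with the standard library alone. This sketch assumes the clusters are already available as `{cluster_name: [seq_id, ...]}` and that the source genome can be read from each ID by cutting at `_Candidate_Sequence_`, as in the IDs above. The helper names and the genome-to-strain mapping (`strain_of`) are hypothetical, and the real 5 + 4 split has to be filled in by hand. The TSV output doubles as the simple presence/absence table and could be loaded into a heatmap tool (for example R's `pheatmap` or Python's `seaborn.heatmap`).

```python
def genome_of(seq_id):
    # Assumption: the text before "_Candidate_Sequence_" identifies the source
    # genome, as in the IDs above (e.g. "GCA_000350365.1_Foc4_1.0_B2_genomic.fna").
    return seq_id.split("_Candidate_Sequence_")[0]


def presence_matrix(clusters, genomes):
    """Return {cluster_name: [1 if present else 0 for each genome, in `genomes` order]}."""
    matrix = {}
    for name, members in clusters.items():
        present = {genome_of(seq_id) for seq_id in members}
        matrix[name] = [1 if genome in present else 0 for genome in genomes]
    return matrix


def classify(matrix, genomes, strain_of):
    """Label each cluster '<strain> only', 'shared', or 'absent'.

    '<strain> only' means found in at least one genome of that strain and in
    no genome of any other strain (not necessarily in every genome of it).
    """
    labels = {}
    for name, row in matrix.items():
        strains = {strain_of[g] for g, bit in zip(genomes, row) if bit}
        if not strains:
            labels[name] = "absent"
        elif len(strains) == 1:
            labels[name] = f"{strains.pop()} only"
        else:
            labels[name] = "shared"
    return labels


def to_tsv(matrix, genomes):
    """Render the matrix as tab-separated text: one row per cluster, one column per genome."""
    rows = ["\t".join(["cluster"] + genomes)]
    for name, row in matrix.items():
        rows.append("\t".join([name] + [str(bit) for bit in row]))
    return "\n".join(rows)
```

Ordering `genomes` by strain (all five genomes of one strain, then the four of the other) keeps each strain as a contiguous block of columns. Strain-specific clusters then stand out as rows filled on one side only.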
