Question

deeptools kmeans heatmap

0

Entering edit mode

20 months ago

rekharichardson • 0

Hi all,

I am very new at bioinformatics. I have collected bulkATACseq data from 3 different timepoints (2 embryonic timepoints and 1 adult) and a positive and negative for each timepoint.

I am trying to do kmeans clustering on the combined dataset to demonstrate dynamic chromatin accessibility throughout these different timepoints and conditions.

Here is the workflow I have put my data through so far:

checked quality of ATAC reads using fastqc
trim adapters with cutadapt
map to genome with bowtie
sort sam files by coord and convert to bam
index bam files
make bigwigs using bamcoverage (I have tried both CPM and RPGC normalization. I have also included "extend reads", "ignore duplicates", and "center reads", bin size 10, smooth length 30)
call peaks using Genrich
Diffbind analysis of each possible pairwise comparison
combined the results of each diffbind analysis into one large bed file and filtered peaks by fold change greater than 1 or less than -1, and FDR < 0.05
sorted bed file by chromosome
used bedtools "merge" to get rid of duplicates
ran compute matrix (deep tools) with the following inputs: bigwig file from every condition, 1 bed file including high confidence peaks from everywhere Diffbind comparison, --referencePoint center, -b 1500, -a 1500
used plotHeatmap with the matrix generated with computeMatrix and have tried a bunch of different numbers for kmeans clusters.

Is this a workflow that makes sense? I am not getting clusters that are specific to my adult samples, which is surprising. I am wondering if I didn't normalize my data properly. Also, on my heatmap, depending on the number I input for kmeans, I am getting clusters that are skewed way to the left or right when they should be centered. What does this indicate? I am thinking it has something to do with the bed file I am inputting -- maybe I need to cut down the number of peaks? I have tried everything I can think of to change and am not sure what to try next.

Thank you in advance!

diffbind deeptools kmeans ATAC clustering • 619 views

ADD COMMENT • link updated 20 months ago by Friederike 8.9k • written 20 months ago by rekharichardson • 0

0

Entering edit mode

Can you provide some results? It's a bit difficult to judge what exactly the issue might be without actually seeing the data. There could be various reasons, e.g. lack of signal, overabundance of noise, ....

What do the DiffBind peaks for the adult vs embryos look like in the heatmap (without clustering, just plotting those peaks across all the samples)? Did you try turning off the clustering and just keeping the individual DiffBind results as chunks of regions, i.e. supply each set of (unique) DiffBind peaks as a separate BED file as shown in the example here.

ADD REPLY • link 20 months ago by Friederike 8.9k