Question: Strangely looking pheatmap after DEseq2
0
gravatar for tanya_fiskur
9 months ago by
tanya_fiskur40
tanya_fiskur40 wrote:

Hello!

I am trying to draw a heatmap for the differentially expressed genes. I did it, following the tutorial: http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

select <- order(rowMeans(counts(dds,normalized=TRUE)),
            decreasing=TRUE)[1:20]

> select
 [1]  4531  7824  5244  5309 18901 19260 19824 21819 16048 20366 20448 19275 22579 20979 10215
 [16] 22060  1652  2091 14069 17406

df <- as.data.frame(colData(dds)[,"Age"])
rownames(df) <- colnames(assay(ntd)[select,]) #additional line, recommended by other tutorials, without it heatmap outputs error

> df
   colData(dds)[, "Age"]
A1             3.5_weeks
A2             3.5_weeks
A3             3.5_weeks
B1             8.5_weeks
B2             8.5_weeks
B3             8.5_weeks
C1              14_weeks
C2              14_weeks
C3              14_weeks

pheatmap(assay(ntd)[select,], cluster_rows=FALSE, show_rownames=FALSE,
     cluster_cols=FALSE, annotation_col=df)`

The resulting pheatmap looks like this: https://ibb.co/WykjCkd

if picture is not shown, press the link above

Why are there strange lines each 5 genes, and no blue part of the plot below?

Thank you very much in advance!

rna-seq deseq2 pheatmap • 357 views
ADD COMMENTlink modified 9 months ago by Kevin Blighe69k • written 9 months ago by tanya_fiskur40
1
gravatar for Kevin Blighe
9 months ago by
Kevin Blighe69k
Republic of Ireland
Kevin Blighe69k wrote:

I am not sure what is unusual here. This code will merely create a heatmap of the top 20 genes based on normalised count. The colour scheme is probably not intuitive because the colour shade goes from blue-white-red while the values go from ~10 to ~16.

It is likely pure coincidence that the unusual signal occurs every 5 rows.

If you want, please show the output of:

counts(dds,normalized=TRUE)[select,]

Kevin

ADD COMMENTlink written 9 months ago by Kevin Blighe69k

Thank you! The output:

> counts(dds,normalized=TRUE)[select,]
                      A1        A2        A3        B1         B2        B3        C1
Nf_1_004531 49216.609 38310.001 52683.232 60835.332 61316.0741 55007.475 62430.015
Nf_1_007824 27231.960 41369.197 35939.795 58822.836 27629.2749 47563.491 36320.484
Nf_1_005244 15839.796 18705.426 13647.251 21174.409 16791.7696 20812.083 18067.050
Nf_1_005309 19570.699 21828.399 21586.428 15830.231  9687.2361 15603.902 15171.921
Nf_1_018902 18209.008 18932.305 17598.355 17048.265  8326.8446 17837.531 16240.967
Nf_1_019261  5813.051 59252.844 38181.988  3079.857  1194.6951  4517.233  3284.321
Nf_1_019825 14219.933 17404.798 14295.140  9745.328 10061.8541 10433.745 10887.524
Nf_1_021820 10555.204  9751.580  8893.912 14670.148  8350.8586 15124.801 11661.803
Nf_1_016049 11427.581 11464.144 12827.455 13113.888  6883.6049 12741.249 11527.145
Nf_1_020367  8898.992 11056.391  8312.568 14339.298  9651.2151 13113.883 11805.492
Nf_1_020449 10592.485  8897.389 10083.402 11012.884  9007.6407  9972.026 11622.391
Nf_1_019276  3618.128 46698.221 22798.100  2040.945   838.0876  2720.335  2617.604
Nf_1_022580 11936.468 10349.618 10161.962  8227.000  9115.7036  7681.904  9713.379
Nf_1_020980  7412.409  7827.820  6603.658  8503.059  8671.4451  8985.579  8667.323
Nf_1_010215  7939.937  6996.630  7011.245  8853.929  6061.1264  8524.947  8476.832
Nf_1_022061  7673.377  6754.069  6422.508  8362.922  5995.0880  8219.670  8334.785
Nf_1_001652  7379.788  6745.705  6064.829  7216.537  8796.3178  7426.601  8060.545
Nf_1_002091  2989.942 35683.651 18605.771  1227.517   726.4226  1776.257  2300.667
Nf_1_014070  6849.465  6353.635  4742.249  8820.212  5603.6602  8394.580  7902.076
Nf_1_017407  6455.218  7014.404  6366.129  7669.612  5429.5590  8082.784  7204.979
                       C2         C3
Nf_1_004531 63510.5786 53728.3908
Nf_1_007824 53981.2295 44263.6530
Nf_1_005244 13580.9372 16039.7943
Nf_1_005309 14888.9704 11032.8475
Nf_1_018902 16629.6150 12382.9879
Nf_1_019261  1323.3318  1377.6510
Nf_1_019825 10746.4404  9781.1108
Nf_1_021820 12488.7848 12114.2295
Nf_1_016049 11875.9895  9266.8724
Nf_1_020367  9356.8144  9890.0955
Nf_1_020449 12096.1198  7152.7812
Nf_1_019276  1029.2581   971.3392
Nf_1_022580  6972.7773  5665.0873
Nf_1_020980  8380.2517  7967.5210
Nf_1_010215  8540.0374  6463.9556
Nf_1_022061  8381.1016  7127.3867
Nf_1_001652  7293.1987  6927.4051
Nf_1_002091   883.9211   838.0181
Nf_1_014070  7531.1775  7267.0564
Nf_1_017407  7182.7086  6822.6528
ADD REPLYlink modified 9 months ago • written 9 months ago by tanya_fiskur40

Thanks! Yes, see, the issue is right there:

The genes represented by those bands just coincidentally have a wide range of values across your sample, so, high variance: Row 6:

Nf_1_019261  5813.051 59252.844 38181.988  3079.857  1194.6951  4517.233  3284.321

Row 12:

Nf_1_019276  3618.128 46698.221 22798.100  2040.945   838.0876  2720.335  2617.604

The samples that have large values 'boost up' the value of the mean for the gene. This is of course a prime example of where the mean can be misleading when the data distribution is non-uniform.

ADD REPLYlink written 9 months ago by Kevin Blighe69k

Thanks a lot! And you told that "the data distribution is non-uniform" - is there a way to check the uniformity or to improve it? Sorry if it is too basic question

ADD REPLYlink written 9 months ago by tanya_fiskur40

The Shapiro-Wilk test can test for the 'normality' of each gene; however, this test loses sensitivity beyond [and under] a certain sample size. You may want to consult a statistician.

For producing heatmaps like these, the data should be normalised, and is also usually transformed via log or some other transformation. Your input looks like normalised counts?

ADD REPLYlink written 9 months ago by Kevin Blighe69k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1615 users visited in the last hour
_