Question

Plot heatmap of differentially expressed genes identified by DESeq2 a R coding problem

0

Entering edit mode

7.3 years ago

Wet&DryImmunology ▴ 240

I was trying to a differential gene expression analysis by using Deseq2 [1] with samples like this [2], and would like to plot a heatmap using differentially expressed genes (DEGs) against all the samples, this is my initial try [3], that did not work out because I tried to subset the "resCellA" object, get the order from the subsetted object and use that to pull out rows of "rld" based on numerical indexing, but the index are not linked. I did that because I am not so sophisticated with R coding, and tried to solve the problem by aping the example in the vignette.

Actually I got kind suggestion that in stead of using index I need to use gene ID which is consistent across the res and mat. Could anyone advise me how to do that? I have two DataFrame [4], I guess what I need to do is to get the geneIDs from res with padj < 0.05, and retrieve those geneIDs in assay(rld),but how to do that?

Thanks very much in advance. (I understand that in the long run, I need to do more homework about R coding...)

[1]

sampleTable <- read.csv("samples.csv",row.names=1)
filenames <- paste0(sampleTable$samplename,".bam")
library("Rsamtools")
bamfiles <- BamFileList(filenames, yieldSize=2000000)
gtffile <- ("mm10genes.gtf")
library("GenomicFeatures")
(txdb <- makeTxDbFromGFF(gtffile, format="gtf", circ_seqs=character()))
(ebg <- exonsBy(txdb, by="gene"))
library("GenomicAlignments")
library("BiocParallel")
register(MulticoreParam())
se <- summarizeOverlaps(features=ebg, reads=bamfiles,
                        mode="Union",
                        singleEnd=TRUE,
                        ignore.strand=TRUE,
                        fragments=FALSE )
library("DESeq2")
(colData(se) <- DataFrame(sampleTable))
dds <- DESeqDataSet(se, design = ~ compartment + genotype)
dds <- dds[rowSums(counts(dds)) > 1,]
dds <- DESeq(dds)
rld <- rlog(dds, blind=FALSE)
library("gene filter")

[2]

"samples.csv"

"ids","samplename","genotype","compartment"

"457CellA","457CellA","WT","CellA"

"457CellB","457CellB","WT","CellB"

"457CellC","457CellC","WT","CellC"

"457CellD","457CellD","WT","CellD"

"458CellA","458CellA","cKO","CellA"

"458CellB","458CellB","cKO","CellB"

"458CellC","458CellC","cKO","CellC"

"458CellD","458CellD","cKO","CellD"

"459CellA","459CellA","WT","CellA"

"459CellB","459CellB","WT","CellB"

"459CellC","459CellC","WT","CellC"

"459CellD","459CellD","WT","CellD"

"460CellA","460CellA","cKO","CellA"

"460CellB","460CellB","cKO","CellB"

"460CellC","460CellC","cKO","CellC"

"460CellD","460CellD","cKO","CellD"

[3]

res <- results(dds)
sigGenes <- subset(res,padj <0.05)
sigGenes <- head(order(sigGenes$padj),decreasing=TRUE,40)
mat <- assay(rld)[ sigGenes, ]
mat <- mat - rowMeans(mat)
df <- as.data.frame(colData(rld)[,c("compartment","genotype")])
pheatmap(mat, annotation_col=df,fontsize=9,fontsize_number = 0.4 * fontsize)
dev.off()

[4]

> head(assay(rld))
                       457CellA      458CellB        459CellC          460CellD
0610005C13Rik          1.808277      1.6076085        1.533645         1.695644
0610007P14Rik          8.191532      7.5024585        8.660520         8.334527
0610009B22Rik          5.807757      5.6569778        5.989643         6.123990
0610009L18Rik          3.881685      3.5666925        3.786769         3.921051
0610009O20Rik          6.800692      6.8433131        6.951461         7.218018
0610010B08Rik         -1.224636     -0.8949234       -1.230797        -1.226311


> res <- results(dds)
> res
log2 fold change (MAP): compartment CellA vs CellB
Wald test p-value: compartment CellA vs CellB 
DataFrame with 23372 rows and 6 columns
                baseMean log2FoldChange      lfcSE        stat       padj
               <numeric>      <numeric>  <numeric>   <numeric>    <numeric>
0610005C13Rik   3.842899    -0.81806436  0.6916742 -1.18273074   0.23691588
0610007P14Rik 297.131215     0.22129241  0.1115736  1.98337585   0.04732546
0610009B22Rik  60.686295     0.43130300  0.2452162  1.75886844   0.07859986
0610009L18Rik  14.512324     0.02226444  0.3606164  0.06173996   0.95076992
0610009O20Rik 125.067373     0.19888416  0.1723663  1.15384599   0.24856332

RNA-Seq R • 8.3k views

ADD COMMENT • link 7.3 years ago by Wet&DryImmunology ▴ 240

score 5 · Answer 1 · 2017-01-26

5

Entering edit mode

7.3 years ago

GouthamAtla 12k

You can easily pull out differentially expressed gene names

de <- rownames(res[res$padj<0.05 & !is.na(res$padj), ])
de_mat <- assay(rld)[de,]

now de_mat contains the rld values of differentially expressed genes across all samples. This could be used to plot heatmap.

ADD COMMENT • link 7.3 years ago by GouthamAtla 12k

0

Entering edit mode

Hi Goutham, Thank you very much. I guess I had worked out a more tedious version but yours are more succinct. So I learnt a lot from your answer.

res0.05 <- subset(res, res$padj < 0.05)
sigGenes <- rownames(res0.05)
rows <- match(sigGenes, row.names(rld))
mat <- assay(rld)[rows,]

Just have another question, why "& !is.na(res$padj)" is needed? I can not imaging a case in which padj value is missing. TKS in advance.

ADD REPLY • link 7.3 years ago by Wet&DryImmunology ▴ 240

0

Entering edit mode

I was looking for this thank your for the solution but i have one doubt why are you using this can you explain & !is.na(res$padj),?