Question: Plot heatmap of differentially expressed genes identified by DESeq2 a R coding problem
0
gravatar for Wet&DryImmunology
2.1 years ago by
Japan
Wet&DryImmunology200 wrote:

I was trying to a differential gene expression analysis by using Deseq2 [1] with samples like this [2], and would like to plot a heatmap using differentially expressed genes (DEGs) against all the samples, this is my initial try [3], that did not work out because I tried to subset the "resCellA" object, get the order from the subsetted object and use that to pull out rows of "rld" based on numerical indexing, but the index are not linked. I did that because I am not so sophisticated with R coding, and tried to solve the problem by aping the example in the vignette.

Actually I got kind suggestion that in stead of using index I need to use gene ID which is consistent across the res and mat. Could anyone advise me how to do that? I have two DataFrame [4], I guess what I need to do is to get the geneIDs from res with padj < 0.05, and retrieve those geneIDs in assay(rld),but how to do that?

Thanks very much in advance. (I understand that in the long run, I need to do more homework about R coding...)

[1]

sampleTable <- read.csv("samples.csv",row.names=1)
filenames <- paste0(sampleTable$samplename,".bam")
library("Rsamtools")
bamfiles <- BamFileList(filenames, yieldSize=2000000)
gtffile <- ("mm10genes.gtf")
library("GenomicFeatures")
(txdb <- makeTxDbFromGFF(gtffile, format="gtf", circ_seqs=character()))
(ebg <- exonsBy(txdb, by="gene"))
library("GenomicAlignments")
library("BiocParallel")
register(MulticoreParam())
se <- summarizeOverlaps(features=ebg, reads=bamfiles,
                        mode="Union",
                        singleEnd=TRUE,
                        ignore.strand=TRUE,
                        fragments=FALSE )
library("DESeq2")
(colData(se) <- DataFrame(sampleTable))
dds <- DESeqDataSet(se, design = ~ compartment + genotype)
dds <- dds[rowSums(counts(dds)) > 1,]
dds <- DESeq(dds)
rld <- rlog(dds, blind=FALSE)
library("gene filter")

[2]

"samples.csv"

"ids","samplename","genotype","compartment"

"457CellA","457CellA","WT","CellA"

"457CellB","457CellB","WT","CellB"

"457CellC","457CellC","WT","CellC"

"457CellD","457CellD","WT","CellD"

"458CellA","458CellA","cKO","CellA"

"458CellB","458CellB","cKO","CellB"

"458CellC","458CellC","cKO","CellC"

"458CellD","458CellD","cKO","CellD"

"459CellA","459CellA","WT","CellA"

"459CellB","459CellB","WT","CellB"

"459CellC","459CellC","WT","CellC"

"459CellD","459CellD","WT","CellD"

"460CellA","460CellA","cKO","CellA"

"460CellB","460CellB","cKO","CellB"

"460CellC","460CellC","cKO","CellC"

"460CellD","460CellD","cKO","CellD"

[3]

res <- results(dds)
sigGenes <- subset(res,padj <0.05)
sigGenes <- head(order(sigGenes$padj),decreasing=TRUE,40)
mat <- assay(rld)[ sigGenes, ]
mat <- mat - rowMeans(mat)
df <- as.data.frame(colData(rld)[,c("compartment","genotype")])
pheatmap(mat, annotation_col=df,fontsize=9,fontsize_number = 0.4 * fontsize)
dev.off()

[4]

> head(assay(rld))
                       457CellA      458CellB        459CellC          460CellD
0610005C13Rik          1.808277      1.6076085        1.533645         1.695644
0610007P14Rik          8.191532      7.5024585        8.660520         8.334527
0610009B22Rik          5.807757      5.6569778        5.989643         6.123990
0610009L18Rik          3.881685      3.5666925        3.786769         3.921051
0610009O20Rik          6.800692      6.8433131        6.951461         7.218018
0610010B08Rik         -1.224636     -0.8949234       -1.230797        -1.226311


> res <- results(dds)
> res
log2 fold change (MAP): compartment CellA vs CellB
Wald test p-value: compartment CellA vs CellB 
DataFrame with 23372 rows and 6 columns
                baseMean log2FoldChange      lfcSE        stat       padj
               <numeric>      <numeric>  <numeric>   <numeric>    <numeric>
0610005C13Rik   3.842899    -0.81806436  0.6916742 -1.18273074   0.23691588
0610007P14Rik 297.131215     0.22129241  0.1115736  1.98337585   0.04732546
0610009B22Rik  60.686295     0.43130300  0.2452162  1.75886844   0.07859986
0610009L18Rik  14.512324     0.02226444  0.3606164  0.06173996   0.95076992
0610009O20Rik 125.067373     0.19888416  0.1723663  1.15384599   0.24856332
rna-seq R • 3.3k views
ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by Wet&DryImmunology200
3
gravatar for geek_y
2.1 years ago by
geek_y9.1k
Barcelona/CRG/London/Imperial
geek_y9.1k wrote:

You can easily pull out differentially expressed gene names

de <- rownames(res[res$padj<0.05 & !is.na(res$padj), ])
de_mat <- assay(rld)[de,]

now de_mat contains the rld values of differentially expressed genes across all samples. This could be used to plot heatmap.

ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by geek_y9.1k

Hi Goutham, Thank you very much. I guess I had worked out a more tedious version but yours are more succinct. So I learnt a lot from your answer.

res0.05 <- subset(res, res$padj < 0.05)
sigGenes <- rownames(res0.05)
rows <- match(sigGenes, row.names(rld))
mat <- assay(rld)[rows,]

Just have another question, why "& !is.na(res$padj)" is needed? I can not imaging a case in which padj value is missing. TKS in advance.

ADD REPLYlink written 2.1 years ago by Wet&DryImmunology200

I was looking for this thank your for the solution but i have one doubt why are you using this can you explain & !is.na(res$padj),?

ADD REPLYlink written 21 months ago by krushnach80460
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 689 users visited in the last hour