Entering edit mode
5.2 years ago
mxlsherry1992
▴
80
Dear all,
I have a species without reference genome, so I did the RNA-SEQ analysis using " Trinity-align_and_estimate_abundance.pl -RSEM
" pipeline, and got two files:
RSEM.isoforms.results
RSEM.genes.results
Then, I used abundance_estimates_to_matrix.pl
and run_DE_analysis.pl
to do the differential expression analysis.
But now, I want to use R studio to do some PCA plot and heatmap, here is part of my R script, I just want to know which file should I used as the input file for R? (in the position of "pan_all_counts")
source("https://bioconductor.org/biocLite.R")
biocLite("pheatmap")
library("DESeq2")
library(ggplot2)
library(dplyr)
library(pheatmap)
## Input the count data
countData <- read.table("pan_all_counts", header=TRUE, row.names=1)
dim(countData)
head(countData)
##Import the metadata that inlcudes the treatment assignments and if single or paired reads.
colData <- read.table("pan_all_count_sample", header=TRUE, row.names=1)
dim(colData)
head(colData)
# check the order of treatment assignments in colData and countData
all(rownames(colData) %in% colnames(countData))
countData <- countData[, rownames(colData)]
all(rownames(colData) == colnames(countData))
##e Creat the dataset and define model
?DESeqDataSetFromMatrix
dds <- DESeqDataSetFromMatrix(countData = countData,
colData = colData,
design = ~ condition)
dds
##########remove genes with no counts
dds <- dds[rowSums(counts(dds)) > 1,]
dds
###########PCA plot
#rlog This function transforms the count data to the log2 scale in a way which minimizes differences between samples for rows with small counts,
#and which normalizes with respect to library size.
#rld<-rlog(dds)
#plotPCA(rld)
############
#perform DESeq, DESeq will do the following things:
#estimating dispersions
#found already estimated dispersions, replacing these
#gene-wise dispersion estimates
#mean-dispersion relationship
#final dispersion estimates
#fitting model and testing
jumpy <- DESeq(dds)
jumpy
resultsNames(jumpy)
#make sure results look okay
res <- results(jumpy)
write.csv(as.data.frame(res),file="res1.csv")
head(res)
summary(res)
#reduce alpha to .05 for analysis of results
?results
res05 <- results(jumpy, alpha =.05)
head(res05)
names(res05)
#find out how many genes are differentially expressed below the maximum adjusted p-value cutoff
#sum(res05$padj < .05, na.rm=TRUE)
sum(res05$padj < .05, na.rm=TRUE)
#order all the genes by padj
#resOrdered <- res05[order(res05$padj),]
resOrdered <- res05[order(res05$padj),]
#store the number of DE genes as a variable
#numDEgenes <- sum(res05$padj < .05, na.rm=TRUE) should use padj
numDEgenes <- sum(res05$padj < .05 & (res05$log2FoldChange > 1 | res05$log2FoldChange < -1), na.rm=TRUE)
numDEgenes
#get only the DEgenes ordered by padj
DEgenes <- head(resOrdered, numDEgenes)
head(DEgenes)
#write only these genes to a CSV file
write.csv(as.data.frame(DEgenes), file="starfishtopDEgenes1.csv")
If above is your aim, I wonder why you want to do DEseq as you already did differential analysis using RSEM. You can use your .results files to generate heatmap and pca plot. I assume it has count matrix where rows are genes / isoforms and columns are samples.
Hi, Thanks for the reply! so you means I can use the results of
directly for PCA and heatmap?
Here is part of the RSEM.genes.results, if I need to extracted one columns to do that? ( And I have 7 sample, 2 replicated, so I got 14 RSEM.genes.results files together, will want to do the PCA and heatmap together, if you know which software can do that ..)
It seems
RSEM.genes.results
contains normalised expression values which are TPM and FPKM. As you said, you have already performed differential analysis usingrun_DE_analysis.pl
you must have fold change matrix as well. Depending upon your purpose of analysis you can use either FPKM/ TPM or fold change value to plot the heatmap. One of the purposes of PCA analysis is to check how good replicates are. If you also want to use PCA for same purpose, I would suggest you to use log2(FPKM) values to generate PCA plot. There are several tutorial available online for PCA. For I heamap my personal favourite us ComplexHeatmapThanks for your kindly reply!! So it seems FPKM/FPM and log2foldchange value works for heatmap plotting..? I just copy and paste my results for different samples and input them to do heatmap is ok...?:)
Thank you!
This is very vague question to answer. It depends on for what purpose you want to plot the heatmap