Question

deseq2 read counts vs featurecount read counts

0

Entering edit mode

4.4 years ago

Adeler001 • 0

I'm comparing the gene expression of 3 patients and 3 controls. I used FeatureCount to generate the raw read count data for my rna seq data. I then used it as an input file in deseq2 to get my differential expression results between the patients and controls . Im wondering why the read count of deSeq2 and featurecount are drastically different for one of the genes Im looking at . On FeatureCount I get a read count of around 2000 but for Deseq2 I get a read count of around 4000

deseq2 FeatureCount RNA-Seq • 3.8k views

ADD COMMENT • link 4.4 years ago by Adeler001 • 0

1

Entering edit mode

could you please post your featurecounts and DESeq2 commands ?

ADD REPLY • link 4.4 years ago by Nicolas Rosewick 11k

0

Entering edit mode

hello Nicolas Rosewick here is the command I used for featurecounts:

/hpf/tools/centos6/subread/1.5.3/bin/featureCounts -a /hpf/projects/canvas/sequencing_repository/raw_fastq/RNA_taylor/bam/gencode.v19.annotation.gtf -o family_3_01_RNA-seq.counts -g gene_name -p -s 2 -C --donotsort  D4175_R312_BAligned.sortedByCoord.out.bam D1754_R311_BAligned.sortedByCoord.out.bam D4176_R310Aligned.sortedByCoord.out.bam D1777_R307Aligned.sortedByCoord.out.bam D4178_R308BAligned.sortedByCoord.out.bam D1828_R309_BAligned.sortedByCoord.out.bam

here is the command I used for DEseq2

# Import count table
countdata <- read.table("family_13_01_revised_RNA-seq.counts_fixed.txt", header=TRUE, row.names=1)

# Remove .bam or .sam from filenames
colnames(countdata) <- gsub("\\.[sb]am$", "", colnames(countdata))

# Convert to matrix
countdata <- as.matrix(countdata)
head(countdata)

# Assign condition (affected versus unaffected)
condition <- factor(c("affected","affected","affected","unaffected","unaffected","unaffected"),levels=c("affected","unaffected"))
condition <- relevel(condition, ref = "unaffected")

#load DESeq2#
library(DESeq2)

# Create a coldata frame and instantiate the DESeqDataSet
coldata <- data.frame(row.names=colnames(countdata),condition)

dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design = ~ condition)
dds



#pre-filtering to keep only rows that have at least 1 reads total
keep <- rowSums(counts(dds)) > 1
dds <- dds[keep,]

# Run the DESeq
dds <- DESeq(dds)


# Regularized log transformation for clustering/heatmaps
rld <- rlogTransformation(dds)
head(assay(rld))
hist(assay(rld))

# Colors for plots below
library(RColorBrewer)
(mycols <- brewer.pal(8, "Dark2")[1:length(unique(condition))])

# Sample distance heatmap
sampleDists <- as.matrix(dist(t(assay(rld))))
library(gplots)
#png("qc-heatmap_baker.png", w=1000, h=1000, pointsize=20)
heatmap.2(as.matrix(sampleDists), key=F, trace="none",
          col=colorpanel(100, "black", "white"),
          ColSideColors=mycols[condition], RowSideColors=mycols[condition],
          margin=c(10, 10), main="Sample Distance Matrix")
#dev.off()

#packages that  need to be loaded  to make pca plot
library(MASS)
library(genefilter)
library(calibrate)

# Principal components analysis
rld_pca <- function (rld, intgroup = "condition", ntop = 500, colors=NULL, legendpos="bottomleft", main="PCA Biplot", textcx=1, ...) {
  require(genefilter)
  require(calibrate)
  require(RColorBrewer)
  rv = rowVars(assay(rld))
  select = order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  pca = prcomp(t(assay(rld)[select, ]))
  fac = factor(apply(as.data.frame(colData(rld)[, intgroup, drop = FALSE]), 1, paste, collapse = " : "))
  if (is.null(colors)) {
    if (nlevels(fac) >= 3) {
      colors = brewer.pal(nlevels(fac), "Paired")
    }   else {
      colors = c("black", "red")
    }
  }
  pc1var <- round(summary(pca)$importance[2,1]*100, digits=1)
  pc2var <- round(summary(pca)$importance[2,2]*100, digits=1)
  pc1lab <- paste0("PC1 (",as.character(pc1var),"%)")
  pc2lab <- paste0("PC2 (",as.character(pc2var),"%)")
  plot(PC2~PC1, data=as.data.frame(pca$x), bg=colors[fac], pch=21, xlab=pc1lab, ylab=pc2lab, main=main, ...)
  with(as.data.frame(pca$x), textxy(PC1, PC2, labs=rownames(as.data.frame(pca$x)), cex=textcx))
  legend(legendpos, legend=levels(fac), col=colors, pch=20)
  #     rldyplot(PC2 ~ PC1, groups = fac, data = as.data.frame(pca$rld),
  #            pch = 16, cerld = 2, aspect = "iso", col = colours, main = draw.key(key = list(rect = list(col = colours),
  #                                                                                         terldt = list(levels(fac)), rep = FALSE)))
}
#png("qc-pca.png", 1000, 1000, pointsize=20)
rld_pca(rld, colors=mycols, intgroup="condition", xlim=c(-30, 30))
#dev.off()


# Get differential expression results
res <- results(dds)
table(res$padj<0.05)

## Order by adjusted p-value
res <- res[order(res$padj), ]
## Merge with normalized count data
resdata <- merge(as.data.frame(res), as.data.frame(counts(dds, normalized=TRUE)), by="row.names", sort=FALSE)
names(resdata)[1] <- "Gene"
head(resdata)
#get significant results (FDR<0.05)
## Write results
write.csv(resdata, file="sig_diffexpr-results.csv")

ADD REPLY • link updated 4.4 years ago by Nicolas Rosewick 11k • written 4.4 years ago by Adeler001 • 0

score 1 · Answer 1 · 2019-11-19

1

Entering edit mode

4.4 years ago

sangram_keshari ▴ 260

First of all, DESeq2 does not do read counting. The main purpose of it to do differential analysis using the RAW read counts generated from tools like FeatureCount.

Although DESeq2 internally transform the RAW read count value to do the analysis. I guess that's what you getting confused with read count values.

ADD COMMENT • link 4.4 years ago by sangram_keshari ▴ 260

0

Entering edit mode

yes sangram_keshari that what im confusing about . how does deseq2 transform featureCount's read count data ?