Differential gene expression based on read counts using DESeq package
1
0
Entering edit mode
4.6 years ago

Hi,

I have been provided 30 samples (15 pairs). Each pair has one cancer and one normal sample. They are paired-end sequencing data for 15 cancer and 15 normal samples. I am doing RNA-seq analysis for these samples using DESeq package. I used HTSeq_count to count the number of reads for each gene.

Group1 - Sample1 is cancer and Sample2 is normal

Group2 - Sample3 is cancer and Sample4 is normal

Group3 - Sample5 is cancer and Sample6 is normal

... ...

Group15- Sample29 is cancer and Sample30 is normal

The technician asked me to compare each group (1cancer and 1normal) individually. So, I have to perform 15 pairwise comparisons.

1) I merged the htseq_count of cancer and normal samples into single txt file.

2) I used following steps in DESeq to plot differential gene expression

my design <- data.frame(row.names=colnames(countsTable),condition=c("Can","Nor"))

conds <- factor(mydesign\$condition)

cds <- newCountDataSet(countsTable, bonds)

cds.norm <- estimateSizeFactors(cds)

sizeFactors(cds.norm)

Sample1 Sample2

1.34715 0.74230

res <- nbinomTest(cds.norm,"Can","Nor")

colnames(res)

[1] "id" "baseMean" "baseMeanA" "baseMeanB"

Questions:

1) For this scenario that is without replicates, pairwise comparison of 1 cancer and 1 normal. Do I need to consider log_fold_change or p-value or adjusted p-value?

2) I have 3 cases in the above image,

- case A looks significant based on log_fold_change and p-value

- case B is not significant based on log_fold_change, p-value and adjusted p-value

- case C looks significant based on p-value, but one of the sample's baseMean is "0".
Do I need to ignore such genes or consider them as significant genes?

RNA-Seq • 2.0k views
1
Entering edit mode

If you are using DESeq R package, I would advice you to use DESeq2 (http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html).

0
Entering edit mode

Thanks, now I am using DESeq2 package.

0
Entering edit mode

There is no such thing as "looks significant". Depending on a pre-set cut off on the adjusted p-value your results are either significant or aren't. Furthermore, size of log fold change doesn't say anything about significance.

0
Entering edit mode

Thanks, as I don't have replicates. Instead of p-value, I am planning to consider log fold change value for my analysis.

2
Entering edit mode
4.6 years ago
Steven Lakin ★ 1.5k

Assuming these are all the same kind of cancer samples from the same tissue type, you should pool them and do a single two-group comparison with 15 in each group so that the answer you obtain will be based on valid statistical design. Doing 15 pairwise comparisons with 1 in each group will give you practically nonsensical statistics (note that your adjusted p-value is 1 in all cases), so if you have to do a pairwise comparison 15 times, then you should just report log-fold change as a descriptive number and disregard the p-value. DESeq is designed to normalize based on dispersion estimates from a minimum of 3 per group.

0
Entering edit mode

Thanks, Steven. Initially I suggested my technician that I will take all cancer samples( sample1,3,5,7,9,11,13,15) and all normal samples(sample2,4,6,8,10,12,14) together and perform a single two-group comparison. In that case, the p-value will be of much use.

But my technician told to compare, 15 pairwise comparisons, b'cos he wants to look into the patient specific differential gene expression and their associated mutations. As you suggested, I will disregard the p-value and consider the log2_fold_change values. The other person told me to use DESeq2 package. I have used the following commands.

Can I consider value "1" of log_fold_change for filtering genes?

Step1: Using read.table, I read the htseq_count output file for cancer and normal in a single file

countsTableMatrix = as.matrix(countsTable)

condition = c("Cancer","Normal")

coldata = data.frame(row.names=colnames(countsTableMatrix), condition)

dds = DESeqDataSetFromMatrix(countData=countsTableMatrix, colData=coldata, design=~condition)

dds = DESeq(dds) estimating size factors, estimating dispersions, gene-wise dispersion estimates, mean-dispersion relationship, final dispersion estimates, fitting model and testing,

Warning message: In checkForExperimentalReplicates(object, modelMatrix) : same number of samples and coefficients to fit, estimating dispersion by treating samples as replicates. read the ?DESeq section on 'Experiments without replicates'

res = results(dds)

resdata = merge(as.data.frame(res),as.data.frame(counts(dds,normalized=TRUE)),by="row.names",sort=FALSE)

colnames(restate)

1)Gene, 2)baseMean, 3) log2FoldChange, 4)lfcSE, 5) stat, 6)value, 7) pads

upreg_gene_1fold = subset(resdata, log2FoldChange >=1)

downreg_gene_1fold = subset(resdata, log2FoldChange <=-1)

0
Entering edit mode

Hi Steven,

Can you please comment on this post as well Question: Comparing the output from DESeq and DESeq2