Question: DESeq2 paired analysis
2
gravatar for dazhudou1122
2.2 years ago by
dazhudou112240
dazhudou112240 wrote:

Hey guys,

I am new to DESeq2 and I am trying to following a template on , and I am trying to analyze the bacterial genes that are differentially expressed in tumor vs normal tissue (each tumor and normal tissue are paired). There are 12 patients, therefore 12 tumors and 12 matching normal tissues in total. When I did an unpaired analysis, I found 60 genes are differentially expressed but when I did paired analysis, no gene is differentially expressed. Can anyone tell me whether I am doing this right? The following is the script:

Import data from featureCounts

Previously ran at command line something like this:

featureCounts -a genes.gtf -o counts.txt -T 12 -t exon -g gene_id GSM*.sam

countdata <- read.table("counts.txt", header=TRUE, row.names=1)

Remove first five columns (chr, start, end, strand, length)

countdata <- countdata[ ,6:ncol(countdata)]

Remove .bam or .sam from filenames

colnames(countdata) <- gsub("\.[sb]am$", "", colnames(countdata))

Convert to matrix

countdata <- as.matrix(countdata) head(countdata)

Assign condition (first four are controls, second four contain the expansion)

(condition <- factor(c(rep("ctl", 12), rep("exp", 12))))

Analysis with DESeq2 ----------------------------------------------------

library(DESeq2)

Create a coldata frame and instantiate the DESeqDataSet. See ?DESeqDataSetFromMatrix

(coldata <- data.frame(row.names=colnames(countdata), condition)) dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition) dds class: DESeqDataSet dim: 1997 24 metadata(0): assays(1): counts rownames(1997): FN1496 FN1497 ... FN1493 FN1494 rowRanges metadata column names(0): colnames(24): FusoSRR317086 FusoSRR317087 ... FusoSRR317108 FusoSRR317109 colData names(1): condition

add patient information

dds$patient=c("p12","p7","p1","p4","p8","p6","p5","p11","p2","p9","p10","p3","p12","p7","p8","p4","p6","p10","p5","p2","p11","p1","p3","p9") (coldata <- data.frame(row.names=colnames(countdata), condition, dds$patient)) dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~ condition + dds.patient) dds class: DESeqDataSet dim: 1997 24 metadata(0): assays(1): counts rownames(1997): FN1496 FN1497 ... FN1493 FN1494 rowRanges metadata column names(0): colnames(24): FusoSRR317086 FusoSRR317087 ... FusoSRR317108 FusoSRR317109 colData names(2): condition dds.patient

Thank you guys so much!

rna-seq R • 1.7k views
ADD COMMENTlink written 2.2 years ago by dazhudou112240
1

Could you post your design matrix, please; and explain why you use 'dds.patient' rather than 'patient' when setting it up

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by russhh4.1k

Do you mean this? DESeqDataSet dim: 1997 24 metadata(0): assays(1): counts rownames(1997): FN1496 FN1497 ... FN1493 FN1494 rowRanges metadata column names(0): colnames(24): FusoSRR317086 FusoSRR317087 ... FusoSRR317108 FusoSRR317109 colData names(2): condition dds.patient

ADD REPLYlink written 2.2 years ago by dazhudou112240

No, I mean, could you please post it's structure, so that I can tell from first principles whether there is some error in the way you are setting up the paired analysis. That is, what does the following look like:

model.matrix(~ condition + dds.patient, data = coldata)
ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by russhh4.1k

(Intercept) conditionexp dds.patientp10 dds.patientp11 dds.patientp12 dds.patientp2 dds.patientp3 dds.patientp4 dds.patientp5 dds.patientp6 dds.patientp7 dds.patientp8 dds.patientp9 FusoSRR317086 1 0 0 0 1 0 0 0 0 0 0 0 0 FusoSRR317087 1 0 0 0 0 0 0 0 0 0 1 0 0 FusoSRR317088 1 0 0 0 0 0 0 0 0 0 0 0 0 FusoSRR317089 1 0 0 0 0 0 0 1 0 0 0 0 0 FusoSRR317090 1 0 0 0 0 0 0 0 0 0 0 1 0 FusoSRR317091 1 0 0 0 0 0 0 0 0 1 0 0 0 FusoSRR317092 1 0 0 0 0 0 0 0 1 0 0 0 0 FusoSRR317093 1 0 0 1 0 0 0 0 0 0 0 0 0 FusoSRR317094 1 0 0 0 0 1 0 0 0 0 0 0 0 FusoSRR317095 1 0 0 0 0 0 0 0 0 0 0 0 1 FusoSRR317096 1 0 1 0 0 0 0 0 0 0 0 0 0 FusoSRR317097 1 0 0 0 0 0 1 0 0 0 0 0 0 FusoSRR317098 1 1 0 0 1 0 0 0 0 0 0 0 0 FusoSRR317099 1 1 0 0 0 0 0 0 0 0 1 0 0 FusoSRR317100 1 1 0 0 0 0 0 0 0 0 0 1 0 FusoSRR317101 1 1 0 0 0 0 0 1 0 0 0 0 0 FusoSRR317102 1 1 0 0 0 0 0 0 0 1 0 0 0 FusoSRR317103 1 1 1 0 0 0 0 0 0 0 0 0 0 FusoSRR317104 1 1 0 0 0 0 0 0 1 0 0 0 0 FusoSRR317105 1 1 0 0 0 1 0 0 0 0 0 0 0 FusoSRR317106 1 1 0 1 0 0 0 0 0 0 0 0 0 FusoSRR317107 1 1 0 0 0 0 0 0 0 0 0 0 0 FusoSRR317108 1 1 0 0 0 0 1 0 0 0 0 0 0 FusoSRR317109 1 1 0 0 0 0 0 0 0 0 0 0 1 attr(,"assign") [1] 0 1 2 2 2 2 2 2 2 2 2 2 2 attr(,"contrasts") attr(,"contrasts")$condition [1] "contr.treatment" attr(,"contrasts")$dds.patient [1] "contr.treatment"

ADD REPLYlink written 2.2 years ago by dazhudou112240

enter image description here

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by dazhudou112240

Thank you so much russhh! This is what pops up after I put in your command: enter image description here

Sorry for posting an image, as the output exceeded the word limits. Thanks again!

ADD REPLYlink written 2.2 years ago by dazhudou112240

Thanks, the matrix looks fine. When you run results(dds), does it state "log2 fold change (MAP): condition exp vs ctl"? If not you might need to define contrasts or rewrite the design as '~ dds.patient + condition' (for consistency with the DESeq2 vignette). Otherwise, it might just be that you have no significant genes at your chosen cutoff (perhaps check some of the 60 genes that were putative diffexes when ran without the patient matching visually).

ADD REPLYlink written 2.2 years ago by russhh4.1k

Thanks russhh! That`s a good suggestion! I will try that:)

ADD REPLYlink written 2.2 years ago by dazhudou112240

Thanks again for the suggestion, russhh! It is now working (rewrite the design as '~ dds.patient + condition')! Thank you so much! So is it statistically correct to put dds.patient in front of condition?

ADD REPLYlink written 2.2 years ago by dazhudou112240
1

I wouldn't say it's the 'statistically correct' thing to do. It's pure implementation: DESeq2 must use the (coef for the) final column of the design matrix as it's default contrast coefficient (ie, the thing that is compared against zero in your hypothesis test). The information contained in the two design matrices is exactly the same. There'll be details on how to specify more complicated designs and alternative contrasts within the vignette for DESeq2

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by russhh4.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1389 users visited in the last hour