Closed:Issues with DESeq2 LRT for time-course experiment
0
0
Entering edit mode
3.7 years ago

Hi all,

I am a high school student interning in a bioinformatics laboratory this summer, and I'm new to R, RNA-seq, etc. I'm analyzing data from an experiment looking at gene expression changes between two conditions over the course of three timepoints. I attempted to use DESeq2's LRT option in order to identify genes with condition-specific temporal expression changes, but only had one significant result. When another member of my lab used LRT, he received several hundred significant results. I've pasted both of our scripts (and relevant outputs) below -- can anyone see what might have caused such a drastic difference in our results?

For reference, counts.tsv is the gene expression matrix, and coldata.tsv is a matrix with samples as rows, Condition & Time as columns, and individual matrix elements are a sample's condition or collection timepoint. Coldata.tsv is identical to the matrix created using cbind() in the second script.

My script & outputs:

# Data Selection & Prep
cts <- as.matrix(read.csv(file.choose(), sep = "\t", row.names = "gene_id")) # select counts.tsv
coldata <- read.csv(file.choose(), sep = "\t", row.names = "Samples") # select coldata.tsv
Time <- factor(coldata$Time, levels = c("3","7","14"))
Condition <- factor(coldata$Condition, levels = c("A", "B"))
all(rownames(coldata) == colnames(cts))
*[1] TRUE*

# LRT
library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Condition + Time + Condition:Time)
*Warning message:*
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ Condition + Time)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

# Extracting Results
res_lrt <- results(dds_lrt)
res_lrt <- res_lrt[order(res_lrt$pvalue), ]
res_lrt
log2 fold change (MLE): ConditionB.Time7
LRT p-value: '~ Condition + Time + Condition:Time' vs '~ Condition + Time'
DataFrame with 55291 rows and 6 columns
         baseMean log2FoldChange     lfcSE      stat      pvalue      padj
        <numeric>      <numeric> <numeric> <numeric>   <numeric> <numeric>
Dhh       305.439      -0.318403  0.220665   28.5176 6.41922e-07 0.0234160
Id1       328.252       0.522730  0.351085   24.5935 4.56646e-06 0.0832876
Pdgfra    256.663       1.174865  0.628269   16.9216 2.11605e-04 1.0000000
Aldh1a7   102.727      -0.110197  0.517990   16.8907 2.14897e-04 1.0000000
Ahrr      655.521      -0.205311  0.287271   16.8163 2.23040e-04 1.0000000
...           ...            ...       ...       ...         ...       ...
Gm21748         0             NA        NA        NA          NA        NA
mt-Ty           0             NA        NA        NA          NA        NA
mt-Td           0             NA        NA        NA          NA        NA
mt-Tk           0             NA        NA        NA          NA        NA
mt-Th           0             NA        NA        NA          NA        NA
sig_lrt <- subset(res_l, baseMean>20 & padj<0.05)
sig_lrt
baseMean log2FoldChange     lfcSE    stat       pvalue       padj
Dhh 305.4386     -0.3184035 0.2206646 28.5176 6.419222e-07 0.02341604

Another lab member's script & outputs:

 library(DESeq2)
counts<-read.table("counts.tsv")
design<- cbind(condition=c('A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B'), samples=c("A30.R_S2","A43.L_S13","A45.R_S16","A32.R_S4","A33.L_S8","A43.RL_S14","A32.L_S5","A35.L_S11","A46.R_S17","A30.N_S1","A30.RL_S3","A33.RR_S10","A32.RL_S6","A33.RL_S9","A44.R_S15","A32.RR_S7","A35.RL_S12","A46.L_S18"), time=c('3','3','3','7','7','7','14','14','14','3','3','3','7','7','7','14','14','14')) 
 full_model <- ~condition + time + condition:time
reduced_model <- ~condition + time 
dds<-DESeqDataSetFromMatrix(countData=counts,colData=design, design= ~ condition + time + condition:time)
dds_lrt_time <- DESeq(dds, test="LRT", reduced = ~ condition + time)
res<-results(dds_lrt_time)
sig <- subset(res, baseMean>20 & padj < 0.05 )
log2 fold change (MLE): conditionB.time7 
LRT p-value: '~ condition + time + condition:time' vs '~ condition + time' 
DataFrame with 342 rows and 6 columns
         baseMean log2FoldChange     lfcSE      stat      pvalue        padj
        <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
Xkr4      44.2752       0.951242  0.521092  27.89740 8.75299e-07 7.09958e-05
Gm19938   32.8723      -2.692142  0.982106  12.29223 2.14178e-03 1.82805e-02
Sntg1     83.0165       0.629027  0.339907   9.57564 8.33058e-03 4.65789e-02
Eya1      40.5863      -0.668471  0.537290   9.47682 8.75254e-03 4.79500e-02
Gm29216  223.6502      -0.650962  0.362693   9.93755 6.95166e-03 4.11904e-02
...           ...            ...       ...       ...         ...         ...
Eif2s3y   509.392      -1.962028  2.566384   18.8918 7.90119e-05 1.79711e-03
Uty       165.199      -2.873323  3.045134   14.5634 6.88029e-04 8.24535e-03
Ddx3y     500.055      -2.019335  2.229747   16.1066 3.18052e-04 4.75764e-03
Gm47283   198.531      -1.234294  0.374524   36.3764 1.26170e-08 3.34519e-06
mt-Rnr2 42367.112      -0.627189  0.283350   11.4852 3.20650e-03 2.40670e-02

Any help on this would be greatly appreciated! The other lab member and I are both new to NGS analyses, so we are uncertain of how to proceed. We hope that some more experienced bioinformaticians might be able to guide us on the right path, as we are also the first people in our lab to perform these types of analyses. Thank you!

Kind regards, Luke Briody

RNA-Seq rna-seq R DESeq2 LRT • 452 views
ADD COMMENT
This thread is not open. No new answers may be added
Traffic: 2773 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6