DESeq2
0
0
Entering edit mode
17 months ago
Nelo ▴ 20

Hey guys !

I have a bioproject with 23 samples(paired-end), in which it consists of experiment at different timepoint as post-secretion,pre-secretion and secretion of nector in both male and female. And each timepoint or samples have mostly 3 biological replicates except for female-pre-secretion which got only 2 replicates. Right now I already got the count matrix data using featureCounts. My problem is I got stuck at DESeq2 analysis.

This is the chunk of code I tried:

counts <- read.csv("count_matrix_PRJNA437911.txt", header=TRUE, row.names=1, sep="\t")
sampleTable <- data.frame(condition=c("female","female","female","female","female","female","female","female","female","female","female","male","male","male","male","male","male","male","male","male","male","male","male"), 
                            timepoint=c("post","post","post","pre#1","pre#1","pre#1","pre#2","pre#2","secretory","secretory","secretory","post","post","post","pre#1","pre#1","pre#1","pre#2","pre#2","pre#2","secretory","secretory","secretory"),
                            replicate=c("1","2","3","1","2","3","1","3","1","2","3","1","2","3","1","2","3","1","2","3","1","2","3"),
                            fileName=c("Female_post-secretion_Rep1.sorted.bam","Female_post-secretion_Rep2.sorted.bam","Female_post-secretion_Rep3.sorted.bam","Female_pre-secretory#1_Rep1.sorted.bam","Female_pre-secretory#1_Rep2.sorted.bam","Female_pre-secretory#1_Rep3.sorted.bam","Female_pre-secretory#2_Rep1.sorted.bam","Female_pre-secretory#2_Rep3.sorted.bam","Female_secretory_Rep1.sorted.bam","Female_secretory_Rep2.sorted.bam","Female_secretory_Rep3.sorted.bam","Male_post-secretion_Rep1.sorted.bam","Male_post-secretion_Rep2.sorted.bam","Male_post-secretion_Rep3.sorted.bam","Male_pre-secretory#1_Rep1.sorted.bam","Male_pre-secretory#1_Rep2.sorted.bam","Male_pre-secretory#1_Rep3.sorted.bam","Male_pre-secretory#2_Rep1.sorted.bam","Male_pre-secretory#2_Rep2.sorted.bam","Male_pre-secretory#2_Rep3.sorted.bam","Male_secretory_Rep1.sorted.bam","Male_secretory_Rep2.sorted.bam","Male_secretory_Rep3.sorted.bam")) 

dds <- DESeqDataSetFromMatrix(countData = counts, colData = sampleTable, design = ~ condition + timepoint+replicate)
dds <- DESeqDataSetFromMatrix(countData = counts, colData = sampleTable, design = ~ condition + timepoint+replicate+condition:timepoint:replicate) 

(which one dds is better?)

dds <- DESeq(dds)
results <- results(dds)
summary(results)
#output of summary:
out of 26174 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 45, 0.17%
LFC < 0 (down)     : 21, 0.08%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

I have the following scenario:

Since I wanted to get the DEGs, my first questions is till now am I doing it corect or not(if not please do suggest)

  1. I am not able to plot heatmap out of this result above.
  2. I also wanted to run DESeq2 for the same dataset but for specific genes of about 50 only(although total no of genes=27868). At what point in this downstream analysis should I specify it.

Pardon me for asking in weird way as I am totally new to R and RNA-seq analysis.

heatmap R DESeq2 • 1.1k views
ADD COMMENT
0
Entering edit mode

If you describe in plain english: Which question do you want to answer, so which groups do you wish to compare. In any case, remove replicate from the formula, that's wrong. DESeq2 automatically knows what a replicate is for samples that have the same information in the design.

ADD REPLY
0
Entering edit mode

Are you asking me to remove replicate information from sampleTable? If so, I will get error while dds code( or do you mean that I do not have to specify anywhere in my analysis that my dataset has replicates?)

ADD REPLY
0
Entering edit mode

the replicates are determined automatically from the strings. Identical string values will be treated as replicates

(a more proper way to state this is that the strings will be treated as factors that have numerical values behind the scenes)

ADD REPLY
0
Entering edit mode

but as you can see from that chunk of code I run, the files names end up with different suffix(i.e., rep1, rep2...), which means different string values.

ADD REPLY
1
Entering edit mode

File names have no meaning, it is the colData that matters and specifically only those columns that are used for the design. See vignette of DESeq2 for these basics.

ADD REPLY
0
Entering edit mode

Hi So for the dataset I mentioned, I tried to plot heatmap to see expression of top 50 genes in a given dataset .(limited to 50 because it is more easy to visualized).

In the heat map shown here, rows are Z-Score scaled, which is giving me only the information about the varying expression of single genes across the samples right ? If yes, then this map is giving me the differences between the samples ? Also, looking at the plot, is it making sense or not.

Since its hard to understand the interpretation part of the results, please advise me.

Another doubt is, I don't know how to put this questions but, looking at the samples names, I was expecting the females and males not mixed-up like that(may be because I didn't use 'contrast' in my analysis?)

enter image description here

ADD REPLY

Login before adding your answer.

Traffic: 1821 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6