Question

Advice needed on RNAseq technical and biological replicates analysis using DEseq2

6.8 years ago

macmath ▴ 170

Please correct me if I am wrong. I checked my alignments: all samples were >80% aligned, and I could see reads aligning to both strands. I then filtered the alignments with samtools using the command below:

samtools view -h -b -q 20 -F 4 AlignedKO1.bam >KO1hq.bam

Counting with htseq-count:

htseq-count -r pos -t exon -f bam -a 0 alignment_STAR/KO1hq.bam reference/mm10ens84/mm10ens84.gtf >KO1hq.counts

This is followed by DESeq2. I am sharing my design below and would appreciate advice, and corrections if needed, on how to perform the analysis:

sampleFiles<- c("KO1a.counts", "KO1b.counts", "KO1c.counts", "KO1d.counts", "KO2a.counts", "KO2b.counts", "KO2c.counts", "KO2d.counts", "KO3a.counts", "KO3b.counts", "KO3c.counts", "KO3d.counts", "WT1a.counts", "WT1b.counts", "WT1c.counts", "WT1d.counts", "WT2a.counts", "WT2b.counts", "WT2c.counts", "WT2d.counts", "WT3a.counts", "WT3b.counts", "WT3c.counts", "WT3d.counts")
sampleNames <- c("KO1a", "KO1b", "KO1c", "KO1d", "KO2a", "KO2b", "KO2c", "KO2d", "KO3a", "KO3b", "KO3c", "KO3d", "WT1a", "WT1b", "WT1c", "WT1d", "WT2a", "WT2b", "WT2c", "WT2d", "WT3a", "WT3b", "WT3c", "WT3d")
sampleCondition <- c("KO1", "KO1", "KO1", "KO1", "KO2", "KO2", "KO2", "KO2", "KO3", "KO3", "KO3", "KO3", "WT1", "WT1", "WT1", "WT1", "WT2", "WT2", "WT2", "WT2", "WT3", "WT3", "WT3", "WT3")
sampleTable <- data.frame(sampleName = sampleNames, fileName = sampleFiles, condition = sampleCondition)
treatments <- c("KO1", "KO2", "KO3", "WT1", "WT2", "WT3")
library("DESeq2")
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design = ~ condition)
colData(ddsHTSeq)$condition <- factor(colData(ddsHTSeq)$condition, levels = treatments)

I am not sure how to merge the technical replicates (e.g. "KO1a", "KO1b", "KO1c", "KO1d" into KO1) or what the design should look like. Additionally, the samples come from different time points. Looking forward to your suggestions.

RNA-Seq DESeq2 next-gen • 2.2k views

updated 6.8 years ago by Samuel Brady ▴ 330 • written 6.8 years ago by macmath ▴ 170

score 1 · Answer 1 · 2017-06-30

Hi macmath. I would suggest running a differential expression analysis within each timepoint (WT1 replicates vs. KO1 replicates, WT2 replicates vs. KO2 replicates, and so on) and seeing which genes come up in each. Some genes will probably be shared between timepoints and some will be specific to one. You can then make a heatmap of all of those genes to see how their expression changes over time, like this: https://www.researchgate.net/figure/261443713_fig2_Dynamics-of-meiotic-gene-expression-The-heatmap-shows-probe-sets-having-meiotic.
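A rough sketch of how the per-timepoint comparison could be set up, based on my reading of the sample names rather than anything confirmed in the thread: I am assuming the first two characters are the genotype, the digit is the timepoint and the trailing letter is the replicate. The column names genotype, timepoint and replicate are made up for illustration.

```r
# Rebuild the 24 sample names from the question: KO1a ... KO3d, WT1a ... WT3d
sampleNames <- paste0(rep(c("KO", "WT"), each = 12),
                      rep(rep(1:3, each = 4), times = 2),
                      rep(c("a", "b", "c", "d"), times = 6))

# Sample table with sample name and file name first, then the parsed annotation
# (assumed naming scheme: genotype, timepoint digit, replicate letter)
sampleTable <- data.frame(
  sampleName = sampleNames,
  fileName   = paste0(sampleNames, ".counts"),
  genotype   = factor(substr(sampleNames, 1, 2), levels = c("WT", "KO")),  # WT as reference
  timepoint  = factor(substr(sampleNames, 3, 3)),
  replicate  = substr(sampleNames, 4, 4)
)

# One sub-table per timepoint, each holding the WT and KO samples of that timepoint
perTimepoint <- split(sampleTable, sampleTable$timepoint)

# Number of samples per genotype within each timepoint
print(lapply(perTimepoint, function(tab) table(tab$genotype)))
```

Each element of perTimepoint could then be handed to DESeqDataSetFromHTSeqCount() with design = ~ genotype, giving one WT vs. KO comparison per timepoint.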

It would also be helpful to check how close your technical replicates are to one another using Euclidean distance in DESeq2.

You would hope that your technical replicates (a vs. b vs. c vs. d) cluster very close to one another, that your timepoints cluster somewhat closely within each experimental group (KO1 vs. KO2 vs. KO3), and that your experimental perturbations (WT vs. KO) cluster farther apart from each other.

Below is some example code that will perform the Euclidean distance analysis with DESeq2. You need a matrix of count data and a metadata file. This is modified from example code I found somewhere on the Internet, though I can't seem to find the link now.

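library("DESeq2")        # DESeqDataSetFromMatrix(), rlog(), assay()
library("RColorBrewer")  # brewer.pal()
library("gplots")        # heatmap.2()

# Read the raw count matrix (genes in rows, samples in columns) and the sample metadata (one row per sample)
rnaMatrix <- as.matrix(read.table("YourData.counts", header=TRUE, stringsAsFactors=FALSE, row.names=1, check.names=FALSE))
metaData <- read.table("YourMetadata.txt", header=TRUE, stringsAsFactors=FALSE, row.names=1)

# Build the dataset (intercept-only design, since no model is fitted here) and rlog-transform the counts
dds <- DESeqDataSetFromMatrix(countData = rnaMatrix, colData = metaData, design = ~ 1)
rld <- rlog(dds)

# Euclidean distances between samples (dist() compares rows, hence the transpose)
sampleDists <- dist(t(assay(rld)))
sampleDistMatrix <- as.matrix(sampleDists)
write.table(as.data.frame(sampleDistMatrix), "YourSampleDists.txt", quote=FALSE, sep="\t", col.names=NA)

# Label rows by sample name (the original used rld$dex and rld$cell, which only exist if your metadata has those columns)
rownames(sampleDistMatrix) <- colnames(rld)
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
hc <- hclust(sampleDists)
heatmap.2( sampleDistMatrix, Rowv=as.dendrogram(hc),
           symm=TRUE, trace="none", col=colors,
           margins=c(2,10), labCol=FALSE )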
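If the distance heatmap confirms that the a-d runs are tight technical replicates (repeat sequencing runs of the same library), one common way to merge them is to sum their raw counts per library before building the DESeq2 dataset. Here is a minimal base-R sketch of that idea. The count matrix is toy data with made-up numbers, and I am assuming the library label is the sample name minus its trailing letter. DESeq2 also ships a collapseReplicates() function for this purpose.

```r
# Toy raw counts (made-up numbers): 3 genes x 8 sequencing runs
set.seed(1)
runs <- c("KO1a", "KO1b", "KO1c", "KO1d", "WT1a", "WT1b", "WT1c", "WT1d")
counts <- matrix(rpois(3 * length(runs), lambda = 50), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), runs))

# Assumed naming scheme: dropping the trailing a-d letter gives the library label
libraryId <- sub("[a-d]$", "", runs)

# rowsum() adds up rows that share a group label, so transpose to put runs in rows,
# sum per library, then transpose back to genes x libraries
collapsed <- t(rowsum(t(counts), group = libraryId))

print(collapsed)
```

This only makes sense if a-d really are the same library sequenced several times. If they are separate animals or cultures, they are biological replicates and should stay as separate columns, because DESeq2 needs them to estimate variability.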

It may also be useful to run a t-SNE analysis to see where your samples sit on a 2-dimensional plot; you may see your samples "moving" differently through the time course. You can do it using Seurat in R. Some example images are here: http://satijalab.org/seurat/get_started.html. Good luck.
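For a quick 2-dimensional view without extra packages, here is a sketch that swaps in PCA (base R prcomp()) instead of t-SNE; it is a different method, but it serves the same purpose of placing each sample on a 2D plot. The matrix is simulated toy data standing in for rlog-transformed values, and all names in it are hypothetical.

```r
# Toy "log expression" matrix (simulated, not real data): 200 genes x 4 samples
set.seed(42)
samples <- c("KO1a", "KO1b", "WT1a", "WT1b")
logExpr <- matrix(rnorm(200 * length(samples)), nrow = 200,
                  dimnames = list(paste0("gene", 1:200), samples))

# Shift the first 50 genes upward in the KO samples to mimic a genotype effect
koSamples <- c("KO1a", "KO1b")
logExpr[1:50, koSamples] <- logExpr[1:50, koSamples] + 4

# PCA on samples: prcomp() expects observations in rows, hence the transpose
pca <- prcomp(t(logExpr))
coords <- pca$x[, 1:2]  # first two principal components per sample

# Plot the samples by name in PC1/PC2 space
plot(coords, type = "n", xlab = "PC1", ylab = "PC2")
text(coords, labels = rownames(coords))
```

With real data you would pass assay(rld) in place of the toy matrix.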