Question: Advice needed on analysing RNA-seq technical and biological replicates with DESeq2
macmath (France) wrote, 2.2 years ago:

Kindly correct me if I am wrong. I checked my alignments: the alignment rate was >80% for all samples, and I could see reads aligning to both strands. I then filtered the alignments with samtools using the command below:

samtools view -h -b -q 20 -F 4 AlignedKO1.bam >KO1hq.bam

Counting with htseq-count:

htseq-count -r pos -t exon -f bam -a 0 alignment_STAR/KO1hq.bam reference/mm10ens84/mm10ens84.gtf >KO1hq.counts

I then moved on to DESeq2. I am sharing my design below and would appreciate advice and corrections for performing the analysis:

library("DESeq2")
# Folder containing the htseq-count output files (assumption: the current directory; adjust as needed)
directory <- "."
# 24 libraries: KO and WT, groups 1-3, four libraries (a-d) per group
sampleNames <- paste0(rep(c("KO", "WT"), each = 12), rep(rep(1:3, each = 4), times = 2), rep(letters[1:4], times = 6))
sampleFiles <- paste0(sampleNames, ".counts")
sampleCondition <- sub("[a-d]$", "", sampleNames)  # "KO1a" -> "KO1"
sampleTable <- data.frame(sampleName = sampleNames, fileName = sampleFiles, condition = sampleCondition)
treatments <- c("KO1", "KO2", "KO3", "WT1", "WT2", "WT3")
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design = ~ condition)
colData(ddsHTSeq)$condition <- factor(colData(ddsHTSeq)$condition, levels = treatments)

I am not sure how to merge the technical replicates (e.g. "KO1a", "KO1b", "KO1c", "KO1d" into KO1), or what the design should look like. Additionally, the samples were taken at different time points. I look forward to your suggestions.

Tags: rna-seq, deseq2, next-gen
Samuel Brady wrote, 2.2 years ago:

Hi macmath. I would suggest performing a differential expression analysis within each time point (WT1 replicates vs. KO1 replicates, then WT2 replicates vs. KO2 replicates, and so on) and seeing which genes come up in each comparison. Some genes will probably be shared between time points and some will be specific to one. You can then plot all of those genes in a heatmap and watch how their expression changes over time, like this: https://www.researchgate.net/figure/261443713_fig2_Dynamics-of-meiotic-gene-expression-The-heatmap-shows-probe-sets-having-meiotic.
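The per-time-point comparisons suggested above could be run roughly as follows. This is only a sketch under stated assumptions: the counts are simulated with `makeExampleDESeqDataSet()` as a stand-in for the real htseq-count data, the condition labels follow the question, and the a-d libraries within each group are treated as the replicates. The names `timepoints` and `resList` are made up for illustration.

```r
library("DESeq2")

# Simulated stand-in for the real data: 1000 genes x 24 libraries
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 24)

# Label the libraries as in the question: 6 groups of 4 libraries each
groups <- c("KO1", "KO2", "KO3", "WT1", "WT2", "WT3")
dds$condition <- factor(rep(groups, each = 4), levels = groups)
design(dds) <- ~ condition

# Fit the model once, then extract one KO-vs-WT contrast per time point
dds <- DESeq(dds)
timepoints <- 1:3
resList <- lapply(timepoints, function(tp) {
  results(dds, contrast = c("condition", paste0("KO", tp), paste0("WT", tp)))
})
names(resList) <- paste0("timepoint", timepoints)

# Genes with adjusted p-value below 0.05 at each time point
sigGenes <- lapply(resList, function(res) rownames(res)[which(res$padj < 0.05)])
```

Each element of `resList` holds log2 fold changes of KO over WT for one time point, so the union of the `sigGenes` vectors would be a natural gene set for the heatmap.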

It would also be helpful to check how close your technical replicates are to one another, using Euclidean distances between samples in DESeq2.

You would hope that your technical replicates (a vs. b vs. c vs. d) cluster very closely together, that the time points within each experimental group (KO1 vs. KO2 vs. KO3) cluster somewhat closely, and that the experimental perturbations (WT vs. KO) cluster further apart.
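If the technical replicates do cluster tightly, one common way to merge them is DESeq2's `collapseReplicates()`, which sums the raw counts of the libraries that share a group label. The sketch below again uses simulated counts; the column names `group` and `run` and the object `ddsColl` are illustrative choices, and it assumes that a-d are technical replicates of the same library.

```r
library("DESeq2")

# Simulated stand-in for the real data: 1000 genes x 24 libraries
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 24)

# Name the libraries as in the question: KO1a ... WT3d
colnames(dds) <- paste0(rep(c("KO", "WT"), each = 12),
                        rep(rep(1:3, each = 4), times = 2),
                        rep(letters[1:4], times = 6))

# group: the sample each library belongs to; run: the individual library
dds$group <- factor(sub("[a-d]$", "", colnames(dds)))
dds$run <- colnames(dds)

# Sum the counts of the four libraries within each group
ddsColl <- collapseReplicates(dds, groupby = dds$group, run = dds$run)

# Sanity check: the collapsed KO1 column equals the row sums of KO1a-KO1d
stopifnot(all(rowSums(counts(dds)[, dds$group == "KO1"]) == counts(ddsColl)[, "KO1"]))
```

After collapsing, each group is a single column, so any differential test on the collapsed object needs biological replicates somewhere in the design.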

Below is some example code that performs the Euclidean distance analysis with DESeq2. You need a matrix of count data and a metadata file. It is modified from example code I found somewhere on the Internet, though I can't find the link now.

library("DESeq2")
library("RColorBrewer")  # brewer.pal()
library("gplots")        # heatmap.2()

# Read the raw count matrix (genes in rows, samples in columns) and the sample metadata
rnaMatrix <- as.matrix(read.table("YourData.counts", header=TRUE, stringsAsFactors=FALSE, row.names=1, check.names=FALSE))
metaData <- read.table("YourMetadata.txt", header=TRUE, stringsAsFactors=FALSE, row.names=1)

# design = ~ 1 is enough here because no differential testing is performed
dds <- DESeqDataSetFromMatrix(countData = rnaMatrix, colData = metaData, design = ~ 1)

# rlog-transform the counts, then compute Euclidean distances between samples
rld <- rlog(dds)
sampleDists <- dist(t(assay(rld)))
sampleDistMatrix <- as.matrix(sampleDists)

write.table(as.data.frame(sampleDistMatrix), "YourSampleDists.txt", quote=FALSE, sep="\t", col.names=NA)

# Label rows with the sample names. The original used rld$dex and rld$cell,
# which come from an example data set and are unlikely to exist in your metadata.
rownames(sampleDistMatrix) <- colnames(rld)
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
hc <- hclust(sampleDists)
heatmap.2( sampleDistMatrix, Rowv=as.dendrogram(hc),
           symm=TRUE, trace="none", col=colors,
           margins=c(2,10), labCol=FALSE )

It may also be useful to run t-SNE to see where your samples sit on a 2-dimensional plot; you may see your samples "moving" differently through the time course. t-SNE shows how different the samples are from one another in two dimensions. You can do it using Seurat in R. Some example images are here: http://satijalab.org/seurat/get_started.html. Good luck.


macmath replied, 2.2 years ago:

Thank you very much for your suggestions. Could you please check whether the second part of my code is right before I start the analysis?