Hi, I am kind of new in R so I apologize if I dont give required information to my question. I have two conditions (age of the animal model) in my dataset: "new-born = NB" and "twenty years old = twenty" for three different tissues samples "liver", "kidney" and "brain". My coldata looks like:
condition tissue
SRR306394 "NB" "Liver"
SRR306397 "NB" "Kidney"
SRR306400 "NB" "Brain"
SRR306396 "twenty" "Liver"
SRR306399 "twenty" "Kidney"
SRR306402 "twenty" "Brain"
The code I used to run deseq2 is:
pasCts <- "C:/Users/krishna/Desktop/project/NBvs20/NBvs20.Rmatrix.txt"
pasAnno <- "C:/Users/krishna/Desktop/project/NBvs20/featurecounts_NBvs20.csv"
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="Geneid"))
coldata <- read.csv(pasAnno, row.names=1)
coldata
str(coldata)
OUTPUT:
> str(coldata)
'data.frame': 6 obs. of 2 variables:
$ condition: Factor w/ 4 levels " NB"," twenty",..: 1 2 3 4 3 4
$ tissue : Factor w/ 3 levels " Liver","Brain",..: 1 1 3 3 2 2
coldata <- coldata[,c("condition","tissue")]
coldata$tissue <- factor(coldata$tissue)
coldata$condition <- factor(coldata$condition)
coldata
all(rownames(coldata) %in% colnames(cts))
all(rownames(coldata) == colnames(cts))
cts <- cts[, rownames(coldata)]
all(rownames(coldata) == colnames(cts))
head(cts)
OUTPUT:
> head(cts)
SRR306394 SRR306396 SRR306397 SRR306399 SRR306400 SRR306402
ENSMUSG00000102693 0 0 0 0 0 0
ENSMUSG00000064842 0 0 0 0 0 0
ENSMUSG00000051951 1 0 1 0 62 22
ENSMUSG00000102851 0 0 0 0 0 0
ENSMUSG00000103377 0 0 0 0 0 0
ENSMUSG00000104017 0 0 0 0 0 0
> str(cts)
int [1:55401, 1:6] 0 0 1 0 0 0 0 0 0 0 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:55401] "ENSMUSG00000102693" "ENSMUSG00000064842" "ENSMUSG00000051951" "ENSMUSG00000102851" ...
..$ : chr [1:6] "SRR306394" "SRR306396" "SRR306397" "SRR306399" ...
### Create Deseq2 matrix object for data
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = coldata,
design = ~condition)
dds
OUTPUT:
> dds
class: DESeqDataSet
dim: 55401 6
metadata(1): version
assays(1): counts
rownames(55401): ENSMUSG00000102693 ENSMUSG00000064842 ... ENSMUSG00000064371 ENSMUSG00000064372
rowData names(0):
colnames(6): SRR306394 SRR306396 ... SRR306400 SRR306402
colData names(2): condition tissue
### Prefiltering - here we are removing rows with very low read counts.
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
### Set factors - in this case, "liver", "kidney and "brain"
dds$tissue <- factor(dds$tissue, levels = c("Liver", "Kidney", "Brain"))
dds$condition <- factor(dds$condition, levels = c("NB", "twenty"))
### running the differential expression analysis
dds <- DESeq(dds)
res <- results(dds)
res
resultsNames(dds)
After running resultsNames(dds), I got below output:
> resultsNames(dds)
[1] "Intercept" "condition_..twenty._vs_..NB." "condition_.NB._vs_..NB." "condition_.twenty._vs_..NB."
What correcion shall I do if i want only one coefficient "twenty vs NB"?
We can order our results table by the smallest p value:
resOrdered <- res[order(res$pvalue),]
resOrdered
summary(res)
OUTPUT:
>summary(res)
out of 14195 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 113, 0.8%
LFC < 0 (down) : 34, 0.24%
outliers [1] : 0, 0%
low counts [2] : 9082, 64%
(mean count < 40)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
to know the number of adjusted p-values less than 0.1
sum(res$padj < 0.1, na.rm=TRUE)
OUTPUT:
> sum(res$padj < 0.1, na.rm=TRUE)
[1] 147
Above is the full code along with the outputs and their str() of the ojects . Thnak you for your time.
For any test to have statistical power, it needs to operate on a sample representative of a population. Such a random sample needs to be of a certain size for the test to be able to ascertain that it actually does represent the population at large.
Here, you have exactly one sample per tissue/condition. That's not conducive to statistical comparison. For example, let's say you have 4 bags full of plastic balls. You are told that each bag has a large percentage of balls of one color and a smaller percentage of balls of other colors.
If you were allowed to pick only 1 ball from each bag, how sure can you be that the color you got is the majority color in that bag? For all you know, your observation could just be random chance, not something meaningful. That's what DESeq2 is telling you - it cannot do stats on a matrix where the number of samples == number of coefficients.
Thanks i will try to add more replicates in my test and will perform analysis again.