error related to coefficients while running DESeq2 in R
1
0
Entering edit mode
3.8 years ago
kmmistr1 ▴ 10

Hi, I am kind of new in R so I apologize if I dont give required information to my question. I have two conditions (age of the animal model) in my dataset: "new-born = NB" and "twenty years old = twenty" for three different tissues samples "liver", "kidney" and "brain". My coldata looks like:

          condition    tissue
SRR306394      "NB"   "Liver"
SRR306397      "NB"  "Kidney"
SRR306400      "NB"   "Brain"
SRR306396  "twenty"   "Liver"
SRR306399  "twenty"  "Kidney"
SRR306402  "twenty"   "Brain"

The code I used to run deseq2 is:

pasCts <- "C:/Users/krishna/Desktop/project/NBvs20/NBvs20.Rmatrix.txt" 
pasAnno <- "C:/Users/krishna/Desktop/project/NBvs20/featurecounts_NBvs20.csv" 
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="Geneid")) 
coldata <- read.csv(pasAnno, row.names=1) 
coldata
str(coldata)

OUTPUT:

> str(coldata)
'data.frame':   6 obs. of  2 variables:
 $ condition: Factor w/ 4 levels " NB"," twenty",..: 1 2 3 4 3 4
 $ tissue   : Factor w/ 3 levels " Liver","Brain",..: 1 1 3 3 2 2

coldata <- coldata[,c("condition","tissue")]
coldata$tissue <- factor(coldata$tissue)
coldata$condition <- factor(coldata$condition)
coldata

all(rownames(coldata) %in% colnames(cts))

all(rownames(coldata) == colnames(cts))

cts <- cts[, rownames(coldata)]
all(rownames(coldata) == colnames(cts))
head(cts)

OUTPUT:

> head(cts)
                   SRR306394 SRR306396 SRR306397 SRR306399 SRR306400 SRR306402
ENSMUSG00000102693         0         0         0         0         0         0
ENSMUSG00000064842         0         0         0         0         0         0
ENSMUSG00000051951         1         0         1         0        62        22
ENSMUSG00000102851         0         0         0         0         0         0
ENSMUSG00000103377         0         0         0         0         0         0
ENSMUSG00000104017         0         0         0         0         0         0
> str(cts)
 int [1:55401, 1:6] 0 0 1 0 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:55401] "ENSMUSG00000102693" "ENSMUSG00000064842" "ENSMUSG00000051951" "ENSMUSG00000102851" ...
  ..$ : chr [1:6] "SRR306394" "SRR306396" "SRR306397" "SRR306399" ...

### Create Deseq2 matrix object for data
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~condition)
dds

OUTPUT:

> dds
class: DESeqDataSet 
dim: 55401 6 
metadata(1): version
assays(1): counts
rownames(55401): ENSMUSG00000102693 ENSMUSG00000064842 ... ENSMUSG00000064371 ENSMUSG00000064372
rowData names(0):
colnames(6): SRR306394 SRR306396 ... SRR306400 SRR306402
colData names(2): condition tissue

### Prefiltering - here we are removing rows with very low read counts.
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]


### Set factors - in this case, "liver", "kidney and "brain"
dds$tissue <- factor(dds$tissue, levels = c("Liver", "Kidney", "Brain"))
dds$condition <- factor(dds$condition, levels = c("NB", "twenty"))

### running the differential expression analysis
dds <- DESeq(dds)
res <- results(dds)
res

resultsNames(dds)

After running resultsNames(dds), I got below output:

> resultsNames(dds)
[1] "Intercept"                    "condition_..twenty._vs_..NB." "condition_.NB._vs_..NB."      "condition_.twenty._vs_..NB."

What correcion shall I do if i want only one coefficient "twenty vs NB"?

We can order our results table by the smallest p value:

resOrdered <- res[order(res$pvalue),]
resOrdered
summary(res)

OUTPUT:

>summary(res)
out of 14195 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 113, 0.8%
LFC < 0 (down)     : 34, 0.24%
outliers [1]       : 0, 0%
low counts [2]     : 9082, 64%
(mean count < 40)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

to know the number of adjusted p-values less than 0.1

sum(res$padj < 0.1, na.rm=TRUE)
OUTPUT:
> sum(res$padj < 0.1, na.rm=TRUE)
[1] 147

Above is the full code along with the outputs and their str() of the ojects . Thnak you for your time.

R DESeq2 RNA-Seq • 1.2k views
ADD COMMENT
1
Entering edit mode

For any test to have statistical power, it needs to operate on a sample representative of a population. Such a random sample needs to be of a certain size for the test to be able to ascertain that it actually does represent the population at large.

Here, you have exactly one sample per tissue/condition. That's not conducive to statistical comparison. For example, let's say you have 4 bags full of plastic balls. You are told that each bag has a large percentage of balls of one color and a smaller percentage of balls of other colors.

If you were allowed to pick only 1 ball from each bag, how sure can you be that the color you got is the majority color in that bag? For all you know, your observation could just be random chance, not something meaningful. That's what DESeq2 is telling you - it cannot do stats on a matrix where the number of samples == number of coefficients.

ADD REPLY
0
Entering edit mode

Thanks i will try to add more replicates in my test and will perform analysis again.

ADD REPLY
1
Entering edit mode
3.8 years ago

Hey,

You should not be using ~ 0 + condition as your design formula - try ~ condition. See my answer here regarding this: A: Deseq2 with one factor and multiple levels

Also, you have condition in your design formula but you don't actually set the order of its factors (?) Instead, you set the order of the tissue factors, and have "Kidney" there, even though you have not indicated that there are kidney samples in your data.

Also, if this is a 2 vs 2 comparison, you will struggle to publish this data. Okay if it is just for hypothesis generating.

Kevin

ADD COMMENT
0
Entering edit mode

Thanks Kevin, as per your suggesstions have corrcted my post. My bad, I made few mistakes in my script which I posted here. Initially I was working with two tissue samples and now to increase the replicates I am working with three. So, I have corrected the script and I have successfully ran the DESeq2. But The problem here is it's giving 3 coefficients while I only want to compare "NB vs twenty". I need to correct this as these three coefficients messes up the visualisation part,

ADD REPLY
1
Entering edit mode

Which coefficients does it return? Otherwise, you can try something like this: A: DESeq2 compare all levels

ADD REPLY
0
Entering edit mode

It returns:

> resultsNames(dds)
[1] "Intercept"                    "condition_..twenty._vs_..NB." "condition_.NB._vs_..NB."      "condition_.twenty._vs_..NB."
ADD REPLY
1
Entering edit mode

Looks like there are leading and / or trailing spaces in your metadata, which is probably the original problem. Take a look at your input metadata file - you will find entries that have 1 and 2 leading spaces, like this:

  • "<space>NB<space>"
  • "<space><space>NB<space>"

...or there may be some other illegal symbol, like exclamation mark, a number, a question mark, etc.

Any programming language will regard these as different from each other.

ADD REPLY
0
Entering edit mode

Sorry but I cant find any trailing spaces or anything else in my metadata except (") which I added to all my condtions and tissues as below:

            condition   tissue
SRR306394      "NB"  "Liver"
SRR306397      "NB" "Kidney"
SRR306400      "NB"  "Brain"
SRR306396  "twenty"  "Liver"
SRR306399  "twenty" "Kidney"
SRR306402  "twenty"  "Brain"
ADD REPLY
0
Entering edit mode

I see, but you probably want to remove that quotation symbol ".

ADD REPLY
0
Entering edit mode

I tried removing the quotation symbol, still I got same output

ADD REPLY
0
Entering edit mode

Sure, can you show all code and also the output of str() applied to all of your objects at every step. Thanks.

ADD REPLY
0
Entering edit mode

Hey, I have edited my code in my post and I have also added the outputs and str(). Let me know if you want more information. Thanks.

ADD REPLY
0
Entering edit mode

I see - thanks! There is still this issue with the spaces:

str(coldata)
'data.frame':   6 obs. of  2 variables:
 $ condition: Factor w/ 4 levels " NB"," twenty",..: 1 2 3 4 3 4
 $ tissue   : Factor w/ 3 levels " Liver","Brain",..: 1 1 3 3 2 2

Edit: if you are absolutely sure that these leading spaces do not exist in the actual values, then you may have to 're-factorise' these variables via, for example:

coldata$condition <- factor(as.character(coldata$condition))
coldata$tissue <- factor(as.character(coldata$tissue))
ADD REPLY
1
Entering edit mode

Thanks Kevin, I got the result I wanted using your codes:

coldata$condition <- factor(as.character(coldata$condition))
coldata$tissue <- factor(as.character(coldata$tissue))

And also I've realised the existence of spaces in my coldata. I tried my code after removing those spaces and it worked. Thanks a lot for helping me.

ADD REPLY

Login before adding your answer.

Traffic: 1507 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6