Question: error related to coefficients while running DESeq2 in R
0
gravatar for kmmistr1
6 months ago by
kmmistr110
kmmistr110 wrote:

Hi, I am kind of new in R so I apologize if I dont give required information to my question. I have two conditions (age of the animal model) in my dataset: "new-born = NB" and "twenty years old = twenty" for three different tissues samples "liver", "kidney" and "brain". My coldata looks like:

          condition    tissue
SRR306394      "NB"   "Liver"
SRR306397      "NB"  "Kidney"
SRR306400      "NB"   "Brain"
SRR306396  "twenty"   "Liver"
SRR306399  "twenty"  "Kidney"
SRR306402  "twenty"   "Brain"

The code I used to run deseq2 is:

pasCts <- "C:/Users/krishna/Desktop/project/NBvs20/NBvs20.Rmatrix.txt" 
pasAnno <- "C:/Users/krishna/Desktop/project/NBvs20/featurecounts_NBvs20.csv" 
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="Geneid")) 
coldata <- read.csv(pasAnno, row.names=1) 
coldata
str(coldata)

OUTPUT:

> str(coldata)
'data.frame':   6 obs. of  2 variables:
 $ condition: Factor w/ 4 levels " NB"," twenty",..: 1 2 3 4 3 4
 $ tissue   : Factor w/ 3 levels " Liver","Brain",..: 1 1 3 3 2 2

coldata <- coldata[,c("condition","tissue")]
coldata$tissue <- factor(coldata$tissue)
coldata$condition <- factor(coldata$condition)
coldata

all(rownames(coldata) %in% colnames(cts))

all(rownames(coldata) == colnames(cts))

cts <- cts[, rownames(coldata)]
all(rownames(coldata) == colnames(cts))
head(cts)

OUTPUT:

> head(cts)
                   SRR306394 SRR306396 SRR306397 SRR306399 SRR306400 SRR306402
ENSMUSG00000102693         0         0         0         0         0         0
ENSMUSG00000064842         0         0         0         0         0         0
ENSMUSG00000051951         1         0         1         0        62        22
ENSMUSG00000102851         0         0         0         0         0         0
ENSMUSG00000103377         0         0         0         0         0         0
ENSMUSG00000104017         0         0         0         0         0         0
> str(cts)
 int [1:55401, 1:6] 0 0 1 0 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:55401] "ENSMUSG00000102693" "ENSMUSG00000064842" "ENSMUSG00000051951" "ENSMUSG00000102851" ...
  ..$ : chr [1:6] "SRR306394" "SRR306396" "SRR306397" "SRR306399" ...

### Create Deseq2 matrix object for data
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~condition)
dds

OUTPUT:

> dds
class: DESeqDataSet 
dim: 55401 6 
metadata(1): version
assays(1): counts
rownames(55401): ENSMUSG00000102693 ENSMUSG00000064842 ... ENSMUSG00000064371 ENSMUSG00000064372
rowData names(0):
colnames(6): SRR306394 SRR306396 ... SRR306400 SRR306402
colData names(2): condition tissue

### Prefiltering - here we are removing rows with very low read counts.
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]


### Set factors - in this case, "liver", "kidney and "brain"
dds$tissue <- factor(dds$tissue, levels = c("Liver", "Kidney", "Brain"))
dds$condition <- factor(dds$condition, levels = c("NB", "twenty"))

### running the differential expression analysis
dds <- DESeq(dds)
res <- results(dds)
res

resultsNames(dds)

After running resultsNames(dds), I got below output:

> resultsNames(dds)
[1] "Intercept"                    "condition_..twenty._vs_..NB." "condition_.NB._vs_..NB."      "condition_.twenty._vs_..NB."

What correcion shall I do if i want only one coefficient "twenty vs NB"?

We can order our results table by the smallest p value:

resOrdered <- res[order(res$pvalue),]
resOrdered
summary(res)

OUTPUT:

>summary(res)
out of 14195 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 113, 0.8%
LFC < 0 (down)     : 34, 0.24%
outliers [1]       : 0, 0%
low counts [2]     : 9082, 64%
(mean count < 40)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

to know the number of adjusted p-values less than 0.1

sum(res$padj < 0.1, na.rm=TRUE)
OUTPUT:
> sum(res$padj < 0.1, na.rm=TRUE)
[1] 147

Above is the full code along with the outputs and their str() of the ojects . Thnak you for your time.

rna-seq deseq2 R • 296 views
ADD COMMENTlink modified 6 months ago • written 6 months ago by kmmistr110
1

For any test to have statistical power, it needs to operate on a sample representative of a population. Such a random sample needs to be of a certain size for the test to be able to ascertain that it actually does represent the population at large.

Here, you have exactly one sample per tissue/condition. That's not conducive to statistical comparison. For example, let's say you have 4 bags full of plastic balls. You are told that each bag has a large percentage of balls of one color and a smaller percentage of balls of other colors.

If you were allowed to pick only 1 ball from each bag, how sure can you be that the color you got is the majority color in that bag? For all you know, your observation could just be random chance, not something meaningful. That's what DESeq2 is telling you - it cannot do stats on a matrix where the number of samples == number of coefficients.

ADD REPLYlink written 6 months ago by _r_am32k

Thanks i will try to add more replicates in my test and will perform analysis again.

ADD REPLYlink written 6 months ago by kmmistr110
1
gravatar for Kevin Blighe
6 months ago by
Kevin Blighe69k
Republic of Ireland
Kevin Blighe69k wrote:

Hey,

You should not be using ~ 0 + condition as your design formula - try ~ condition. See my answer here regarding this: A: Deseq2 with one factor and multiple levels

Also, you have condition in your design formula but you don't actually set the order of its factors (?) Instead, you set the order of the tissue factors, and have "Kidney" there, even though you have not indicated that there are kidney samples in your data.

Also, if this is a 2 vs 2 comparison, you will struggle to publish this data. Okay if it is just for hypothesis generating.

Kevin

ADD COMMENTlink written 6 months ago by Kevin Blighe69k

Thanks Kevin, as per your suggesstions have corrcted my post. My bad, I made few mistakes in my script which I posted here. Initially I was working with two tissue samples and now to increase the replicates I am working with three. So, I have corrected the script and I have successfully ran the DESeq2. But The problem here is it's giving 3 coefficients while I only want to compare "NB vs twenty". I need to correct this as these three coefficients messes up the visualisation part,

ADD REPLYlink written 6 months ago by kmmistr110
1

Which coefficients does it return? Otherwise, you can try something like this: A: DESeq2 compare all levels

ADD REPLYlink written 6 months ago by Kevin Blighe69k

It returns:

> resultsNames(dds)
[1] "Intercept"                    "condition_..twenty._vs_..NB." "condition_.NB._vs_..NB."      "condition_.twenty._vs_..NB."
ADD REPLYlink modified 6 months ago by _r_am32k • written 6 months ago by kmmistr110
1

Looks like there are leading and / or trailing spaces in your metadata, which is probably the original problem. Take a look at your input metadata file - you will find entries that have 1 and 2 leading spaces, like this:

  • "<space>NB<space>"
  • "<space><space>NB<space>"

...or there may be some other illegal symbol, like exclamation mark, a number, a question mark, etc.

Any programming language will regard these as different from each other.

ADD REPLYlink modified 6 months ago • written 6 months ago by Kevin Blighe69k

Sorry but I cant find any trailing spaces or anything else in my metadata except (") which I added to all my condtions and tissues as below:

            condition   tissue
SRR306394      "NB"  "Liver"
SRR306397      "NB" "Kidney"
SRR306400      "NB"  "Brain"
SRR306396  "twenty"  "Liver"
SRR306399  "twenty" "Kidney"
SRR306402  "twenty"  "Brain"
ADD REPLYlink modified 6 months ago • written 6 months ago by kmmistr110

I see, but you probably want to remove that quotation symbol ".

ADD REPLYlink written 6 months ago by Kevin Blighe69k

I tried removing the quotation symbol, still I got same output

ADD REPLYlink modified 6 months ago • written 6 months ago by kmmistr110

Sure, can you show all code and also the output of str() applied to all of your objects at every step. Thanks.

ADD REPLYlink modified 6 months ago • written 6 months ago by Kevin Blighe69k

Hey, I have edited my code in my post and I have also added the outputs and str(). Let me know if you want more information. Thanks.

ADD REPLYlink written 6 months ago by kmmistr110

I see - thanks! There is still this issue with the spaces:

str(coldata)
'data.frame':   6 obs. of  2 variables:
 $ condition: Factor w/ 4 levels " NB"," twenty",..: 1 2 3 4 3 4
 $ tissue   : Factor w/ 3 levels " Liver","Brain",..: 1 1 3 3 2 2

Edit: if you are absolutely sure that these leading spaces do not exist in the actual values, then you may have to 're-factorise' these variables via, for example:

coldata$condition <- factor(as.character(coldata$condition))
coldata$tissue <- factor(as.character(coldata$tissue))
ADD REPLYlink modified 6 months ago • written 6 months ago by Kevin Blighe69k
1

Thanks Kevin, I got the result I wanted using your codes:

coldata$condition <- factor(as.character(coldata$condition))
coldata$tissue <- factor(as.character(coldata$tissue))

And also I've realised the existence of spaces in my coldata. I tried my code after removing those spaces and it worked. Thanks a lot for helping me.

ADD REPLYlink written 6 months ago by kmmistr110
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1634 users visited in the last hour
_