Question

Error in DESeqDataSet(se, design = design, ignoreRank) : variables in design formula cannot contain NA:

3.8 years ago

draven.a.rane ▴ 10

Hello,

I am new to R and DESeq2 and have been experiencing problems with inputting raw data and creating the metadata to construct a valid dds. Here is my code:

    # DESeq2 Analysis

    # Load in libraries
    library(tidyverse)
    library(DESeq2)
    library(RColorBrewer)
    library(SummarizedExperiment)


    # Load in data file in the .csv format

    setwd("~/Documents/Bioengineering Research/DESeq2 Analysis/Test 1")
    all_countdata <- read.csv("Test 1 Raw Count Data.csv", header = TRUE)
    countdata <- as.matrix(all_countdata[,-1], header = TRUE, row.names = 1)
    head(countdata)

    metadata <- read.csv("Test 1 Metadata DESeq2.csv", header = TRUE)
    head(metadata)

    # Reorder data

    idx = match(colnames(countdata), rownames(metadata))
    reordered_metadata = metadata[idx,]


    # Analysis with DESeq2 -------------------------------------------------------

    # Initiate DESeq2 Object

    dds <- DESeqDataSetFromMatrix(countData = countdata, colData = reordered_metadata,
                                  design = ~Sample)

My raw count data was in an Excel file that I exported as a CSV. Because I was having problems creating the metadata data frame on my own in R, I manually created a metadata file that I also exported as a CSV for input.

The original raw count data includes 2000 rows, with the respective gene names as row names, and 2 columns, one for each sample. One sample has the raw counts of cells expressing high FOXP3 levels and the other has those of cells expressing low FOXP3 levels. There is no wt control group (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE140102).

The metadata file I created has two columns. One is labeled samples and has two rows (Sample 1 and Sample 2). The other is FOXP3 Expression and also has two rows (low and high).

When I try running the above code, I receive the error: "Error in DESeqDataSet(se, design = design, ignoreRank) : variables in design formula cannot contain NA: Sample"

I have been unable to find information regarding this error on the Bioconductor support page or elsewhere thus far, and any help regarding this issue would be much appreciated. Thank you!
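For reference, here is a minimal sketch of one way a count table shaped like the one described above (gene names in the first column, one column of raw counts per sample) could be read so that the gene names become row names of the matrix. The inline CSV text, gene names, and counts are made up for illustration and stand in for the real file. Note that `header` and `row.names` are `read.csv()` arguments rather than `as.matrix()` ones.

```r
# Hypothetical stand-in for "Test 1 Raw Count Data.csv": gene names in the
# first column, one column of raw counts per sample (all values are made up).
csv_text <- "gene,Sample1,Sample2
GeneA,10,25
GeneB,0,3
GeneC,120,98"

# row.names = 1 makes read.csv() use the first column as row names,
# so the gene names stay attached without becoming a data column.
counts_df <- read.csv(text = csv_text, header = TRUE, row.names = 1)

# as.matrix() carries the data frame's row and column names over as dimnames.
countdata <- as.matrix(counts_df)

str(countdata)
```

With a real file, `text = csv_text` would be replaced by the file path. The resulting `colnames(countdata)` are the sample names that the metadata rows need to match.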

RNA-Seq • 11k views

score 0 · Answer 1 · 2020-07-11

3.8 years ago

swbarnes2 14k

You have two samples? Why bother? You can't use DESeq to find DE genes between only two samples.

Yes, I understand the data isn't good enough for a normal DESeq2 analysis. My task is only to recapitulate the findings of the study that published this data. It's supposed to be a kind of test run for my first time doing DESeq2, given to me by my post-doc. So, any help with this would help me understand the general workflow of DESeq2 and be much appreciated.

Reply • 3.8 years ago by draven.a.rane ▴ 10

The error message should help you to diagnose the problem (in fact, it diagnoses the problem for you):

Error in DESeqDataSet(se, design = design, ignoreRank) : variables in design formula cannot contain NA: Sample

Reply • 3.8 years ago by Kevin Blighe 87k

I can see that. However, I do not understand what I am doing wrong. If you understand the error I am making, it would be nice if you could share it rather than restating what R has already told me and what I have already tried to fix on my own.

Reply • 3.8 years ago by draven.a.rane ▴ 10

It means that there are NA values in reordered_metadata$Sample, but there cannot be. You will have to trace back a few steps in order to understand why.
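One way to trace it back is sketched below in base R only. The inline metadata, its column names, and the sample names are assumptions for illustration, not the real files. The sketch reads the metadata the same way as the question does, so the row names default to "1", "2", and `match()` against the sample names finds nothing.

```r
# Hypothetical stand-in for the metadata CSV described in the question
# (column names and values are assumptions for illustration).
meta_text <- "Sample,FOXP3
Sample1,low
Sample2,high"

# Column names of the count matrix, assumed to equal the Sample column values.
count_cols <- c("Sample1", "Sample2")

# Read as in the question: no row.names given, so row names are "1", "2".
metadata <- read.csv(text = meta_text, header = TRUE)
idx <- match(count_cols, rownames(metadata))
print(idx)                              # all NA: no row is named "Sample1"/"Sample2"
print(anyNA(metadata[idx, ]$Sample))    # TRUE: indexing by NA yields all-NA rows

# One possible fix: use the Sample column as row names before matching.
rownames(metadata) <- metadata$Sample
idx_fixed <- match(count_cols, rownames(metadata))
reordered_metadata <- metadata[idx_fixed, , drop = FALSE]
print(reordered_metadata)
```

If `idx` still contains `NA` after setting the row names, the two sets of names probably differ in some small way. For example, `read.csv()` by default rewrites column names that contain spaces ("Sample 1" becomes "Sample.1"), while values inside a column are left unchanged.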

What swbarnes2 is saying is important, too, i.e., you should really follow a tutorial first. In the past, when I was learning, I typically followed a tutorial, studied the input and output of each command, and commented my own code. Then, it became easier to adapt these tutorials to other / new datasets.

For DESeq2, I even have a very simple introduction indirectly via one of my own packages: https://bioconductor.org/packages/release/bioc/vignettes/EnhancedVolcano/inst/doc/EnhancedVolcano.html#quick-start

Reply • 3.8 years ago by Kevin Blighe 87k

I'd learn on a tutorial dataset, not this. I'm not sure this will run with only two samples.
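The concern about two samples can be made concrete with a small base-R sketch (no DESeq2 needed). The sample names and the `FOXP3` column name are assumptions for illustration. With one sample per condition, a design comparing the two conditions has as many coefficients as samples, so no residual degrees of freedom remain for estimating per-gene variability. As far as I understand, this is why current DESeq2 versions are expected to stop at the dispersion-estimation step for such a design.

```r
# Two samples, one per condition, as described in the question.
coldata <- data.frame(
  FOXP3 = factor(c("low", "high"), levels = c("low", "high")),
  row.names = c("Sample1", "Sample2")   # hypothetical sample names
)

# Design matrix for ~ FOXP3: an intercept plus one high-vs-low coefficient.
design <- model.matrix(~ FOXP3, data = coldata)
print(design)

# Residual degrees of freedom = number of samples - number of coefficients.
residual_df <- nrow(design) - ncol(design)
print(residual_df)  # 0: nothing is left over to estimate dispersion
```

The same arithmetic applies to `design = ~Sample` from the question, where every sample is its own level.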

Reply • 3.8 years ago by swbarnes2 14k