Error in DESeqDataSet(se, design = design, ignoreRank) : variables in design formula cannot contain NA:
3.8 years ago

Hello,

I am new to R and DESeq2 and have been having problems reading in the raw data and creating the metadata needed to construct a valid DESeqDataSet (dds). Here is my code:

    # DESeq2 Analysis

    # Load libraries
    library(tidyverse)
    library(DESeq2)
    library(RColorBrewer)
    library(SummarizedExperiment)

    # Load the raw count table (.csv); the first column holds the gene names,
    # so use it as row names and keep only the count columns in the matrix
    setwd("~/Documents/Bioengineering Research/DESeq2 Analysis/Test 1")
    all_countdata <- read.csv("Test 1 Raw Count Data.csv", header = TRUE, row.names = 1)
    countdata <- as.matrix(all_countdata)
    head(countdata)

    # Load the manually created sample metadata
    metadata <- read.csv("Test 1 Metadata DESeq2.csv", header = TRUE)
    head(metadata)

    # Match metadata rows to the column order of countdata
    idx <- match(colnames(countdata), rownames(metadata))
    reordered_metadata <- metadata[idx, ]

    # Analysis with DESeq2 -------------------------------------------------------

    # Create the DESeqDataSet object
    dds <- DESeqDataSetFromMatrix(countData = countdata, colData = reordered_metadata,
                                  design = ~Sample)

My raw count data were in an Excel file, which I exported as a CSV. Because I had trouble creating the metadata data frame in R, I created a metadata file manually and also exported it as a CSV for input.

The original raw count data has 2000 rows, labelled with the respective gene names, and 2 columns, one for each sample: one sample has the raw counts for cells expressing high FOXP3 levels and the other for cells expressing low FOXP3 levels. There is no wild-type (WT) control group (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE140102).
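A minimal sketch of loading a table laid out like this so that the gene names become the row names of the count matrix, instead of being dropped with `[,-1]`. The inline CSV text, gene names, sample names, and counts are hypothetical stand-ins for `Test 1 Raw Count Data.csv`; `read.csv(text = ...)` is used only so the snippet runs without the file.

```r
# Hypothetical stand-in for "Test 1 Raw Count Data.csv":
# first column = gene names, then one column of raw counts per sample
csv_text <- "gene,Sample1,Sample2
GeneA,10,25
GeneB,0,3
GeneC,150,90"

# row.names = 1 turns the gene-name column into row names,
# so the remaining columns are all numeric counts
all_countdata <- read.csv(text = csv_text, header = TRUE, row.names = 1)
countdata <- as.matrix(all_countdata)

# DESeq2 expects non-negative integer counts: genes in rows, samples in columns
stopifnot(is.numeric(countdata),
          all(countdata >= 0),
          all(countdata == round(countdata)))
print(dim(countdata))       # 3 2
print(rownames(countdata))  # "GeneA" "GeneB" "GeneC"
```

With the real file, the same two lines apply with the file name in place of `text = csv_text`.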

The metadata file I created has two columns: one labeled samples, with two rows (Sample 1 and Sample 2), and one labeled FOXP3 Expression, with two rows (low and high).
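If building the metadata in R is the sticking point, one option is to create it directly as a `data.frame` whose row names are the count-matrix column names, because the rows of `colData` must describe the columns of `countData` in the same order. This is a sketch: the sample names `Sample1`/`Sample2`, the column name `FOXP3_expression`, the toy counts, and which sample is low versus high are all assumptions to replace with the real values.

```r
# Hypothetical count matrix standing in for the real one (genes x samples)
countdata <- matrix(c(10L, 0L, 150L, 25L, 3L, 90L), ncol = 2,
                    dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                    c("Sample1", "Sample2")))

# One metadata row per sample, named after the count-matrix columns.
# Assumption: Sample1 = low FOXP3, Sample2 = high FOXP3.
metadata <- data.frame(
  Sample = colnames(countdata),
  FOXP3_expression = factor(c("low", "high"), levels = c("low", "high")),
  row.names = colnames(countdata)
)

# The rows of colData must line up with the columns of countData
stopifnot(identical(rownames(metadata), colnames(countdata)))
print(metadata)
```

Setting the factor levels explicitly makes `low` the reference level, so any comparison would be reported as high versus low.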

When I try running the above code, I receive the error: "Error in DESeqDataSet(se, design = design, ignoreRank) : variables in design formula cannot contain NA: Sample"

I have been unable to find information regarding this error on the Bioconductor support page or elsewhere thus far, and any help regarding this issue would be much appreciated. Thank you!

RNA-Seq • 11k views
3.8 years ago

You have two samples? Why bother? You can't use DESeq to find DE genes between only two samples.
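To make the two-sample limitation concrete: DESeq2 estimates each gene's dispersion from the variation among samples that the design does not explain, which needs more samples than design coefficients. A base-R sketch on hypothetical metadata mirroring the question (the names `Sample1`, `Sample2`, and `FOXP3_expression` are assumptions) shows that a `~ FOXP3_expression` design leaves zero residual degrees of freedom.

```r
# Hypothetical metadata mirroring the question: one sample per FOXP3 level
metadata <- data.frame(
  Sample = c("Sample1", "Sample2"),
  FOXP3_expression = factor(c("low", "high"), levels = c("low", "high"))
)

# Replicates per condition: every level has exactly one sample
replicates <- table(metadata$FOXP3_expression)
print(replicates)

# Residual degrees of freedom = samples - coefficients in the design matrix
design_matrix <- model.matrix(~ FOXP3_expression, data = metadata)
residual_df <- nrow(design_matrix) - ncol(design_matrix)
print(residual_df)  # 0: nothing left over to estimate dispersion from

# The same problem in miniature: the variance of a single observation is NA
print(var(c(10)))   # NA
```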

ADD COMMENT
0
Entering edit mode

Yes, I understand the data aren't good enough for a normal DESeq2 analysis. My task is only to recapitulate the findings of the study that published this data; my postdoc gave it to me as a test run for my first time using DESeq2. So any help with this would help me understand the general DESeq2 workflow and would be much appreciated.


The error message should help you to diagnose the problem (in fact, it diagnoses the problem for you):

    Error in DESeqDataSet(se, design = design, ignoreRank) : variables in design formula cannot contain NA: Sample
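DESeq2 raises this error when a variable named in the design formula has missing values in `colData`. The same check can be run by hand before calling `DESeqDataSetFromMatrix()`. `check_design_na()` below is a hypothetical helper written for illustration, not a DESeq2 function; it uses base R's `all.vars()` to list the variables in the formula and reports those containing `NA`.

```r
# Hypothetical helper (not part of DESeq2): report design variables that
# are missing from the metadata or that contain NA values
check_design_na <- function(coldata, design) {
  vars <- all.vars(design)
  absent <- setdiff(vars, colnames(coldata))
  if (length(absent) > 0) {
    stop("design variables not found in metadata: ",
         paste(absent, collapse = ", "))
  }
  has_na <- vapply(vars, function(v) anyNA(coldata[[v]]), logical(1))
  vars[has_na]
}

# Example metadata in which the Sample column has become all NA
bad_metadata <- data.frame(Sample = c(NA, NA), FOXP3 = c("low", "high"))
print(check_design_na(bad_metadata, ~Sample))   # "Sample"

# Example metadata with complete sample names
good_metadata <- data.frame(Sample = c("Sample1", "Sample2"),
                            FOXP3 = c("low", "high"))
print(check_design_na(good_metadata, ~Sample))  # character(0)
```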


I can see that. However, I do not understand what I am doing wrong. If you understand the error I am making, it would be nice if you could share it rather than restate what R has already told me and what I have already tried to fix on my own.


It means that there are NA values in reordered_metadata$Sample, which DESeq2 does not allow in design variables. You will have to trace back a few steps to understand why they are there.
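Tracing the steps back with small hypothetical inputs shows one way this can happen. When `read.csv()` is called without `row.names`, the metadata gets the default row names `"1"`, `"2"`, so `match(colnames(countdata), rownames(metadata))` finds nothing and returns `NA`, and indexing with `NA` produces rows that are entirely `NA`. The inline CSV, counts, and sample names are assumptions standing in for the real files. Matching against the column that holds the sample names avoids this, provided the names agree exactly; note that `read.csv()` rewrites header names by default (`check.names = TRUE`), so a count column headed `Sample 1` becomes `Sample.1` and would no longer match `Sample 1` in the metadata.

```r
# Hypothetical count matrix (genes x samples)
countdata <- matrix(c(10L, 0L, 150L, 25L, 3L, 90L), ncol = 2,
                    dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                    c("Sample1", "Sample2")))

# Metadata read WITHOUT row.names: row names default to "1", "2"
metadata <- read.csv(text = "Sample,FOXP3 Expression
Sample2,high
Sample1,low", header = TRUE)
print(rownames(metadata))  # "1" "2"
print(colnames(metadata))  # "Sample" "FOXP3.Expression" (check.names = TRUE)

# Matching sample names against "1", "2" finds nothing
idx <- match(colnames(countdata), rownames(metadata))
print(idx)                        # NA NA
reordered_metadata <- metadata[idx, ]
print(reordered_metadata$Sample)  # NA NA, which triggers the DESeq2 error

# Fix: match against the column that actually holds the sample names
idx <- match(colnames(countdata), metadata$Sample)
reordered_metadata <- metadata[idx, ]
rownames(reordered_metadata) <- reordered_metadata$Sample
stopifnot(!anyNA(idx),
          identical(rownames(reordered_metadata), colnames(countdata)))
print(reordered_metadata)
```

The metadata rows are deliberately listed in the opposite order to the count columns, so the corrected `match()` is doing real reordering.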

What swbarnes2 is saying is important, too, i.e., you should really follow a tutorial first. In the past, when I was learning, I typically followed a tutorial, studied the input and output of each command, and commented my own code. Then, it became easier to adapt these tutorials to other / new datasets.

For DESeq2, I even have a very simple introduction indirectly via one of my own packages: https://bioconductor.org/packages/release/bioc/vignettes/EnhancedVolcano/inst/doc/EnhancedVolcano.html#quick-start


I'd learn on a tutorial dataset, not this. I'm not sure this will run with only two samples.
