Question: Error in DESeqDataSet(se, design = design, ignoreRank) : variables in design formula cannot contain NA:
draven.a.rane wrote (4 weeks ago):

Hello,

I am new to R and DESeq2 and have been having problems reading in the raw data and creating the metadata needed to construct a valid DESeqDataSet (dds). Here is my code:

    # DESeq2 Analysis

    # Load libraries
    library(tidyverse)
    library(DESeq2)
    library(RColorBrewer)
    library(SummarizedExperiment)

    # Load the raw count data (.csv format); row.names = 1 keeps the gene
    # names from the first column as the row names of the count matrix
    setwd("~/Documents/Bioengineering Research/DESeq2 Analysis/Test 1")
    all_countdata <- read.csv("Test 1 Raw Count Data.csv", header = TRUE, row.names = 1)
    countdata <- as.matrix(all_countdata)
    head(countdata)

    # Load the sample metadata (.csv format)
    metadata <- read.csv("Test 1 Metadata DESeq2.csv", header = TRUE)
    head(metadata)

    # Reorder the metadata rows to follow the column order of the count matrix
    idx <- match(colnames(countdata), rownames(metadata))
    reordered_metadata <- metadata[idx, ]


    # Analysis with DESeq2 -------------------------------------------------------

    # Initiate the DESeq2 object
    dds <- DESeqDataSetFromMatrix(countData = countdata, colData = reordered_metadata,
                                  design = ~Sample)

My raw count data was originally an Excel file, which I exported as a CSV. Because I had trouble creating the metadata data frame myself in R, I made the metadata file by hand and also exported it as a CSV for input.
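For reference, a minimal sketch of how read.csv() treats a file like this. The CSV text below is made-up toy data standing in for "Test 1 Raw Count Data.csv"; the gene names and the "Sample 1"/"Sample 2" headers are assumptions, not the actual GEO file:

```r
# Toy stand-in for an exported count CSV (hypothetical genes and headers)
csv_text <- "gene,Sample 1,Sample 2
GeneA,10,0
GeneB,250,310
GeneC,3,7"

# Default read: gene names stay in an ordinary column and
# "Sample 1" is rewritten to the syntactic name "Sample.1"
default_read <- read.csv(text = csv_text)
print(colnames(default_read))

# row.names = 1 moves the gene names into the row names;
# check.names = FALSE keeps the headers exactly as written
counts_df <- read.csv(text = csv_text, row.names = 1, check.names = FALSE)
countdata <- as.matrix(counts_df)
print(dim(countdata))
print(colnames(countdata))
```

Whether check.names = FALSE is needed depends on what the headers in the exported file actually look like; what matters is that the count-matrix column names are the exact strings used to identify the samples in the metadata.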

The raw count data has 2000 rows, one per gene (with the gene names as row names), and 2 columns, one per sample. One sample contains the raw counts of cells expressing high FOXP3 levels and the other the counts of cells expressing low FOXP3 levels. There is no wild-type (WT) control group. (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE140102).
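A quick sanity check on the count matrix before building the DESeqDataSet can rule out input problems early. DESeq2 works on raw, non-negative integer counts, so the sketch below flags NAs, negative or non-integer values, and missing row/column names. check_counts() is a hypothetical helper, not a DESeq2 function, and the 3-by-2 matrix is toy data:

```r
# Hypothetical helper: basic checks that a count matrix is in the shape
# DESeqDataSetFromMatrix() expects (non-negative whole numbers, named rows/columns)
check_counts <- function(counts) {
  stopifnot(is.matrix(counts), is.numeric(counts))
  problems <- character(0)
  if (anyNA(counts)) problems <- c(problems, "contains NA")
  if (any(counts < 0, na.rm = TRUE)) problems <- c(problems, "contains negative values")
  if (any(counts != round(counts), na.rm = TRUE)) problems <- c(problems, "contains non-integer values")
  if (is.null(rownames(counts))) problems <- c(problems, "has no gene row names")
  if (is.null(colnames(counts))) problems <- c(problems, "has no sample column names")
  problems
}

# Toy example: 3 genes x 2 samples (hypothetical values)
good <- matrix(c(10L, 250L, 3L, 0L, 310L, 7L), nrow = 3,
               dimnames = list(c("GeneA", "GeneB", "GeneC"), c("Sample 1", "Sample 2")))
# Same values shifted off the integers and stripped of their names
bad <- unname(good) + 0.5

print(check_counts(good))  # character(0): no problems found
print(check_counts(bad))
```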

The metadata file I created has two columns: one labeled samples, with two rows (Sample 1 and Sample 2), and one labeled FOXP3 Expression, with two rows (low and high).
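The same two-row table can also be built directly in R instead of through a CSV. A sketch, assuming the count-matrix columns are named "Sample 1" and "Sample 2" and renaming the FOXP3 Expression column to FOXP3 (both hypothetical choices):

```r
# Hypothetical count-matrix column names; in practice use colnames(countdata)
sample_names <- c("Sample 1", "Sample 2")

# Build the two-row metadata directly, with the sample names as row names
metadata <- data.frame(
  Sample = sample_names,
  FOXP3 = factor(c("low", "high"), levels = c("low", "high")),
  row.names = sample_names
)

# Confirm the metadata rows line up with the count-matrix columns
# before calling DESeqDataSetFromMatrix()
stopifnot(all(rownames(metadata) == sample_names))
print(metadata)
```

The DESeq2 vignette stresses that the rows of colData must already be in the same order as the columns of the count matrix (DESeq2 does not guess the pairing), hence the explicit check.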

When I run the code above, I receive the error: "Error in DESeqDataSet(se, design = design, ignoreRank) : variables in design formula cannot contain NA: Sample"

I have been unable to find information regarding this error on the Bioconductor support page or elsewhere thus far, and any help regarding this issue would be much appreciated. Thank you!

swbarnes2 wrote (4 weeks ago):

You have two samples? Why bother? You can't use DESeq to find DE genes between only two samples.
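A base-R way to see the problem (no DESeq2 needed; the sample table is toy data mirroring the question): with one sample per FOXP3 group, the design matrix has as many coefficients as samples. That leaves zero residual degrees of freedom, so there is nothing left from which to estimate each gene's dispersion, i.e. the within-group variability the test relies on.

```r
# Two samples, one per FOXP3 group (as in the question)
coldata <- data.frame(FOXP3 = factor(c("low", "high"), levels = c("low", "high")))

# Design matrix for ~FOXP3: an intercept plus one coefficient for "high"
X <- model.matrix(~ FOXP3, data = coldata)
print(X)

# Residual degrees of freedom = samples - estimable coefficients
resid_df <- nrow(X) - qr(X)$rank
print(resid_df)  # 0
```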


Yes, I understand the data isn't good enough for a normal DESeq2 analysis. My task is only to recapitulate the findings of the study that published this data. It is meant as a test run, assigned by my postdoc, for my first time using DESeq2. So any help with this would help me understand the general DESeq2 workflow and would be much appreciated.

(reply by draven.a.rane, 4 weeks ago)

The error message should help you to diagnose the problem (in fact, it diagnoses the problem for you):

Error in DESeqDataSet(se, design = design, ignoreRank) : variables in design formula cannot contain NA: Sample

(reply by Kevin Blighe, 4 weeks ago)

I can see that. However, I do not understand what I am doing wrong. If you understand the error I am making, it would be nice if you could share it rather than restating what R has already told me and what I have already tried to fix on my own.

(reply by draven.a.rane, 4 weeks ago)

It means that there are NA values in reordered_metadata$Sample, which are not allowed. You will have to trace back a few steps to understand why.
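Tracing it back on toy data (the values are hypothetical, chosen to mirror the question's code): read.csv() without row.names gives the metadata the default row names "1", "2", so match() against the count-matrix column names returns NA, and indexing a data frame with NA produces rows that are entirely NA. This is a likely cause, assuming the metadata CSV has no separate row-name column:

```r
# Toy inputs mimicking the question (hypothetical values)
countdata <- matrix(c(10L, 250L, 3L, 0L, 310L, 7L), nrow = 3,
                    dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                    c("Sample 1", "Sample 2")))

# Like read.csv() without row.names: default row names "1", "2"
metadata <- data.frame(Sample = c("Sample 1", "Sample 2"),
                       FOXP3 = c("low", "high"))
print(rownames(metadata))

# Step 1: no count column name equals "1" or "2", so match() returns NA
idx <- match(colnames(countdata), rownames(metadata))
print(idx)

# Step 2: indexing with NA gives all-NA rows, hence NA in $Sample
reordered_metadata <- metadata[idx, ]
print(reordered_metadata$Sample)

# One fix: match against the Sample column, then use it as row names
idx_fixed <- match(colnames(countdata), metadata$Sample)
reordered_fixed <- metadata[idx_fixed, ]
rownames(reordered_fixed) <- reordered_fixed$Sample
print(reordered_fixed)
```

Passing row.names = 1 to read.csv() would also set the row names, but if Sample is the first column it is moved out of the data frame, so design = ~Sample would no longer find it.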

What swbarnes2 is saying is important, too, i.e., you should really follow a tutorial first. In the past, when I was learning, I typically followed a tutorial, studied the input and output of each command, and commented my own code. Then it became easier to adapt these tutorials to other or new datasets.

For DESeq2, I even have a very simple introduction indirectly via one of my own packages: https://bioconductor.org/packages/release/bioc/vignettes/EnhancedVolcano/inst/doc/EnhancedVolcano.html#quick-start

(reply by Kevin Blighe, 4 weeks ago)

I'd learn on a tutorial dataset, not this. I'm not sure this will run with only two samples.

(reply by swbarnes2, 4 weeks ago)