I am new to R and DESeq2 and have been having trouble importing my raw count data and creating the metadata needed to construct a valid DESeqDataSet (dds). Here is my code:
```r
# DESeq2 Analysis

# Load in libraries
library(tidyverse)
library(DESeq2)
library(RColorBrewer)
library(SummarizedExperiment)

# Load in data file in the .csv format
setwd("~/Documents/Bioengineering Research/DESeq2 Analysis/Test 1")
all_countdata <- read.csv("Test 1 Raw Count Data.csv", header = TRUE)
countdata <- as.matrix(all_countdata[,-1], header = TRUE, row.names = 1)
head(countdata)
metadata <- read.csv("Test 1 Metadata DESeq2.csv", header = TRUE)
head(metadata)

# Reorder data
idx = match(colnames(countdata), rownames(metadata))
reordered_metadata = metadata[idx,]

# Analysis with DESeq2 -------------------------------------------------------
# Initiate DESeq2 Object
dds <- DESeqDataSetFromMatrix(countData = countdata, colData = reordered_metadata, design = ~Sample)
```
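For comparison, here is a minimal sketch of the reorder step that indexes the metadata by sample name and checks the alignment before anything is passed to DESeq2 (the DESeq2 vignette stresses that the count-matrix columns and the metadata rows must be in the same order). It assumes the metadata's row names are the sample names; the small `countdata` and `metadata` objects are hypothetical stand-ins for the real files, and the column name `FOXP3_Expression` is an assumed choice.

```r
# Hypothetical toy objects standing in for the real files:
# a count matrix with gene names as row names and sample names as columns,
# and a metadata data frame whose row names are the same sample names
# (deliberately listed in a different order).
countdata <- matrix(c(10L, 0L, 25L, 7L, 3L, 40L), nrow = 3,
                    dimnames = list(c("geneA", "geneB", "geneC"),
                                    c("Sample 1", "Sample 2")))
metadata <- data.frame(FOXP3_Expression = c("high", "low"),
                       row.names = c("Sample 2", "Sample 1"))

# Stop with a readable message if any count-matrix sample is absent from the metadata
absent <- setdiff(colnames(countdata), rownames(metadata))
if (length(absent) > 0) {
  stop("Samples missing from metadata: ", paste(absent, collapse = ", "))
}

# Reorder metadata rows by name so they follow the count-matrix columns
reordered_metadata <- metadata[colnames(countdata), , drop = FALSE]

# Metadata row order must equal count-matrix column order
stopifnot(identical(rownames(reordered_metadata), colnames(countdata)))
reordered_metadata
```

Indexing by name rather than by `match()` output makes a naming mismatch fail loudly at the check instead of silently producing rows of NA.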
My raw count data was originally an Excel file, which I exported as a CSV. Because I was having trouble creating the metadata data frame in R, I created a metadata file manually and also exported it as a CSV.
The original raw count data has 2000 rows, one per gene (named by gene), and 2 columns, one per sample: one sample contains the raw counts for cells expressing high FOXP3 levels and the other for cells expressing low FOXP3 levels. There is no wild-type (WT) control group (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE140102).
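A minimal sketch of reading a count CSV laid out like this so that gene names become row names and sample names survive unchanged. It assumes the first CSV column holds the gene names (which the `[,-1]` in the code suggests); the file contents, the `Gene` header and the counts are made up for illustration. Note that `header` and `row.names` are `read.csv()` arguments rather than `as.matrix()` ones, so in the code above they likely have no effect, and that `read.csv()` by default rewrites a column name like "Sample 1" as "Sample.1".

```r
# Hypothetical stand-in for "Test 1 Raw Count Data.csv": first column holds gene
# names, the remaining columns hold one sample each (values are made up).
count_file <- tempfile(fileext = ".csv")
writeLines(c("Gene,Sample 1,Sample 2",
             "geneA,10,7",
             "geneB,0,3",
             "geneC,25,40"), count_file)

# row.names = 1 turns the gene column into row names;
# check.names = FALSE keeps "Sample 1" instead of converting it to "Sample.1"
all_countdata <- read.csv(count_file, header = TRUE, row.names = 1,
                          check.names = FALSE)
countdata <- as.matrix(all_countdata)

# DESeq2 expects non-negative integer counts
stopifnot(all(countdata >= 0), all(countdata == round(countdata)))
head(countdata)
```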
The metadata file I created has two columns and two rows: one column labeled "Sample" (containing Sample 1 and Sample 2) and one labeled "FOXP3 Expression" (containing low and high).
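The same metadata can be built directly in R, which avoids a separate CSV. This is a sketch under stated assumptions: the sample names and FOXP3 levels follow the description above, while the column name `FOXP3_Expression` is a hypothetical choice (an underscore sidesteps `read.csv()` turning "FOXP3 Expression" into "FOXP3.Expression"). The sample names are also used as row names, because that is what the `match(colnames(countdata), rownames(metadata))` step looks up.

```r
# Sample names and FOXP3 levels follow the description above;
# FOXP3_Expression is a hypothetical column name.
sample_names <- c("Sample 1", "Sample 2")
metadata <- data.frame(
  Sample = sample_names,
  FOXP3_Expression = factor(c("low", "high"), levels = c("low", "high")),
  row.names = sample_names  # row names must match the count-matrix column names
)
metadata
str(metadata)
```

Listing `low` first in `levels` makes it the reference level, so DESeq2's default results for this factor would be reported as high vs low.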
When I try running the above code, I receive the error: "Error in DESeqDataSet(se, design = design, ignoreRank) : variables in design formula cannot contain NA: Sample"
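One way this error can arise is if `idx` is all `NA`. The base-R sketch below reproduces that, assuming the metadata CSV looks as described above and is read without `row.names = 1`. The inline CSV text and the count-matrix column names are illustrative assumptions, not the actual files.

```r
# Hypothetical reproduction of the metadata CSV, read as in the question
# (no row.names argument).
meta_csv <- "Sample,FOXP3 Expression
Sample 1,low
Sample 2,high"
metadata <- read.csv(text = meta_csv, header = TRUE)

rownames(metadata)   # default row names "1" "2", not the sample names
names(metadata)      # "FOXP3 Expression" becomes "FOXP3.Expression"

# Assumed count-matrix column names, for illustration only
count_cols <- c("Sample 1", "Sample 2")
idx <- match(count_cols, rownames(metadata))
idx                  # no sample name equals "1" or "2", so every entry is NA

reordered_metadata <- metadata[idx, ]
reordered_metadata$Sample  # all NA, which is what the design formula ~Sample would see
```

If this matches the real data, a likely fix is to give `match()` real sample names on both sides: read the metadata with `row.names = 1` and the counts with `check.names = FALSE` (so "Sample 1" is not turned into "Sample.1"). The design formula would then need to name a column that still exists in the metadata.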
So far I have not found any information about this error on the Bioconductor support site or elsewhere, so any help would be much appreciated. Thank you!