Hi, I have used the rnaseq nextflow pipeline to generate count tables from 9 samples of this GEO dataset:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE134127
There are 18 samples, 9 exposed to a chemical compound and 9 unexposed. I have chosen the 9 unexposed samples because they are the ones of interest for my project; the samples appear in SRA like this:
My goal is to do a differential expression analysis between these samples and obtain a volcano plot and a heatmap with DESeq2.
I understand that, in order to generate a volcano plot and a heatmap, DESeq2 needs a count matrix, which can be built either from an existing matrix or from a tximport object containing the rounded counts. For example, from a matrix:
dds <- DESeqDataSetFromMatrix(countData = counts_as_matrix,
                              colData = samplesheet_metadata_SRA,
                              design = ~ condition) # This is where i do not know what to assign
I have managed to generate a matrix object with the counts of my samples, but I have doubts about the colData and design arguments, because I don't know if they are right.
For example, this is part of the file that should go in colData, which is basically the SRA metadata file for the 9 samples I mentioned. I show 3 of the 9 samples; all 9 have the same values for source_name, tissue, etc.
Experiment | Sample | source_name | Tissue | disease | Cell_type |
---|---|---|---|---|---|
SRX6430481 | GSM3937647 | mock-treated | breast cancer | adenocarcinoma | epithelial |
SRX6430482 | GSM3937648 | mock-treated | breast cancer | adenocarcinoma | epithelial |
SRX6430483 | GSM3937649 | mock-treated | breast cancer | adenocarcinoma | epithelial |
My biggest doubt is about the design argument of the function DESeqDataSetFromMatrix: what should I put there? As you can see in the metadata table, all the samples are mock-treated.
Thank you for your help and time
You need to supply colData and design in the required format: https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/DESeqDataSet-class
"colData : for matrix input: a DataFrame or data.frame with at least a single column. Rows of colData correspond to columns of countData"
"design : a formula which expresses how the counts for each gene depend on the variables in colData"
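As a minimal sketch of what those two definitions mean in practice, the snippet below builds a toy count matrix and a matching colData, checks that the rows of colData line up with the columns of countData, and then constructs the dataset. The sample names, the `condition` column with its `mock`/`treated` levels, and the random counts are all made up for illustration; none of them come from GSE134127. It assumes DESeq2 is installed.

```r
library(DESeq2)

set.seed(1)

# Hypothetical toy data: 100 genes x 6 samples (names are illustrative only)
sample_ids <- paste0("sample", 1:6)
counts_as_matrix <- matrix(
  rnbinom(100 * 6, mu = 100, size = 2),
  nrow = 100,
  dimnames = list(paste0("gene", 1:100), sample_ids)
)

# colData: one row per sample, with at least one column describing the samples.
# 'condition' is an assumed variable with two levels, so a comparison is possible.
col_data <- data.frame(
  condition = factor(rep(c("mock", "treated"), each = 3)),
  row.names = sample_ids
)

# Rows of colData must correspond to columns of countData, in the same order
stopifnot(identical(rownames(col_data), colnames(counts_as_matrix)))

# design names a column of colData that the counts are modelled on
dds <- DESeqDataSetFromMatrix(
  countData = counts_as_matrix,
  colData   = col_data,
  design    = ~ condition
)
```

If the rows of your metadata are in a different order from the count columns, you can reorder them first, for example with `col_data <- col_data[colnames(counts_as_matrix), , drop = FALSE]`.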
I do not understand why you want to run a differential analysis between 9 samples that were all treated identically. What is your biological question?
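One way to see the issue, sketched in base R with the three metadata rows shown in the question: count the distinct values in each descriptive column. A variable is only useful in `design` for a comparison if it has at least two levels, and here none does. The data frame name `samplesheet` is made up for this example.

```r
# The three metadata rows shown in the question
samplesheet <- data.frame(
  Experiment  = c("SRX6430481", "SRX6430482", "SRX6430483"),
  Sample      = c("GSM3937647", "GSM3937648", "GSM3937649"),
  source_name = "mock-treated",
  Tissue      = "breast cancer",
  disease     = "adenocarcinoma",
  Cell_type   = "epithelial"
)

# Number of distinct values per descriptive column (ID columns excluded)
descriptive <- samplesheet[, setdiff(names(samplesheet), c("Experiment", "Sample"))]
n_levels <- vapply(descriptive, function(x) length(unique(x)), integer(1))
print(n_levels)

# Columns that could serve as a design variable (two or more levels)
usable <- names(n_levels)[n_levels >= 2]
print(usable)
```

For these rows `usable` comes back empty. As far as I know, DESeq2 will still accept `design = ~ 1` in that situation, which is enough to build the object and draw a heatmap from transformed counts, but there is no contrast to test and therefore nothing to put in a volcano plot.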