How to correctly build the counts matrix for DESeq2 from the counts table produced by the Nextflow rnaseq pipeline
15 months ago
Josh

Hi, I have used the Nextflow rnaseq pipeline to generate count tables for 9 samples from this GEO dataset:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE134127

There are 18 samples: 9 exposed to a chemical compound and 9 unexposed. I chose the 9 unexposed samples because they are the ones of interest for my project. In SRA the samples appear like this:

https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA554005&f=source_name_sam_ss%3An%3Amock-treated%3Ac&o=acc_s%3Aa

My goal is to do a differential expression analysis between these samples and obtain a volcano plot and a heatmap with DESeq2.

As I understand it, to produce a volcano plot and a heatmap, DESeq2 first needs a DESeqDataSet object, which can be built either from a matrix of counts (rounded to integers) or from a tximport object. For example, from a matrix:

library(DESeq2)

dds <- DESeqDataSetFromMatrix(countData = counts_as_matrix,
                              colData = samplesheet_metadata_SRA,
                              design = ~ condition) # This is where I do not know what to assign
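The counts that salmon estimates inside the pipeline are usually non-integer, whereas DESeqDataSetFromMatrix expects non-negative integer counts. A minimal sketch of turning a merged counts table into such a matrix in base R, assuming the table has a gene_id column and a gene_name column followed by one column per sample. These column names, the GSM-based sample names and the toy values are illustrative assumptions, not the actual pipeline output:

```r
# Toy stand-in for a merged counts table (hypothetical values, for illustration only).
counts_table <- data.frame(
  gene_id    = c("ENSG_A", "ENSG_B", "ENSG_C"),
  gene_name  = c("GENE_A", "GENE_B", "GENE_C"),
  GSM3937647 = c(10.4, 0, 250.6),
  GSM3937648 = c(12.0, 3.4, 198.2),
  GSM3937649 = c(8.7, 1.2, 305.9),
  stringsAsFactors = FALSE
)

# Keep only the per-sample columns and use the gene IDs as row names.
sample_cols <- setdiff(names(counts_table), c("gene_id", "gene_name"))
counts_as_matrix <- as.matrix(counts_table[, sample_cols])
rownames(counts_as_matrix) <- counts_table$gene_id

# Round the estimated counts and store them as integers, as DESeq2 expects.
counts_as_matrix <- round(counts_as_matrix)
storage.mode(counts_as_matrix) <- "integer"

print(counts_as_matrix)
```

If the salmon quantifications are imported with tximport instead, DESeqDataSetFromTximport takes care of the non-integer estimates itself, so this manual rounding step is only needed for the matrix route.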

I have managed to build a matrix with the counts for my samples, but I am not sure whether my colData and design arguments are correct.

For example, this is part of the file I intend to pass as colData. It is essentially the SRA metadata for the 9 samples mentioned above; I show 3 of them here, and all 9 share the same values for source_name, Tissue, disease and Cell_type.

Experiment Sample source_name Tissue disease Cell_type
SRX6430481 GSM3937647 mock-treated breast cancer adenocarcinoma epithelial
SRX6430482 GSM3937648 mock-treated breast cancer adenocarcinoma epithelial
SRX6430483 GSM3937649 mock-treated breast cancer adenocarcinoma epithelial

My biggest question is about the design argument of DESeqDataSetFromMatrix: what should I put there? As you can see in the metadata table, all the samples are mock-treated.
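Because every sample has the same source_name, any design term built from it is a factor with a single level, and such a factor cannot be turned into a model matrix. Only the intercept-only design ~ 1 can be built, and it contains no group difference to test. A base R sketch with model.matrix(), which is the same kind of design matrix DESeq2 works from, using three of the mock-treated samples from the table above:

```r
# colData for three of the mock-treated samples (values from the metadata table).
coldata <- data.frame(
  source_name = factor(c("mock-treated", "mock-treated", "mock-treated")),
  row.names   = c("GSM3937647", "GSM3937648", "GSM3937649")
)

# A one-level factor cannot be used as a design term: model.matrix() raises an error,
# which is captured here as a character string.
one_level_result <- tryCatch(
  model.matrix(~ source_name, data = coldata),
  error = function(e) conditionMessage(e)
)
print(one_level_result)

# The intercept-only design can be built, but it has a single column (the overall
# mean), so there is no between-group coefficient for a differential test.
intercept_only <- model.matrix(~ 1, data = coldata)
print(colnames(intercept_only))
```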

Thank you for your help and time.

salmon rnaseq DESeq2

You need to provide colData and design in the required format: https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/DESeqDataSet-class

"colData : for matrix input: a DataFrame or data.frame with at least a single column. Rows of colData correspond to columns of countData"
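Since each row of colData has to describe the corresponding column of countData, it is worth reordering the metadata by the count matrix's column names and checking that they agree before building the DESeqDataSet. A sketch, assuming the count matrix columns are named by GSM accession; the real column names depend on the sample names used in the pipeline samplesheet, and the count values are made up:

```r
# Hypothetical count matrix whose columns are in a different order than the metadata.
counts_as_matrix <- matrix(
  c(10L, 12L, 9L,
     0L,  3L, 1L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ENSG_A", "ENSG_B"),
                  c("GSM3937649", "GSM3937647", "GSM3937648"))
)

# Metadata rows as in the SRA table, keyed by GSM accession.
samplesheet_metadata_SRA <- data.frame(
  Experiment  = c("SRX6430481", "SRX6430482", "SRX6430483"),
  Sample      = c("GSM3937647", "GSM3937648", "GSM3937649"),
  source_name = c("mock-treated", "mock-treated", "mock-treated"),
  stringsAsFactors = FALSE
)
rownames(samplesheet_metadata_SRA) <- samplesheet_metadata_SRA$Sample

# Reorder the metadata so that row i describes column i of the count matrix.
coldata <- samplesheet_metadata_SRA[colnames(counts_as_matrix), , drop = FALSE]

# Stop early if any sample is missing or mislabelled.
stopifnot(identical(rownames(coldata), colnames(counts_as_matrix)))
print(coldata$Experiment)
```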

"design : a formula which expresses how the counts for each gene depend on the variables in colData"
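The design formula only has something to estimate when a colData variable takes at least two values across the samples. As an illustration, assuming the compound-exposed samples from the same GEO series were added to the analysis under a hypothetical condition column, ~ condition gives an intercept (the reference level, here mock) plus one coefficient for the exposed-versus-mock difference, which is the comparison that results() reports by default for such a design. Sample names and group sizes below are hypothetical:

```r
# Hypothetical colData: 2 mock-treated and 2 exposed samples (illustrative names).
coldata <- data.frame(
  condition = factor(c("mock", "mock", "exposed", "exposed"),
                     levels = c("mock", "exposed")),  # "mock" is the reference level
  row.names = c("mock_1", "mock_2", "exposed_1", "exposed_2")
)

# Intercept column = mock baseline; "conditionexposed" = exposed vs. mock difference.
design_matrix <- model.matrix(~ condition, data = coldata)
print(design_matrix)

# With DESeq2 installed, the same formula would be passed as
# DESeqDataSetFromMatrix(countData = ..., colData = coldata, design = ~ condition)
```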

I do not understand why you want to run a differential analysis on 9 samples that were all treated identically. What is your biological question?
