Question

How to correctly build a DESeq2 count matrix from the counts table produced by the Nextflow rnaseq pipeline

15 months ago

Josh ▴ 20

Hi, I used the Nextflow rnaseq pipeline to generate count tables for 9 samples from this GEO dataset:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE134127

There are 18 samples: 9 exposed to a chemical compound and 9 unexposed. I chose the 9 unexposed samples because they are the ones relevant to my project. In SRA they appear like this:

https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA554005&f=source_name_sam_ss%3An%3Amock-treated%3Ac&o=acc_s%3Aa

My goal is to perform a differential expression analysis between these samples and produce a volcano plot and a heatmap with DESeq2.

I understand that, to produce a volcano plot and a heatmap, DESeq2 first needs a DESeqDataSet object, which can be built either from a matrix of rounded (integer) counts or from a tximport object. For example, from a matrix:

library(DESeq2)

dds <- DESeqDataSetFromMatrix(countData = counts_as_matrix,
                              colData = samplesheet_metadata_SRA,
                              design = ~ condition) # This is where I do not know what to assign
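A note on building counts_as_matrix itself. Assuming this is the nf-core/rnaseq pipeline, its merged Salmon gene counts table typically starts with annotation columns (gene_id, gene_name) followed by one column per sample, and the Salmon estimates are not whole numbers, whereas DESeqDataSetFromMatrix expects a matrix of non-negative integers. Below is a minimal base R sketch under those assumptions: the file name salmon.merged.gene_counts.tsv and the two annotation column names are assumptions that may differ between pipeline versions, and the inline toy table (made-up sample names and counts) stands in for the real file.

```r
# Toy stand-in for the pipeline's merged counts table (all values are made up).
# With real data you would use read.delim("salmon.merged.gene_counts.tsv");
# that file name and the gene_id/gene_name columns are assumptions.
counts_table <- read.delim(text = paste(
  "gene_id\tgene_name\tsampleA\tsampleB\tsampleC",
  "ENSG0001\tGENE1\t10.4\t0\t7.6",
  "ENSG0002\tGENE2\t150.2\t98.9\t120.0",
  sep = "\n"
), check.names = FALSE)

# Keep only the per-sample count columns and use gene IDs as row names
annotation_cols <- c("gene_id", "gene_name")
sample_cols <- setdiff(colnames(counts_table), annotation_cols)
counts_as_matrix <- as.matrix(counts_table[, sample_cols])
rownames(counts_as_matrix) <- counts_table$gene_id

# DESeqDataSetFromMatrix needs integer counts; Salmon estimates are fractional
counts_as_matrix <- round(counts_as_matrix)
storage.mode(counts_as_matrix) <- "integer"

print(counts_as_matrix)
```

If you instead import the per-sample quant.sf files with tximport, DESeqDataSetFromTximport takes the tximport list directly and rounds the counts itself, so the manual rounding step is not needed on that route.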

I have managed to generate a matrix object with the counts of my samples, but I am not sure whether my colData and design arguments are right.

For example, this is part of the file I intend to pass as colData. It is essentially the SRA metadata for the 9 samples mentioned above; I show 3 of them here, and all 9 share the same values for source_name, tissue, etc.

Experiment	Sample	source_name	Tissue	disease	Cell_type
SRX6430481	GSM3937647	mock-treated	breast cancer	adenocarcinoma	epithelial
SRX6430482	GSM3937648	mock-treated	breast cancer	adenocarcinoma	epithelial
SRX6430483	GSM3937649	mock-treated	breast cancer	adenocarcinoma	epithelial

My biggest doubt is the design argument of DESeqDataSetFromMatrix: what should I put there? As you can see in the metadata table, all the samples are mock-treated.
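Why the design is the sticking point: a formula such as ~ condition only makes sense if the named colData column takes at least two values, so that each level can be compared with a reference level. Base R's model.matrix, which builds a design matrix from a formula in the same way DESeq2's model formulas are interpreted, shows the problem directly. This is a minimal sketch with hypothetical sample tables; the "exposed" label for the other nine samples of the series is a placeholder, not the actual GEO annotation.

```r
# Hypothetical sample table containing only the nine mock-treated samples
only_mock <- data.frame(
  source_name = factor(rep("mock-treated", 9))
)

# A one-level factor gives nothing to compare against:
# model.matrix() cannot set up contrasts and raises an error
one_level_error <- tryCatch(
  model.matrix(~ source_name, data = only_mock),
  error = function(e) conditionMessage(e)
)
print(one_level_error)

# Hypothetical table that also includes the other nine samples of the series;
# "exposed" is a placeholder label, not the GEO annotation
with_exposed <- data.frame(
  source_name = factor(rep(c("mock-treated", "exposed"), each = 9),
                       levels = c("mock-treated", "exposed"))
)
design_matrix <- model.matrix(~ source_name, data = with_exposed)

# Intercept = mock-treated baseline; second column = exposed vs mock effect
print(colnames(design_matrix))
```

Setting "mock-treated" as the first factor level makes it the reference, so the second coefficient reads as the change in the exposed group relative to mock.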

Thank you for your help and time

You need to provide the inputs in the required format: https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/DESeqDataSet-class

"colData : for matrix input: a DataFrame or data.frame with at least a single column. Rows of colData correspond to columns of countData"

"design : a formula which expresses how the counts for each gene depend on the variables in colData"
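Both requirements quoted above can be checked before calling DESeqDataSetFromMatrix. This is a minimal base R sketch, assuming the count matrix columns are named by GSM accession and the metadata has a Sample column holding the same accessions (as in the SRA table in the question); the counts are made up and the shuffled row order is hypothetical.

```r
# Toy count matrix whose columns are GSM accessions (counts are made up)
counts_as_matrix <- matrix(c(5L, 0L, 12L, 3L, 8L, 1L), nrow = 2,
                           dimnames = list(c("gene1", "gene2"),
                                           c("GSM3937647", "GSM3937648", "GSM3937649")))

# Metadata whose rows are in a different order than the count columns
sample_metadata <- data.frame(
  Sample = c("GSM3937649", "GSM3937647", "GSM3937648"),
  source_name = "mock-treated"
)

# Index the metadata by sample and reorder it to follow the count columns
rownames(sample_metadata) <- sample_metadata$Sample
col_data <- sample_metadata[colnames(counts_as_matrix), , drop = FALSE]

# Rows of colData must correspond, in order, to columns of countData
stopifnot(identical(rownames(col_data), colnames(counts_as_matrix)))

# Every variable used in the design formula must be a column of colData
design <- ~ source_name
stopifnot(all(all.vars(design) %in% colnames(col_data)))
```

These checks only confirm that the inputs line up; they do not catch a design variable that has a single value across all samples.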

I do not understand why you want to do a differential analysis between 9 samples that were all treated the same way. What is your biological question?

Reply by Basti, 15 months ago