(I'm new at this, so keep that in mind and please correct me if I'm wrong.)
I would recommend using the tximeta package to import your data and analyzing with DESeq2 to start.
If the concentration is encoded in your metadata table as integer, DESeq2 writes in the console:
the design formula contains one or more numeric variables with integer values,
specifying a model with increasing fold change for higher values.
(With numeric values, it does not output this message.)
I believe this is what you're asking for?
Here's how I would solve that:
You'll need to make a metadata table for your data to use with tximeta (including a column of file paths to the quant.sf files:
library(tidyverse)
library(tximeta)
library(DESeq2)
meta <- tribble(
~sample, ~drug, ~concentration, ~files,
"sampA", "A", 1, "filepath",
"sampB", "A", 2, "filepath",
"sampC", "A", 3, "filepath",
"sampD", "A", 4, "filepath",
"sampE", "A", 5, "filepath",
"sampF", "B", 2, "filepath",
"sampG", "B", 3, "filepath",
"sampH", "B", 4, "filepath",
"sampI", "B", 5, "filepath",
"sampJ", "B", 6, "filepath")
Using a combined dataset:
se_combined <- tximeta(meta)
gse_combined <- summarizeToGene(se_combined)
dds_combined <- DESeqDataSet(gse_combined, design = ~ drug + concentration + drug:concentration)
dds_combined <- DESeq(dds_combined)
results(dds_combined, name = "concentration")
results(dds_combined, contrast = list(c("concentration", "drugB.concentration")))
From the DESeq2 vignette:
The key point to remember about designs with interaction terms is
that, unlike for a design ~genotype + condition, where the condition
effect represents the overall effect controlling for differences due
to genotype, by adding genotype:condition, the main condition effect
only represents the effect of condition for the reference level of
genotype (I, or whichever level was defined by the user as the
reference level). The interaction terms genotypeII.conditionB and
genotypeIII.conditionB give the difference between the condition
effect for a given genotype and the condition effect for the reference
genotype.
and
The condition effect for genotype II (or III) is obtained by adding the main condition effect and the interaction term for that genotype.
Unfortunately, in the ?results
help, it says to use name
for continuous variables, which is fine for the effects of drugA, but the effects of drugB used contrast
. You might need to break up your dataset by drug to get around this.
?results
:
name
the name of the individual effect (coefficient) for building a results table. Use this argument rather than contrast for continuous variables, individual effects or for individual interaction terms. The value provided to name must be an element of resultsNames(object).
To get around using contrast
, subset your data and change the design formula:
dds_drug_X <- dds_combined[, dds_combined$drug == "X"]
design(dds_drug_X) <- ~ concentration
dds_drug_X <- DESeq(dds_drug_X)
results(dds_drug_X, name = "concentration")
I would further explore the vignettes for all the necessary steps. The authors of these tools comment here regularly, so maybe they'll chime in with more info.
TPM values for each sample would be a vector of size ~25K (number of genes). How do you plan on regressing that with a vector of 15 values (per sample again)?
Thank you! It is what I need but I am not sure what I said is the best way to get a linear relationship for TPM values and chemical compounds.